elkproxy

A highly configurable filtering proxy server for Elastic Search

View on GitHub

 logo elkproxy

ElkProxy is a highly configurable proxy server that sits in front of an Elastic Search server and filters/rewrites searches, document downloads, uploads, and updates.

The proxy uses a plugin system and a json based configuration file that scripts the plugins.

Installing elkproxy

cd elkproxy/app
python3 setup.py install

Running elkproxy

elkproxy --config myconfig.json

Demo environment

This repository comes with a docker compose demo environment based on the official open source version of the ELK (Elastic Search, Logstash, Kibana) stack. It can be started with the following command

docker compose up -d

and will expose kibana with some testing data on http://localhost:5601. You can modify elkproxy/app/config.json and run

docker-compose stop elkproxy
docker-compose up elkproxy

to try out different configurations. If you run the elkproxy container without -d, logging configured in config.json will print to your terminal, making it possible to debug your plugin scripts.

Config language

At the top level, the configuration consist of a JSON object with a set of properties described below

Upstream ElastSearch url

"upstream": "http://elasticsearch:9200"

The url to the upstream ElastSearch that elkproxy should connect to.

Server hostname and port

"host": "localhost:9200"

The hostname (interface) and port that elkproxy should be reachable at.

Logging

"logging": {
    "version": 1,
    "loggers": {...},
    "handlers": {...},
    "formatters": {...}}

Python logging configuration, in the format that can be supplied to logging.config.dictConfig.

Plugin points and script lists

"auths": [{"type": "cookie",
           "args": {...}}
          ...],
"query_filters": [{"type": "template",
                   "args": {...}},
                  {"type": "rest",
                   "args": {...}}
                  ...],
"doc_savers": [{"type": "template",
                "args": {...}},
               {"type": "template",
                "args": {...}}
               ...]

The main operation of elkproxy is controlled by script lists run at various plugin points. The plugin points are for authentication plugins, filters for elastic search queries and filters for uploaded documents, respectively.

A script list is executed in order from top to bottom until a line is reached that is able to process the request. That is, if multiple lines match a given request, the first one will be executed.

Each script list line consists of a type and some arguments. The type is the name of a plugin class (a python entrypoint registered by an installed python package), and the arguments are (named) constructor arguments to the python class. It is entirely up to the class to determine what arguments are accepted or required, or what they mean.

Sakstig expressions and sakform templates

Some of the plugins can be scripted using sakstig expressions for matching requests, and using sakform templates for generating queries and documents. These allow fine grained programmatic control of JSON data matching and transformation, respectively.

Sakstig extensions

http()

The SakStig function

http(url, {name=value...})

makes a http request using the python requests module. The optional dictionary of named parameters is sent to the requests method except for the parameter method that is used to select the method itself (it defaults to get). The function returns a QuerySet with one dictionary in it with the following members:

{"status": HTTP_STATUS,
 "content": STRING_OR_JSON_OBJECT,
 "headers": {NAME:VALUE...}}

Execution context

Plugins run in a plugin point dependent context, described under each plugin point below. However, all contexts share a kwargs member with a dictionary of the http request properties. Plugins are free to modify these properties even_if they do not match the current request, and script execution continues with the next line.

KWARGS = {"metadata": METADATA_DICT,
          "method": HTTP_METHOD,
          "params": URL_ARGS,
          "headers": HTTP_HEADERS,
          "path": URL_PATH}}

METADATA is a dictionary that starts out empty, but that can be populated and modified by plugins. It is not used when generating the http request to the upstream ElasticSearch server. Instead, it can be used to transfer data between blugins, e.g. for storing user authentication data by the auth plugins that is later accessed by e.g. query filter plugins to generate restricted queries for the current user.

Available plugins

Plugins common to all plugin points

Return

{"type": "return",
 "args": {"match": "$[@.kwargs.path is '_template/kibana_index_template:.kibana']",
          "document": {"response": {}, "status":200}}}

The return plugin matches the current request using a sakstig expression, and uses a sakform template to generate a direct response, without passing the request on to the real ElasticSearch server.

Since the sakstig expression can do http queries, it is possible to use this to e.g. check if a document already exists, or has a certain property set to some value, before allowing a document upload to proceed.

Auth plugins

Authentication plugins sets $.kwargs.metadata.username to the name of the authenticated user if they succeed.

{"type": "cookie",
 "args": {"name": "COOKIE NAME", "secret": "MY SECRET", "hashfunction": "md5"}}

The cookie plugin authenticates against a simple hash signed cookie. The cookie value must be on the format

username:hash

where hash must match hashfunction(username+secret). The supported hashfunctions are the ones available in the python hashlib module.

Query filter plugins

Query filter plugins provide extra filter context query terms that are logically ANDed to elastic search queries. They are run in the following context:

{"query": ES_QUERY,
 "body": REQUEST_BODY,
 "kwargs": KWARGS}

Template

{"type": "template",
 "args": {"match": "$[@.query.*.bool.filter.*.term.type is 'index-pattern']",
          "filter": {"term": {"config.username" : {"$": "$.kwargs.metadata.username"}}}}}

The template plugin matches the query using a sakstig expression, and uses a sakform template to generate the extra query terms to be ANDed with the user query.

Both match and filter are optional - if match is missing, the line matches any query, if filter is missing, no extra terms are added to the query.

Document saver plugins

Document saver plugins can rewrite documents uploaded to ElasticSearch. They are run in the following context:

{"body": DOCUMENT_BODY,
 "kwargs": KWARGS}

Template

{"type": "template",
 "args": {"match": "$[@.body.type is 'mydoc']",
          "template": {
              "username": {"$": "$.kwargs.metadata.username"},
              "_": {"$": "$.body + @template()"}}}}

The template plugin matches the document body using a sakstig expression, and uses a sakform template to rewrite it.

Both match and template are optional - if match is missing, the line matches andy document, if template is missing the document is left unchanged.

Example config

Example config.json suitable for kibana.

How to write plugins

Plugins are python classes that are registered using setuptools entrypoints. The entry points are named after the plugin points, prefixed by “elkproxy_”. Here, an auths plugin “myplugin” is registered:

setup(
    entry_points={
        'elkproxy_auths': [
            'myplugin = mypackage.mymodule:MyPluginClass'
        ]})

Plugin classes are instantiated with keyword constructor arguments taken from the script line. It is entirely up to the plugin what arguments to accept or require, and how to interpret them.

Plugin instances must be callable, and are always called with a dictionary containing at least two members: body and kwargs. kwargs is the request kwargs, as described above, while body depends on the plugin point - for query_filters, this would be the query, for doc_savers its the document, and for auths it’s just None.

Plugins are expected to return either False, meaning that the script list execution continues, or some other value, meaning that the script list execution finishes with that value as output. For query_filters this is a query term to logically AND with the original query, for doc_savers this is the rewritten document body. For auths the output value is ignored.

Note that plugins can modify the value of kwargs, even if they return False.

class MyPluginClass(object):
    def __init__(self, **args):
        pass
    def __call__(self, context):
        return True