Managing the Elasticsearch Connector

The Composer Elasticsearch connector lets you use the Composer client to access data stored in Elasticsearch. It supports the following Elasticsearch versions:

  • Elasticsearch 6.0 - 6.7
  • Elasticsearch 7.0 - 7.6

You cannot import or export Elasticsearch data sources (or the visuals and dashboards that use them) if the version of the Elasticsearch connector in your Composer environment differs from the version the data sources use. For example, you cannot import an exported Elasticsearch 6 data source if your Composer environment has only an Elasticsearch 7 connector defined. When you change connector versions in your Composer environment, we recommend that you also create new data source configurations (and associated visuals and dashboards) for the newer version.

Before you can establish a connection from Composer to Elasticsearch storage, a connector server needs to be installed and configured. See Managing Connectors and Connector Servers for general instructions and Connecting to Elasticsearch for details specific to the Elasticsearch connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Composer v5 Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Creating Dashboards.


Composer Feature Support

The Elasticsearch connector supports all Composer features, except for:

Connecting to Elasticsearch

When establishing a connection to Elasticsearch, make sure you:

  1. Specify the connection string in the following format:

    • HTTP/HTTPS
      Format: <schema>://<host1>:<port1>,...,<hostN>:<portN>/<prefix>
      Example: http://ip-10-2-2-241.ec2.internal:80/es

    • Transport/Transports
      Format: <schema>://<host1>:<port1>,...,<hostN>:<portN>
      Example: transports://10.2.2.2:9010,10.2.2.3:9010

    where <schema> is the protocol that you want to use:

    • http or https (with SSL support)
    • transport or transports (with SSL support)

    All the hosts you specify must belong to the same cluster.

  2. Specify your Elasticsearch cluster name.

  3. If required, specify your Elasticsearch User Name and Password.

  4. Select Validate to confirm your connection.
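The format rules in step 1 can be checked before you enter a string in Composer. The following Python sketch is a hypothetical helper (parse_connection_string is not part of Composer). It assumes that /<prefix> is optional for HTTP/HTTPS, that transport strings never carry a prefix, and that host names contain no colons, so IPv6 literals are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Schemas listed above; the "s" variants use SSL.
SUPPORTED_SCHEMAS = {"http", "https", "transport", "transports"}


@dataclass
class ConnectionString:
    schema: str
    hosts: List[Tuple[str, int]]
    prefix: Optional[str]


def parse_connection_string(value: str) -> ConnectionString:
    """Split an Elasticsearch connection string into schema, hosts, and prefix."""
    schema, sep, rest = value.partition("://")
    if not sep or schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"missing or unsupported schema in {value!r}")

    host_list, _, prefix = rest.partition("/")
    # Assumption: the transport format shows no /<prefix>, so reject one.
    if prefix and schema.startswith("transport"):
        raise ValueError("transport connection strings take no /<prefix>")

    hosts = []
    for item in host_list.split(","):
        # rpartition keeps everything before the last ":" as the host name.
        host, colon, port = item.strip().rpartition(":")
        if not colon or not host or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"expected <host>:<port>, got {item!r}")
        hosts.append((host, int(port)))
    return ConnectionString(schema, hosts, prefix or None)


if __name__ == "__main__":
    print(parse_connection_string("transports://10.2.2.2:9010,10.2.2.3:9010"))
```

Because all hosts must belong to one cluster, a parser like this can only check syntax; Validate in Composer remains the real connection test.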

To connect to your Elasticsearch cluster and data set secured by X-Pack, see Support of X-Pack for Elasticsearch.

Connecting to Elasticsearch with a Configured Custom Certificate

If your Elasticsearch cluster is configured with a custom certificate, you should configure a truststore for the Elasticsearch connector.

To connect to an Elasticsearch data store with a configured custom certificate:

  1. Copy a truststore to the machine on which the Elasticsearch connector is running.

  2. Add the following lines to /etc/zoomdata/edc-elasticsearch-6.0.jvm or /etc/zoomdata/edc-elasticsearch-7.0.jvm, depending on your connector version. If the file is not in /etc/zoomdata/, first copy it there from the /opt/zoomdata/conf directory.

    -Djavax.net.ssl.trustStore=<path_to_truststore>
    -Djavax.net.ssl.trustStorePassword=<truststore_password>

    Replace:

    • <path_to_truststore> with an absolute path to your truststore
    • <truststore_password> with a password for your truststore
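If you manage several connector servers, you might script step 2. The following Python sketch is one hypothetical way to do it: with_truststore_options and update_jvm_file are illustrative names, and the script assumes the default paths shown above and permission to write to /etc/zoomdata. It replaces any existing truststore options instead of appending duplicates, so re-running it with a new path is safe.

```python
from pathlib import Path
import shutil

# Option prefixes from step 2; the "=" keeps trustStore= distinct from trustStorePassword=.
TRUSTSTORE_PREFIXES = ("-Djavax.net.ssl.trustStore=", "-Djavax.net.ssl.trustStorePassword=")


def with_truststore_options(jvm_text: str, truststore: str, password: str) -> str:
    """Return jvm_text with the two truststore options present exactly once."""
    if not truststore.startswith("/"):
        raise ValueError("use an absolute path to the truststore")
    kept = [line for line in jvm_text.splitlines()
            if not line.strip().startswith(TRUSTSTORE_PREFIXES)]
    kept += [TRUSTSTORE_PREFIXES[0] + truststore, TRUSTSTORE_PREFIXES[1] + password]
    return "\n".join(kept) + "\n"


def update_jvm_file(version: str, truststore: str, password: str) -> Path:
    """Copy the default .jvm file into /etc/zoomdata if needed, then update it."""
    name = f"edc-elasticsearch-{version}.jvm"  # version is "6.0" or "7.0"
    target = Path("/etc/zoomdata") / name
    if not target.exists():
        shutil.copyfile(Path("/opt/zoomdata/conf") / name, target)
    target.write_text(with_truststore_options(target.read_text(), truststore, password))
    return target
```

Keeping the text transformation in a separate pure function makes it easy to preview the resulting file before writing it.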

Data Source Configuration Notes

When setting up an Elasticsearch data source configuration, consider the following notes for the Indices tab.

On this tab, you select the indices and types to query and the fields to work with. Complete the following steps:

  1. Select indices and aliases to be queried.

  2. You can select indices Manually or Automatically.

    • If you want to get the data only from specific indices, select the Manually option and choose the corresponding indices from the list below.

    • The Automatically option is more flexible: you specify a pattern, and indices whose names match it are selected automatically.

      For this option, select one of the following pattern types. If no indices match the pattern at query time, your visuals return no data.

      • Native - specify the pattern for index names. Use an asterisk (*) to match any sequence of characters.

        For example, to get all indices whose names start with log and end with 16, specify the following pattern:

        log*16 
      • Time-Based - specify a date and time pattern to select matching indices. Check the supported date and time patterns.

        For example, the time pattern YYYY-MM returns all indices whose names follow that pattern, such as 2016-01. If an index name contains literal text in addition to the date and time components, enclose the text portion in square brackets ([ ]), as in the last example below.

        Examples:

        Index name             Pattern
        2016-01                YYYY-MM
        2016-3                 YYYY-Q
        10:23:11               HH:MM:SS
        logstash-2016-06-14    [logstash-]YYYY-MM-DD
  3. Fields are not refreshed automatically. If new fields are added to your indices, they appear in Composer only after you select the Refresh Fields button on the Fields tab of the data source configuration. Changes to existing fields (for example, a removed field) are not applied.

  4. Optionally, configure filtering by type. To filter by type, select the Enable Filter By Type checkbox; Composer shows the type used for filtering. To change it, select Edit and choose a type from the list of types available for the selected index.

    If the Enable Filter By Type checkbox is cleared, all types in the selected indices are included.

    If a field has different data types in different types, you cannot use it for grouping, filtering, and similar operations. However, the field is still available for raw export.
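To reason about which indices an automatic pattern selects, the following Python sketch approximates both pattern types with regular expressions. It is an illustration, not Composer's matcher: the native rule assumes * matches any run of characters (including none), the time-based rule covers only the tokens used in the examples above (YYYY, Q, MM, DD, HH, SS) and checks digit counts rather than valid ranges, and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Tokens used in the examples above; longer tokens come first.
TIME_TOKENS = [
    ("YYYY", r"\d{4}"),
    ("MM", r"\d{2}"),  # month or minutes; only the digit count is checked
    ("DD", r"\d{2}"),
    ("HH", r"\d{2}"),
    ("SS", r"\d{2}"),
    ("Q", r"[1-4]"),
]


def native_pattern(pattern: str) -> "re.Pattern[str]":
    """Native pattern: * matches any run of characters; everything else is literal."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def time_pattern(pattern: str) -> "re.Pattern[str]":
    """Time-based pattern: [text] is literal, known tokens become digit classes."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # ValueError if the bracket is never closed
            parts.append(re.escape(pattern[i + 1:end]))
            i = end + 1
            continue
        for token, regex in TIME_TOKENS:
            if pattern.startswith(token, i):
                parts.append(regex)
                i += len(token)
                break
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matching_indices(names: Iterable[str], regex: "re.Pattern[str]") -> List[str]:
    """Return the index names that match the whole pattern."""
    return [name for name in names if regex.fullmatch(name)]
```

In this sketch the bracket handling shows why literal text needs brackets: without them, uppercase letters such as Q in the literal part of an index name would be read as pattern tokens.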

When you connect to your Elasticsearch data source, Composer adds the service field _type. This field contains the selected Elasticsearch types, which you can visualize as attributes in your visuals.

Working with Elasticsearch

Distinct Counts and Percentiles

Distinct count and percentile metrics return approximate values in Elasticsearch. The precision of a distinct count depends on the precision threshold setting (default: 1000).

You can change the value of the precision threshold by setting the elasticsearch.query.cardinality.precision.threshold property in the zoomdata.properties file.
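Elasticsearch exposes this accuracy trade-off as the precision_threshold parameter of its cardinality aggregation: counts below the threshold are expected to be close to exact, and higher values use more memory. As a sketch, the following Python builds such a request body. The field name City and the function name are hypothetical, and that Composer maps elasticsearch.query.cardinality.precision.threshold onto this parameter is an assumption.

```python
import json

# Elasticsearch treats any precision_threshold above 40000 as 40000.
ES_MAX_PRECISION_THRESHOLD = 40000


def distinct_count_body(field: str, precision_threshold: int = 1000) -> dict:
    """Request body for an approximate distinct count of one field."""
    if precision_threshold <= 0:
        raise ValueError("precision_threshold must be positive")
    return {
        "size": 0,  # only the aggregation result is needed, not documents
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": min(precision_threshold, ES_MAX_PRECISION_THRESHOLD),
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(distinct_count_body("City", 3000), indent=2))
```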


The following list describes all the properties you can modify to work with Elasticsearch.

  • elasticsearch.query.cardinality.precision.threshold
    Default: 1000
    Use: Controls the accuracy of distinct counts.
    Notes: The maximum supported value is 40000. However, we do not recommend setting such a high value, because it may cause performance issues and the data source itself may return errors. For more information, see the Precision Control section of the Elasticsearch documentation.

  • elasticsearch.query.limit.nongrouped
    Default: 10000
    Use: Sets the limit on the number of non-grouped records (per shard) that a query executes on.

  • elasticsearch.query.limit.grouped
    Default: 10000
    Use: Sets the limit on the number of grouped records (per shard) that a query executes on.

If you need to change the default settings, add the corresponding properties (listed above) to the zoomdata.properties file and assign the required values. For more details about working with the zoomdata.properties file, refer to the topic Managing Configurations in Composer.
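If you manage zoomdata.properties with scripts, a small helper can set these properties and reject out-of-range values before Composer reads them. This Python sketch is hypothetical (set_properties is not a Composer tool): the 40000 cap comes from the list above, while treating every value as a positive integer is an assumption.

```python
# Properties and limits taken from the list above.
KNOWN_PROPERTIES = {
    "elasticsearch.query.cardinality.precision.threshold",
    "elasticsearch.query.limit.nongrouped",
    "elasticsearch.query.limit.grouped",
}
MAX_VALUES = {"elasticsearch.query.cardinality.precision.threshold": 40000}


def validate(key: str, value: int) -> None:
    if key not in KNOWN_PROPERTIES:
        raise KeyError(f"not an Elasticsearch property listed above: {key}")
    if value <= 0:  # assumption: all three properties expect positive integers
        raise ValueError(f"{key} must be a positive integer")
    if value > MAX_VALUES.get(key, value):
        raise ValueError(f"{key} must not exceed {MAX_VALUES[key]}")


def set_properties(text: str, updates: dict) -> str:
    """Return zoomdata.properties text with the given keys set.

    Existing key=value lines are replaced in place; new keys are appended.
    Simplification: only "key=value" lines are recognized (no ":" separators
    or line continuations).
    """
    for key, value in updates.items():
        validate(key, value)
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_entry = "=" in stripped and not stripped.startswith(("#", "!"))
        if is_entry and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Comments and unrelated settings pass through unchanged, so the helper can run against a file that already contains other Composer configuration.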

Tokenization

Keep in mind that, by default, Elasticsearch analyzes (tokenizes) fields of type text. As a result, a string of two or more words, such as the city name Las Vegas, may be split into separate terms when connected to Composer. To prevent this and ensure that a string field is not analyzed, map the field with type keyword:

"City": {
  "type": "keyword"
}
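A fragment like the one above belongs inside the properties section of an index mapping. As a sketch, the following Python builds a complete mapping body that you could send when creating an index (for example, with PUT /<index_name>). The function name and the _doc type name are assumptions: Elasticsearch 7 uses typeless mappings, while Elasticsearch 6 expects the properties under a single mapping type.

```python
import json
from typing import Iterable


def keyword_mapping(fields: Iterable[str], es_major_version: int) -> dict:
    """Build an index-creation body that maps each field as keyword (not analyzed)."""
    properties = {name: {"type": "keyword"} for name in fields}
    if es_major_version == 7:
        # Elasticsearch 7 mappings are typeless.
        return {"mappings": {"properties": properties}}
    if es_major_version == 6:
        # Elasticsearch 6 nests properties under one mapping type; "_doc" is an assumed name.
        return {"mappings": {"_doc": {"properties": properties}}}
    raise ValueError("the connector supports Elasticsearch 6 and 7 only")


if __name__ == "__main__":
    print(json.dumps(keyword_mapping(["City"], 7), indent=2))
```

Because Elasticsearch cannot change the type of an existing field in place, apply a keyword mapping when you create the index, or reindex existing data into a new index that uses it.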

IP Addresses

The IP Address data type is supported for Elasticsearch data connectors. Fields of this type are treated as ATTRIBUTEs and can be used in:

  • An Elasticsearch text search box. When searching via the text search, Composer also supports the CIDR notation for IP addresses as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/ip.html).
  • The Group By selection box.
  • Filters, although Composer does not support CIDR notation in filters for an IP address field. An exact match is required.
  • Row-level expressions. In row-level expressions, Composer treats IP addresses as strings and expects an exact match.
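The difference between text search and filters is easiest to see side by side. This Python sketch models the two matching rules described above with the standard ipaddress module. It illustrates the semantics only and is not how Composer implements them; both function names are hypothetical.

```python
import ipaddress


def text_search_matches(query: str, address: str) -> bool:
    """Text search: the query may be a single IP address or a CIDR block."""
    # strict=False accepts host bits in the query, e.g. 192.168.1.0/16.
    network = ipaddress.ip_network(query, strict=False)
    return ipaddress.ip_address(address) in network


def filter_matches(value: str, address: str) -> bool:
    """Filters and row-level expressions: exact string match only, no CIDR."""
    return value == address
```

For example, the query 192.168.0.0/16 matches the address 192.168.1.5 in text search, but the same value used in a filter matches nothing, because a filter compares the whole string.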

Elasticsearch 7 Composite Aggregation

Elasticsearch 7 connectors implement composite aggregations, which optimize aggregation queries on Elasticsearch 7 data. This optimization does not apply to queries with:

  • histograms
  • time groups with WEEK granularity
  • multiple groups when group fields belong to different nested contexts.

In the Elasticsearch 7 configuration file (edc-elasticsearch-7.0.properties), you can use the elasticsearch.query.composite-agg.max-fetch-size property to specify the maximum number of buckets returned for each query within a composite aggregation. The value must be greater than zero; the default is 10000. This property corresponds to the Elasticsearch setting search.max_buckets, which also defaults to 10000. If you increase the value of elasticsearch.query.composite-agg.max-fetch-size, increase the Elasticsearch search.max_buckets setting accordingly.
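A composite aggregation returns its buckets in pages, and the page size is what the max-fetch-size property bounds. The following Python sketch shows the paging pattern using Elasticsearch's composite aggregation request syntax (size, sources, after, and the after_key returned in each response). The helper names, the aggregation name groups, the City field, and the search callable are hypothetical; in practice search would wrap an Elasticsearch client call, and how the connector itself issues these requests is not documented here. check_fetch_size assumes that "increase accordingly" means keeping search.max_buckets at least as large as the fetch size.

```python
from typing import Callable, Dict, Iterator, List, Optional

ES_DEFAULT_MAX_BUCKETS = 10000  # default search.max_buckets, per the text above


def check_fetch_size(max_fetch_size: int, max_buckets: int = ES_DEFAULT_MAX_BUCKETS) -> None:
    """Reject fetch sizes that are not positive or that exceed search.max_buckets."""
    if max_fetch_size <= 0:
        raise ValueError("max-fetch-size must be greater than zero")
    if max_fetch_size > max_buckets:
        raise ValueError("raise search.max_buckets before raising max-fetch-size")


def composite_body(field: str, size: int, after: Optional[Dict] = None) -> dict:
    """One page of a terms-based composite aggregation on a single field."""
    composite = {"size": size, "sources": [{field: {"terms": {"field": field}}}]}
    if after is not None:
        composite["after"] = after  # resume after the last bucket of the previous page
    return {"size": 0, "aggs": {"groups": {"composite": composite}}}


def iter_buckets(search: Callable[[dict], dict], field: str, size: int) -> Iterator[dict]:
    """Page through all buckets by feeding each response's after_key into the next request."""
    check_fetch_size(size)
    after = None
    while True:
        agg = search(composite_body(field, size, after))["aggregations"]["groups"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return


def make_fake_search(values: List[str]) -> Callable[[dict], dict]:
    """In-memory stand-in for an Elasticsearch search call, for demonstration only."""
    keys = sorted(set(values))

    def search(body: dict) -> dict:
        comp = body["aggs"]["groups"]["composite"]
        field = next(iter(comp["sources"][0]))
        start = keys.index(comp["after"][field]) + 1 if "after" in comp else 0
        page = keys[start:start + comp["size"]]
        buckets = [{"key": {field: k}, "doc_count": values.count(k)} for k in page]
        agg = {"buckets": buckets}
        if buckets:
            agg["after_key"] = buckets[-1]["key"]
        return {"aggregations": {"groups": agg}}

    return search
```

Running iter_buckets(make_fake_search(["LA", "NY", "LA", "SF"]), "City", 2) requests two pages of buckets plus a final empty page that ends the loop.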