Multisource Exploration Enhancements
The following enhancements were made to allow you to explore data from multiple sources in Zoomdata 3.
Zoomdata 3 enhances support for keysets, which can be used in multipass and multisource exploration of your data. A keyset collects the list of unique values for a field so that you can apply that list to your analysis of another data set, or of the same one.
The keyset creation flow changed. Keysets can now be defined directly from charts and from data points in a chart using the radial menu (the radial menu changed as a result). In prior releases, keysets could only be added within the Save Filters dialog. For complete information about keysets, see Using Keysets.
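The keyset idea can be sketched outside the product. The following Python example is only an illustration of the concept, not Zoomdata code: the record lists, field names, and the helper functions build_keyset and apply_keyset are all hypothetical. It collects the unique values of a field from one source in a first pass, then uses that list to restrict a second source.

```python
# Hypothetical in-memory records standing in for two data sources.
orders = [
    {"customer_id": "c1", "amount": 120},
    {"customer_id": "c2", "amount": 40},
    {"customer_id": "c1", "amount": 75},
    {"customer_id": "c3", "amount": 310},
]
tickets = [
    {"customer_id": "c1", "subject": "late delivery"},
    {"customer_id": "c3", "subject": "refund"},
    {"customer_id": "c4", "subject": "login issue"},
]


def build_keyset(records, field, predicate=lambda record: True):
    """Collect the unique values of `field` from records matching `predicate`."""
    return {record[field] for record in records if predicate(record)}


def apply_keyset(records, field, keyset):
    """Keep only the records whose `field` value appears in the keyset."""
    return [record for record in records if record[field] in keyset]


# First pass: customers with at least one order over 100.
big_spenders = build_keyset(orders, "customer_id", lambda r: r["amount"] > 100)

# Second pass: apply that keyset to a different source.
matching_tickets = apply_keyset(tickets, "customer_id", big_spenders)
```

Passing the same record list to both functions corresponds to the multipass case, where the keyset is applied back to the data set it came from.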
You can now link fields from different data sources on a dashboard. This allows you to simultaneously apply filters to all charts on a dashboard that use data sources with linked fields. In addition, you can simultaneously apply the same time filter using cross-linked fields in the time bar. This new feature introduces a new dialog and new options on the time bar and filter menus. See Using Cross-Source Links.
Fusion makes multiple data sources appear as a single source, without physically moving the data to a common data store. Data can be combined from relational and nonrelational sources, including structured and unstructured sources.
Zoomdata 3 enhanced data fusion capabilities and processing in the following ways:
You can now explicitly specify the kind of join that occurs: inner join, left outer join, or full outer join. The join type can be selected for the fields on which the join occurs. In past releases, full outer joins were not supported and left outer joins were assumed unless you specified otherwise for specific fields in the fused data.
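For readers less familiar with the three join types, the difference can be shown on a few rows. This is a generic Python sketch, not Zoomdata's implementation: the join function, the sample records, and the "inner", "left", and "full" labels are assumptions made for illustration, and it assumes the two sides share no field names other than the join key.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is "inner", "left", or "full"."""
    if how not in ("inner", "left", "full"):
        raise ValueError(f"unsupported join type: {how}")

    # Field names on each side, used to fill in None for missing matches.
    left_fields = {name for row in left for name in row} - {key}
    right_fields = {name for row in right for name in row} - {key}

    # Index the right side by join key for quick lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for left_row in left:
        matches = index.get(left_row[key], [])
        if matches:
            matched_keys.add(left_row[key])
            for right_row in matches:
                result.append({**left_row, **right_row})
        elif how in ("left", "full"):
            # Left and full outer joins keep unmatched left rows.
            result.append({**left_row, **dict.fromkeys(right_fields)})

    if how == "full":
        # A full outer join also keeps unmatched right rows.
        for right_row in right:
            if right_row[key] not in matched_keys:
                result.append({**dict.fromkeys(left_fields), **right_row})
    return result


sales = [{"region": "east", "revenue": 10}, {"region": "west", "revenue": 20}]
targets = [{"region": "west", "target": 25}, {"region": "north", "target": 5}]

inner_rows = join(sales, targets, "region", how="inner")  # west only
left_rows = join(sales, targets, "region", how="left")    # east and west
full_rows = join(sales, targets, "region", how="full")    # east, west, north
```

With this sample data, the inner join returns one row, the left outer join two, and the full outer join three, with None standing in for the missing side.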
Joins are usually performed in memory. However, if a data connector supports pushdown joins and the data to be joined comes through the same data connection, Zoomdata pushes the join operation down to the underlying data store, which joins the data itself. This capability is currently supported only for Impala and Hive data stores.
In addition, if the join is an inner join and the aggregate functions SUM, MIN, MAX, or COUNT are used in the data, the Zoomdata engine pushes the aggregate queries to the underlying data engines, which reduces the amount of data that Zoomdata needs to process. This aggregate pushdown occurs whether the joined data comes from the same data source or from different data sources.
See Optimizing Joins for more information.
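To see why aggregate pushdown reduces the data that must be processed in memory, consider the following Python sketch. It is a simplified illustration under stated assumptions, not the Zoomdata engine's actual logic: aggregate_at_source stands in for a GROUP BY that each source engine would run itself, and the sample records and function names are hypothetical. Each source returns one summary row per join key, and only those summary rows are inner joined.

```python
from collections import defaultdict


def aggregate_at_source(rows, key, field, func):
    """Stand-in for a GROUP BY that a source engine would run itself."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {group_key: func(values) for group_key, values in groups.items()}


def inner_join_aggregates(left_agg, right_agg):
    """Inner join two {key: value} summaries on their keys."""
    return {k: (left_agg[k], right_agg[k]) for k in left_agg if k in right_agg}


clicks = [
    {"campaign": "a", "clicks": 3},
    {"campaign": "a", "clicks": 5},
    {"campaign": "b", "clicks": 2},
    {"campaign": "c", "clicks": 9},
]
spend = [
    {"campaign": "a", "cost": 10.0},
    {"campaign": "b", "cost": 4.0},
    {"campaign": "b", "cost": 6.0},
]

# Each source computes SUM per campaign before anything is joined.
clicks_by_campaign = aggregate_at_source(clicks, "campaign", "clicks", sum)
spend_by_campaign = aggregate_at_source(spend, "campaign", "cost", sum)

# Only the summary rows are joined in memory.
fused = inner_join_aggregates(clicks_by_campaign, spend_by_campaign)

rows_without_pushdown = len(clicks) + len(spend)
rows_with_pushdown = len(clicks_by_campaign) + len(spend_by_campaign)
```

In this toy data set the in-memory join handles five summary rows instead of seven raw rows. The saving grows with the number of raw records per join key. Passing min, max, or len as func gives the MIN, MAX, and COUNT cases.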
Because most joins are performed in memory, a new configurable limit has been placed on the number of records that can be processed from each joined source. This limit is initially set at 1,000,000 records per joined data source and can be configured by your Zoomdata administrator or supervisor using the qe.zengine.edc.rows.limit property in the query-engine.properties file. See Managing the Zoomdata Query Engine. If you find you are hitting this limit, use filtering to reduce the number of records to be fused.
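As a rough illustration, the entry might look like the following fragment. The property name, file name, and value come from the description above. The key=value layout and the plain integer format (no thousands separators) are assumptions based on common properties-file conventions, so check Managing the Zoomdata Query Engine for the exact syntax and for how to apply the change.

```properties
# query-engine.properties
# Maximum number of records processed from each joined data source.
# 1000000 is the initial value described above.
qe.zengine.edc.rows.limit=1000000
```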
In the UI, fused attributes are now referred to as join definitions. Only a single join definition is allowed between two data sources. The join definition can contain multiple join conditions (mappings), previously called forms. Join definitions must adhere to specific rules.
Your Fusion data sources from past releases are not automatically upgraded. You must manually edit them and reconfigure them after upgrading to this version. In addition, preexisting dashboards that used your Fusion data sources from previous versions will not work and will need to be recreated.
Rules and limitations of the new data fusion processing introduced in this release are described in Data Fusion Join Rules and Data Fusion Limitations. For complete information about data fusion, see Fusing Data Sources.