Enabling Data Sharpening

Data Sharpening™ is Zoomdata's patented technique to deliver fast and responsive visualizations for large volumes of data. Conceptually, Data Sharpening is similar to the way large image files or streaming video files display in a browser. When you start to load the image file, you see a blurry approximation of the image. But as the file loads in the background, the image sharpens until the entire image eventually comes into clear focus.

When you create or modify a chart for a large data set in Zoomdata, Data Sharpening can immediately display a partial or approximate rendering of the data. Zoomdata continuously updates the chart with more and more data until the fully sharpened result is available.

While a chart sharpens, you can continue to interact with it, zooming into more detail or changing the group-by, without waiting for the entire query to return. In other words, you can continue your big data exploration without waiting for long-running queries over billions of rows of data to complete. Zoomdata adjusts on the fly based on your input.

Keep in mind that Data Sharpening is not always needed when visualizing your data. If Zoomdata can complete its query of the data quickly (within a few seconds), you simply see the final result rendered in your selected chart. Data Sharpening is used when the chart may not render immediately because of the volume of data being queried.

Data Sharpening is not currently supported by the zEngine introduced in Zoomdata 4.6. To use Data Sharpening, you must currently reenable the legacy version of the query engine. See Reenabling the Legacy Query Engine.

How Data Sharpening Works

Keep in mind that Zoomdata connects to your original data source and runs its queries there, which can be resource intensive. The full query runs in the background while a series of microqueries samples data across partitions and refines the estimates.
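The idea of refining an estimate from per-partition microqueries can be illustrated with a small simulation. This is a conceptual sketch only, not Zoomdata's implementation: the partition data, the `sharpen` function, and the scale-up estimator are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List


def sharpen(partitions: Dict[str, List[float]]) -> Iterator[float]:
    """Yield successively refined estimates of the total across all partitions.

    Hypothetical model: each "microquery" sums one partition, and the running
    sum is scaled up by the fraction of partitions read so far.
    """
    total_partitions = len(partitions)
    running_sum = 0.0
    for seen, rows in enumerate(partitions.values(), start=1):
        running_sum += sum(rows)                      # result of one microquery
        yield running_sum * total_partitions / seen   # extrapolated estimate


# Made-up monthly partitions of order amounts.
data = {
    "201501": [10.0, 20.0],
    "201502": [30.0],
    "201503": [15.0, 25.0],
}
estimates = list(sharpen(data))
print(estimates)  # [90.0, 90.0, 100.0]; the last value is the exact total
```

Each intermediate value is an approximation that a chart could render immediately, and the final value matches what the full query would return.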

Prerequisites for Data Sharpening

For Zoomdata to perform Data Sharpening with a data source, a "playable" time field is needed. Zoomdata attempts to detect this playable field automatically when the source is created. To determine what makes a time attribute "playable," refer to Table 1. An appropriate time attribute must then be specified on your data source's global default settings page.

The granularity of this time field is referred to below as the driving time field granularity (DTFG). It plays an important role in determining whether and how Data Sharpening is executed, as described in Determining When Data Sharpening is Executed. Data Sharpening Setup and Process walks you through enabling Data Sharpening for your connected data sources.

The setup for Data Sharpening differs slightly depending on the data source. Although a playable time field is required for a source in order for sharpening to occur, the time field requirement is based on the data source. The table below details the time field requirements for the different data sources supported in Zoomdata.

Table 1. Time field requirements by data source

| Data Source | Time Field Requirement |
|---|---|
| Amazon Redshift | Sort key (only the first sort key is selected) |
| Cloudera Impala, Hive | Partitioned time field. The partitioned time field must be configured on the "Fields" page. A single partitioned column is needed for Data Sharpening to work in Impala or Hive sources. |
| Search-based sources (Cloudera Search, Elasticsearch, Apache Solr) | Indexed time field*. Zoomdata automatically detects indexes. |
| SQL-based sources (MySQL, Oracle, PostgreSQL, SQL Server) | Indexed time field*. Zoomdata automatically detects indexes. |

* The indexed time field should already be set in the data source, so no additional configuration is needed in Zoomdata.

Determining When Data Sharpening is Executed

When you create a chart, Zoomdata determines whether Data Sharpening is necessary based on the chart style selected and the time attribute parameters that are set for it.

For non-trend visuals (like bars, donuts, and heat maps), the granularity of the driving time field must be less than 10% of the range that is set in the time bar (determined by the MIN/MAX range set in the data source). The finest granularity Zoomdata uses for this calculation is one minute. Thus, even if your DTFG is second, Zoomdata still uses minute when performing this "10% rule" calculation.
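The "10% rule" for non-trend visuals can be sketched as a small check. This is an illustration of the rule as described above, not Zoomdata's code: the function name, the granularity table, and the fixed 30-day month and 365-day year are assumptions.

```python
from datetime import timedelta

# Approximate durations per granularity; month and year lengths are assumptions.
GRANULARITY = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def non_trend_sharpening_applies(dtfg: str, time_range: timedelta) -> bool:
    """Return True if the DTFG is less than 10% of the time bar range."""
    # Granularities finer than one minute are treated as one minute.
    effective = max(GRANULARITY[dtfg], GRANULARITY["minute"])
    # "granularity < 10% of range" is the same as "10 * granularity < range".
    return effective * 10 < time_range


print(non_trend_sharpening_applies("day", timedelta(days=30)))        # True
print(non_trend_sharpening_applies("second", timedelta(minutes=5)))   # False
```

In the second call the DTFG is second, but the check uses one minute, and one minute is 20% of a 5-minute range.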

For trend visuals (like Line and Bars Trend and Line Trend: Attribute Values), Zoomdata executes an internal check to determine whether Data Sharpening should execute. As with non-trend visuals, a 10% criterion is used, but it is slightly modified for the trend visual scenario. If the granularity of the driving time field for the source is less than 10% of the time granularity set for the particular trend chart, then Data Sharpening will execute.
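For trend visuals, the comparison is against the chart's own time granularity rather than the time bar range. A minimal sketch under the same caveats: the names and the approximate durations are assumptions, and any extra logic in Zoomdata's internal check is not modeled here.

```python
# Approximate lengths in seconds; week, month, and year values are assumptions.
SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "week": 604_800,
    "month": 2_592_000,   # 30 days
    "year": 31_536_000,   # 365 days
}


def trend_sharpening_applies(dtfg: str, chart_granularity: str) -> bool:
    """Return True if the DTFG is less than 10% of the chart's time granularity."""
    return SECONDS[dtfg] * 10 < SECONDS[chart_granularity]


print(trend_sharpening_applies("minute", "hour"))  # True: 1 minute is under 10% of an hour
print(trend_sharpening_applies("day", "week"))     # False: 1 day is over 10% of a week
```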

The bottom line is that Zoomdata will try to perform Data Sharpening when warranted based on the size of the data set, the time attributes available, and the time granularity that is set. If Zoomdata ascertains that results can be rendered in the chart quickly without Data Sharpening, it will do so. Otherwise, Zoomdata will attempt to use Data Sharpening to return near instantaneous result sets that are refined over time until the query completes.

Data Sharpening Setup and Process

To enable the Data Sharpening feature for a data source connected in Zoomdata, you need to enable the time settings on the data source's settings page. Specifically, the 'playable time field' needs to be set in 'Charts' > 'Global Default Settings' for the data source. Cloudera Impala sources require additional configuration, as described in Data Sharpening on Cloudera Impala Sources.

Follow the steps below to enable the time settings:

  1. Log into Zoomdata (either as an administrator or as a user who has been assigned to a group with data source management privileges).
  2. Select the Sources menu item.

  3. Select your data source.

  4. Select the Charts tab.

  5. Select the Time Attribute in Global Default Settings.

  6. Select Enable Playback or Enable Live Mode.

  7. Save your changes.

Data Sharpening on Cloudera Impala Sources

Data Sharpening works with certain partitioned Impala sources. The partitioned field should be a time-based attribute and in a supported time format (for example, yyyy-MM-dd). Follow the steps below to set up Impala for Data Sharpening. An example scenario is provided to illustrate when Data Sharpening would occur when you explore a large data set.

Steps to set up your Zoomdata connection to Cloudera Impala for Data Sharpening:

  1. Log into Zoomdata (either as an administrator or as a user who has been assigned to a group with data source management privileges).

  2. Select the Sources menu item.

  3. Access your Cloudera Impala source from the My Data Sources list.

  4. Navigate to the Fields tab. Locate the time field that will serve as the driving time field for Data Sharpening. This is the time field that needs to be specified in the global default settings.

  5. Identify the partition type that will be used, and change the setting in the Partitions column. Select a partition type from the drop-down list.

  6. In the Configure column, select an appropriate time granularity. Consider the 10% rule to ensure Data Sharpening execution.

  7. Select the Charts page.

  8. Select the Time Attribute in Global Default Settings.

  9. Save your changes.

The scenario below illustrates setting up Cloudera Impala for Data Sharpening.

  • You have 3 years of historical data on Cloudera Impala.
  • Your data is partitioned by month (a separate column, Order_Date_Month, contains the values of column Order_Date truncated to month).
  • The time stamp in your data provides granularity to the day level (column Order_Date).

Partition Steps

  1. Determine whether there are sub-folders in Impala. If so, the Label must include the full date format (for example, month=201501, which is in time format 'yyyyMM').
  2. Configure the Impala source in the Fields page:
    • For the Order_Date field, make sure "Day" granularity is selected.
    • For the Order_Date_Month field:
      • In the Partitions column, set the partitioned time field to Date (or verify that it is selected).
      • In the Default column, set the option to Pattern and enter the appropriate time format (for this example, the time format is 'yyyyMM').
      • Set the granularity to Month. Make sure the time granularity of the partitioned column is coarser than the granularity of the linked time field.
      • Link the partition to the field. Select a Time Field from the drop-down list (for this example, 'Order_Date' is selected).
  3. Continue to the Charts page and, in the Global Default Settings option, select Order_Date from the drop-down list.
  4. Save your work!
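To make the relationship between the partition label and the linked time field concrete, the sketch below parses a label such as month=201501 and maps an Order_Date value to its month partition. It is an illustration only: the function names are hypothetical, and Python's "%Y%m" is used here as the equivalent of the 'yyyyMM' pattern entered in Zoomdata.

```python
from datetime import date, datetime


def parse_partition_label(label: str) -> date:
    """Parse a label such as 'month=201501' (pattern 'yyyyMM') into a date."""
    _, _, value = label.partition("=")
    # strptime fills in day 1 when the pattern has no day component.
    return datetime.strptime(value, "%Y%m").date()


def partition_for(order_date: date) -> str:
    """Build the month partition label that a given Order_Date falls into."""
    return f"month={order_date:%Y%m}"


print(parse_partition_label("month=201501"))  # 2015-01-01
print(partition_for(date(2015, 1, 17)))       # month=201501
```

The partition value carries month granularity while Order_Date carries day granularity, which is the coarser-than relationship the Fields page configuration expects.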

For sharpening to work in this example, the time range must be more than 10 times the time interval for the selected chart. For example, if the chart shows one month of data and the time granularity is set to Day, Data Sharpening will execute: April has 30 days, so one day is about 3.3% of the range, which is under the 10% threshold.

Please note that if your Impala partition breaks out time attributes into separate fields, Data Sharpening will not be available. For example, if YEAR, MONTH, and DAY are all separate partitioned fields, they would need to be combined into one field in order for Data Sharpening to be available.
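As an illustration of what "combined into one field" means, the sketch below builds a single partition value from separate YEAR, MONTH, and DAY values. The function name and the 'yyyyMMdd' target format are assumptions chosen for the example. How the combined column is actually created in Impala depends on your table design and is not shown here.

```python
from datetime import date


def combine_partition_fields(year: int, month: int, day: int) -> str:
    """Combine separate YEAR, MONTH, and DAY values into one 'yyyyMMdd' value."""
    # Building a date first rejects impossible combinations such as month 13.
    return date(year, month, day).strftime("%Y%m%d")


print(combine_partition_fields(2015, 1, 5))  # 20150105
```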

Notes and Caveats

When Zoomdata connects to your data source for the first time, the application runs an initial query that returns a sample of the data set (approximately 100 rows of data) to provide an initial time range. Meanwhile, Zoomdata continues to run a comprehensive query to obtain the actual MIN/MAX range based on the entire data set. Based on the results of the sample query alone, Data Sharpening may not activate, since the sampled time range and time granularity will most likely fall short of the criteria for sharpening to execute. After Zoomdata completes the full query, Data Sharpening works as expected as long as the correct parameters have been applied and the time criteria are met. Depending on the size of your data set, the comprehensive query may take a few minutes to complete. Other factors include the size and type of the data source, competing resources on the database, and resource and performance limitations.
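The effect of the initial sample on the 10% rule can be shown with made-up data. In this sketch, hourly rows cover one year, the DTFG is day, and the "sample" is simply the first 100 rows. How Zoomdata actually selects its sample is not specified here, so treat that choice and the function name as assumptions.

```python
from datetime import datetime, timedelta
from typing import Sequence


def meets_ten_percent_rule(granularity: timedelta, timestamps: Sequence[datetime]) -> bool:
    """Return True if the granularity is under 10% of the observed MIN/MAX range."""
    time_range = max(timestamps) - min(timestamps)
    return granularity * 10 < time_range


start = datetime(2015, 1, 1)
rows = [start + timedelta(hours=h) for h in range(24 * 365)]  # one year, hourly
sample = rows[:100]                                           # about 4 days of data
day = timedelta(days=1)

sample_ok = meets_ten_percent_rule(day, sample)
full_ok = meets_ten_percent_rule(day, rows)
print(sample_ok, full_ok)  # False True
```

With only the sampled range available, the day granularity is too coarse relative to the range. Once the full MIN/MAX range is known, the same check passes.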