Managing the Spark SQL Connector

The Zoomdata Spark SQL connector lets you access the data available in Spark SQL databases for visualization and exploration using the Zoomdata client. The Zoomdata Spark SQL connector supports Spark SQL versions 2.3 and 2.4.

Before you can establish a connection from Zoomdata to Spark SQL storage, a connector server needs to be installed and configured. See Managing Connectors and Connector Servers for general instructions and Connecting to Spark SQL for details specific to the Spark SQL connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and charts from your data. See Creating Dashboards and Creating Charts.

Zoomdata Feature Support

The Spark SQL connector supports all Zoomdata features except the following:

Connecting to Spark SQL

When you establish a connection to Spark SQL, you need to provide the following information in the partition settings.

Configure the partition settings. For each partitioned field, select one of the following options:

  • No
  • Date - this option is available for fields of the Time type. If you select this option, the list of partitioned columns is displayed in the Configure column.
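Spark SQL tables are often partitioned by a date column, with each partition stored in a Hive-style `column=value` directory. The sketch below is purely illustrative and is not how Zoomdata reads partitions. It assumes a hypothetical `events` table partitioned on a hypothetical `dt` column and shows the general idea a date partition column makes possible: a date-range filter only needs to touch the partitions whose dates fall in the range.

```python
from datetime import date
from typing import List

# Hypothetical partition directories for a table partitioned on a "dt" column,
# named in the Hive-style "column=value" layout that Spark SQL recognizes.
PARTITION_DIRS = [
    "events/dt=2019-01-01",
    "events/dt=2019-01-02",
    "events/dt=2019-01-03",
    "events/dt=2019-01-04",
]


def partition_date(path: str, column: str = "dt") -> date:
    """Extract the partition date from a 'column=YYYY-MM-DD' path segment."""
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key == column:
            return date.fromisoformat(value)
    raise ValueError(f"no {column}= segment in {path!r}")


def prune(paths: List[str], start: date, end: date) -> List[str]:
    """Keep only the partitions whose date falls in [start, end], inclusive."""
    return [p for p in paths if start <= partition_date(p) <= end]


if __name__ == "__main__":
    # Only the two partitions inside the range are selected.
    print(prune(PARTITION_DIRS, date(2019, 1, 2), date(2019, 1, 3)))
```

Running the script prints `['events/dt=2019-01-02', 'events/dt=2019-01-03']`. The other two partitions never need to be read.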

In the Configure column, you can edit numeric and time-based fields:

  • Numeric types, including Number and Integer - you can select a default aggregation function
  • Time fields - you can define the default time pattern and granularity; if the time field provides hour, minute, and second granularities, you can also apply a time zone label
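The time-field options above can be illustrated with a small sketch. It is hypothetical, not Zoomdata code: the granularity names and the `supports_time_zone` and `truncate` helpers are assumptions. The sketch treats "granularity" as truncating a timestamp to the start of its bucket. It reads the time zone rule as "the field's finest granularity carries a time of day", which is also an assumption. Under that reading, a value truncated to a whole day or coarser has no time of day for a time zone to shift.

```python
from datetime import datetime

# Hypothetical granularity names that carry a time-of-day component.
TIME_OF_DAY_GRANULARITIES = {"SECOND", "MINUTE", "HOUR"}

# Fields to reset when truncating a timestamp to each hypothetical granularity.
_RESET = {
    "SECOND": dict(microsecond=0),
    "MINUTE": dict(second=0, microsecond=0),
    "HOUR": dict(minute=0, second=0, microsecond=0),
    "DAY": dict(hour=0, minute=0, second=0, microsecond=0),
    "MONTH": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "YEAR": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
}


def supports_time_zone(finest_granularity: str) -> bool:
    """Return True if the field's finest granularity is hour, minute, or second."""
    return finest_granularity in TIME_OF_DAY_GRANULARITIES


def truncate(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its granularity bucket."""
    try:
        fields = _RESET[granularity]
    except KeyError:
        raise ValueError(f"unknown granularity: {granularity}") from None
    return ts.replace(**fields)
```

For example, `truncate(datetime(2019, 3, 14, 15, 9, 26), "HOUR")` yields 15:00 on 14 March 2019, and `supports_time_zone("DAY")` is `False`.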

Select fields for Distinct Counts as needed.

When you create a data source, Zoomdata saves a specific number of distinct values for each attribute field, based on a data sample from your data set. You can filter the data in your chart by these values. While editing a data source, if you want the filter to use all distinct values (that is, values from the whole data source), click the Refresh button in the Statistics column.
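The difference between sampled and full distinct values can be shown with a short sketch. It assumes a hypothetical attribute column and a simple random sample. Zoomdata's actual sampling method and sample size are not documented here, and the helper names are illustrative. Values that never appear in the sample are missing from the saved list. Refreshing the statistics against the whole data source can surface those values.

```python
import random
from typing import List


def distinct_values(values: List[str]) -> List[str]:
    """Return the sorted distinct values of the whole column."""
    return sorted(set(values))


def sampled_distinct_values(values: List[str], sample_size: int, seed: int = 0) -> List[str]:
    """Return the sorted distinct values seen in a random sample of the column."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return sorted(set(sample))


if __name__ == "__main__":
    # Hypothetical attribute column: "rare" occurs in only 1 of 1,000 rows,
    # so a 100-row sample will usually miss it.
    column = ["common"] * 999 + ["rare"]
    print(sampled_distinct_values(column, sample_size=100))
    print(distinct_values(column))
```

The sampled list is always a subset of the full list. Only the full scan is guaranteed to include `"rare"`.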