Manage the Spark SQL Connector

The Composer Spark SQL connector lets you access data in Spark SQL databases from the Composer client. The connector supports Spark SQL versions 2.3 through 3.0.

Before you can establish a connection from Composer to Spark SQL storage, a connector server must be installed and configured. See Manage Connectors and Connector Servers for general instructions and Connect to Spark SQL for details specific to the Spark SQL connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Create Dashboards.

Composer Feature Support

Spark SQL connector support for specific Composer features is shown in the following table.

Key: P - Supported; O - Not Supported; N/A - Not Applicable

Feature                                     Supported?  Notes
Admin-Defined Functions                     P
Box Plots                                   P
Custom SQL Queries                          P
Derived Fields (Row-Level Expressions)      P
Distinct Counts                             P
Fast Distinct Values                        N/A
Group By Multiple Fields                    P
Group By Time                               P
Group By UNIX Time                          P
Histogram Floating Point Values             P
Histograms                                  P
Kerberos Authentication                     P           To enable Kerberos authentication, see Connect to Spark SQL Sources on a Kerberized HDP Cluster.
Last Value                                  P
Live Mode and Playback                      P
Multivalued Fields                          N/A
Nested Fields                               N/A
Partitions                                  P
Pushdown Joins for Fusion Data Sources      P
Schemas                                     P
Text Search                                 N/A
User Delegation                             O
Wild Card Filters                           P
Wild Card Filters, Case-Insensitive Mode    P
Wild Card Filters, Case-Sensitive Mode      P

Connect to Spark SQL

When establishing a connection to Spark SQL, provide the connection details for your Spark SQL server and then configure the partition settings.
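For illustration only: a Spark SQL Thrift Server is typically reached over the HiveServer2 JDBC protocol, so the server address you supply often takes the form of a JDBC URL like the following. The host, port, and database shown here are placeholder values, not Composer-specific settings:

```
jdbc:hive2://spark-host.example.com:10000/default
```

Port 10000 is the Thrift Server's default; substitute the host, port, and database used by your cluster.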

Configure the partition settings. For each partitioned field, you can select one of the following options:

  • No
  • Date - available for the Time field type. If you select this option, the partitioned columns are listed in the Configure column.

In the Configure column, you can edit numeric and time-based fields:

  • Numeric types (Number and Integer) - select a default aggregation function
  • Time fields - define the default time pattern and granularity; if the time field provides hour, minute, and second granularities, you can also apply a time zone label
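To illustrate what a time granularity means in practice, the sketch below (plain Python, not Composer code; the `truncate` helper is hypothetical) buckets a timestamp to a chosen granularity, which is what grouping a time field by hour, minute, or second amounts to:

```python
from datetime import datetime

# Hypothetical helper: truncate a timestamp to a granularity,
# mimicking how a time field's granularity groups values.
def truncate(ts: datetime, granularity: str) -> datetime:
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "second":
        return ts.replace(microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

ts = datetime(2024, 3, 5, 14, 37, 52, 123456)
print(truncate(ts, "hour"))    # 2024-03-05 14:00:00
print(truncate(ts, "minute"))  # 2024-03-05 14:37:00
```

All timestamps that truncate to the same value fall into the same time bucket.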

Select fields for Distinct Counts as needed.

When you create a data source, Composer saves the number of distinct values for each attribute field, based on a sample of your data set. You can filter the data in your visuals by these values. When editing a data source, if you want the filter to use all distinct values from the entire data source, select Refresh in the Statistics column.
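The difference between sample-based distinct values and a full refresh can be sketched in plain Python (illustrative only; this is not Composer's implementation):

```python
# Illustrative only: distinct values drawn from a sample may miss
# values that a scan of the whole data source (the Refresh action)
# would find.
rows = ["red", "green", "blue", "green", "violet", "red"]

sample = rows[:4]               # a data sample taken at creation time
sampled_distinct = set(sample)  # misses "violet"
full_distinct = set(rows)       # all distinct values

print(sorted(sampled_distinct))  # ['blue', 'green', 'red']
print(sorted(full_distinct))     # ['blue', 'green', 'red', 'violet']
```

A filter built from the sample would not offer "violet" until the statistics are refreshed against the whole data source.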