Apache Impala Data Connection - FineDataLink Help Document

Last update: November 25, 2024

Overview

Version

FineDataLink Version	Function Description
3.5	Scheduled Task supported data reading from Apache Impala.
4.0.1.1	Scheduled Task supported data writing into Apache Impala.
4.1	Data Service supported Apache Impala.
4.1.8.3	Supported Apache Impala 3.4 and 4.1.
4.2.0.2	Added Kudu Setting to the data connection setting items: Scheduled Task supported data reading from and writing into the Kudu table. Pipeline Task supported data writing into the Kudu table. Data Service supported the publication of Kudu data.

Function Description

FineDataLink can connect to Apache Impala to read and write data using scheduled tasks, write data using pipeline tasks, and publish data services.

Starting from FineDataLink 4.2.0.2, Kudu Setting is added to the data connection setting items.

Scheduled Task supports data reading from and writing into the Kudu table.
Pipeline Task supports data writing into the Kudu table.
Data Service supports the publication of Kudu data.

Preparation

Prerequisite

1. For the general data connection prerequisites, see Overview of Data Connection.

2. For FineDataLink 4.2.0.2 and later versions, Kudu 1.7.0 or later is required to read data from and write data into Kudu tables.

3. To write data into Apache Impala using scheduled and pipeline tasks, you must configure Kudu Address.
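The version requirements above can be expressed as a small pre-check. This is a minimal Python sketch, not a FineDataLink feature: the function and constant names are hypothetical, and it assumes version strings are dotted integers, optionally followed by a vendor suffix (such as -cdh6.3.2) that is ignored.

```python
from __future__ import annotations

# Minimum versions taken from the Prerequisite section above.
MIN_FDL_FOR_KUDU = (4, 2, 0, 2)  # FineDataLink version that added Kudu Setting
MIN_KUDU = (1, 7, 0)             # minimum Kudu version for Kudu table reading/writing


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.0-cdh6.3.2' into (1, 10, 0); anything after the first '-' is ignored."""
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def kudu_tables_supported(fdl_version: str, kudu_version: str) -> bool:
    """True if both FineDataLink and Kudu meet the minimum versions (hypothetical helper)."""
    return (parse_version(fdl_version) >= MIN_FDL_FOR_KUDU
            and parse_version(kudu_version) >= MIN_KUDU)
```

Comparing integer tuples rather than raw strings is the point of the sketch: as strings, "1.10.0" would sort before "1.7.0", while as tuples (1, 10, 0) correctly compares greater than (1, 7, 0).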

Version and Driver

Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.

Supported Database Version	Driver Package
Impala 2.2, 2.3, 2.8, 2.9, 2.10, 3.4, and 4.1	ImpalaJDBC41.zip
Impala 2.10 with Kudu 1.5	ClouderaImpalaJDBC41_2.5.43.rar


Connection Information Collection

Collect the following information before connecting FineDataLink to the database.

IP address and port number of the database server
Database name
Username and password (if username and password authentication is used); client principal and keytab file path (if Kerberos authentication is used)
Kudu address (if you need to read data from or write data into Kudu tables using FineDataLink 4.2.0.2 or later)
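The checklist above can be captured as a small record before you open the data connection page. The Python sketch below is illustrative only: the class and method names are hypothetical, and jdbc_url() assumes the uploaded driver is the Cloudera Impala JDBC driver, whose URLs take the form jdbc:impala://host:port/schema;Property=Value, with AuthMech=3 for username and password and AuthMech=1 for Kerberos. FineDataLink collects these values through its own form, so you do not normally type such a URL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ImpalaConnectionInfo:
    """Hypothetical checklist of the items under Connection Information Collection."""
    host: str
    port: int
    database: str
    # Username and password authentication
    username: str | None = None
    password: str | None = None
    # Kerberos authentication
    principal: str | None = None
    keytab_path: str | None = None
    # Only needed for Kudu tables (FineDataLink 4.2.0.2 or later)
    kudu_address: str | None = None

    def missing_items(self) -> list[str]:
        """Names of items that still need to be collected."""
        missing = [n for n in ("host", "port", "database") if not getattr(self, n)]
        uses_kerberos = self.principal is not None or self.keytab_path is not None
        required = ("principal", "keytab_path") if uses_kerberos else ("username", "password")
        missing += [n for n in required if not getattr(self, n)]
        return missing

    def jdbc_url(self) -> str:
        """Assumed Cloudera Impala JDBC URL form; not a FineDataLink API."""
        base = f"jdbc:impala://{self.host}:{self.port}/{self.database}"
        if self.principal:
            # Kerberos (AuthMech=1). The realm is taken from the client principal;
            # assumes the Impala service name is "impala" and host is the impalad FQDN.
            _, _, realm = self.principal.partition("@")
            return (f"{base};AuthMech=1;KrbRealm={realm};"
                    f"KrbHostFQDN={self.host};KrbServiceName=impala")
        # Username and password (AuthMech=3).
        return f"{base};AuthMech=3;UID={self.username};PWD={self.password}"
```

The keytab path is deliberately left out of the URL: with Kerberos, the driver usually obtains credentials from the JVM's Kerberos login configuration rather than from URL properties.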

Procedure

1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, select a folder, and create a data connection, as shown in the following figure.

2. Set the data connection name. You can also modify the directory of the data connection.

3. Find Apache Impala by searching for its data source name, or filter data sources by Data Source Type, Supported Form, and Compatible Module, as shown in the following figure.

4. Click Custom, select the driver uploaded in the "Version and Driver" section, and enter the connection information collected in the "Connection Information Collection" section.

Setting items are described in the following table.

Setting Item	Description
Authentication Method	Apache Impala supports Kerberos authentication. For details about the Kerberos authentication method, see Kerberos Authentication in Data Connection.
Kudu Address	Note: This setting item is available in FineDataLink 4.2.0.2 and later versions. It must be configured to write data into Apache Impala using scheduled and pipeline tasks. Specify the Kudu Master address in the format IP address:Port number, separating multiple addresses with commas (,).


5. Click Test Connection. If the connection is successful, click Save, as shown in the following figure.
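The Kudu Address format described in step 4 (IP address:Port number, comma-separated) can be checked before saving the connection. This is a minimal Python sketch with a hypothetical function name. It does not strictly validate IP addresses, and tolerating spaces around commas is this sketch's choice, not documented FineDataLink behavior.

```python
def parse_kudu_masters(value: str) -> list[tuple[str, int]]:
    """Split a Kudu Address value such as '10.0.0.1:7051,10.0.0.2:7051'
    into (host, port) pairs; raise ValueError on a malformed entry."""
    masters = []
    for entry in value.split(","):
        entry = entry.strip()  # tolerate spaces around commas (sketch assumption)
        # rpartition splits on the last ':' so the port is always the final part.
        host, sep, port_text = entry.rpartition(":")
        if not sep or not host or not port_text.isdigit():
            raise ValueError(f"expected IP address:Port number, got {entry!r}")
        port = int(port_text)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        masters.append((host, port))
    return masters
```

For example, parse_kudu_masters("10.0.0.1:7051,10.0.0.2:7051") returns two (host, port) pairs; 7051 is the default Kudu master RPC port, used here only as sample data.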

Data Source Usage

Data Development - Scheduled Task

1. Scheduled Task supports data reading from and writing into Apache Impala. To write data into Apache Impala, you must configure Kudu Address on the data connection setting page.

2. When you write data into Apache Impala using scheduled tasks and set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.

3. When you write data into Apache Impala using scheduled tasks, setting a logical primary key is not supported, and you must specify a physical primary key.

4. Scheduled Task supports data reading from and writing into Kudu partitioned tables.

If you set Target Table to Existing Table, you can click View Partition Key Setting to view the partition key settings.
If you set Target Table to Auto Created Table, you can configure the partition key after configuring the physical primary key.

Setting items are described in the following table.

Setting Item	Description
Partitioning Method	Supported methods are Range Partitioning and Hash Partitioning. You can use both methods together to define partitions.

Range Partitioning

Setting Item	Description
Partition Field	The dropdown list displays the mapped fields that have been set as the primary key. A field already selected as a partition field in Hash Partitioning cannot be selected, and a prompt appears if you click it. You can select multiple partition fields, in which case Partition Configuration supports only the Specific Value retrieval method.
Partition Configuration	Values can be retrieved by specifying an interval or a specific value. You can add multiple partitions and use intervals and specific values at the same time.

Hash Partitioning

You can configure multiple partitions.

Setting Item	Description
Partition Field	The dropdown list displays the mapped fields that have been set as the primary key. A field already selected as a partition field in Range Partitioning cannot be selected, and a prompt appears if you click it.
Partition Configuration	Set the number of partitions for the partition fields as an integer of 2 or greater. Number of Partitions (Hash) accepts only one value for each partition field. You can add multiple partition fields, each using a different field.

After you configure the partition key, the selected partition field is marked as the partition key in the field mapping area.
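The partition key rules above can be summarized as a validation routine. This Python sketch is a hypothetical restatement of those rules, not FineDataLink's implementation; the class names and the ("interval", (low, high)) / ("value", v) encoding of range partitions are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RangePartitioning:
    fields: list[str]
    # Hypothetical encoding: ("interval", (low, high)) or ("value", v).
    partitions: list[tuple[str, object]] = field(default_factory=list)


@dataclass
class HashPartitioning:
    fields: list[str]
    num_partitions: int


def partition_errors(primary_key, range_part=None, hash_parts=()):
    """Return the rule violations in a partition key configuration."""
    errors = []
    pk = set(primary_key)
    range_fields = list(range_part.fields) if range_part else []
    hash_fields = [f for h in hash_parts for f in h.fields]
    # Partition fields must be mapped fields set as the primary key.
    for f in range_fields + hash_fields:
        if f not in pk:
            errors.append(f"{f} is not a primary key field")
    # A field cannot be used by both Range and Hash Partitioning.
    overlap = sorted(set(range_fields) & set(hash_fields))
    if overlap:
        errors.append(f"used for both range and hash partitioning: {overlap}")
    # Each hash partition uses different fields.
    if len(hash_fields) != len(set(hash_fields)):
        errors.append("hash partitions must use different fields")
    # Number of Partitions (Hash) is a single integer not less than 2.
    for h in hash_parts:
        if not isinstance(h.num_partitions, int) or h.num_partitions < 2:
            errors.append(f"Number of Partitions (Hash) for {h.fields} must be an integer >= 2")
    # With several range partition fields, only Specific Value is allowed.
    if range_part and len(range_part.fields) > 1:
        if any(kind == "interval" for kind, _ in range_part.partitions):
            errors.append("multiple range partition fields support Specific Value only")
    return errors
```

An empty list means the configuration satisfies every rule listed in the tables above; the sketch reports all violations at once rather than stopping at the first.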

Pipeline Task

Starting from FineDataLink 4.2.0.2, Pipeline Task supports data writing into Kudu tables in the Impala database. If you set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.

Synchronization Without Primary Key is not supported when the Kudu table is the target data source of a pipeline task.
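The write conditions for scheduled and pipeline tasks described above can be gathered into one pre-check. This is a hypothetical Python sketch: the class, field names, and "scheduled"/"pipeline" labels are illustrative. For pipeline tasks it assumes that any configured primary key satisfies the rule against Synchronization Without Primary Key, which this document does not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KuduWriteTarget:
    """Hypothetical summary of an Apache Impala write target."""
    task_type: str              # "scheduled" or "pipeline" (labels chosen for this sketch)
    kudu_address: str           # Kudu Address on the data connection
    existing_table: bool        # True: Existing Table; False: Auto Created Table
    is_kudu_table: bool         # result of the Kudu table check on an existing table
    physical_primary_key: list[str] = field(default_factory=list)
    logical_primary_key: list[str] = field(default_factory=list)


def write_problems(t: KuduWriteTarget) -> list[str]:
    """List the documented conditions that block writing into Apache Impala."""
    problems = []
    if not t.kudu_address.strip():
        problems.append("Kudu Address is not configured on the data connection")
    if t.existing_table and not t.is_kudu_table:
        problems.append("the existing target table is not a Kudu table")
    if t.task_type == "scheduled":
        if t.logical_primary_key:
            problems.append("a logical primary key is not supported")
        if not t.physical_primary_key:
            problems.append("a physical primary key must be specified")
    elif t.task_type == "pipeline":
        # Assumption: any primary key satisfies the pipeline rule.
        if not (t.physical_primary_key or t.logical_primary_key):
            problems.append("Synchronization Without Primary Key is not supported")
    else:
        raise ValueError(f"unknown task type: {t.task_type!r}")
    return problems
```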

Partition key setting is supported when you choose Auto Created Table as the target table of a pipeline task. The configuration page and method are the same as those described in the "Data Development - Scheduled Task" section of this document.
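For orientation, the partition key settings correspond to the Kudu partitioning clauses of Impala DDL (PARTITION BY HASH (...) PARTITIONS n, RANGE (...) (...), STORED AS KUDU). The Python sketch below renders a configuration as such a statement. It is an assumption about what an auto-created table could look like, not the statement FineDataLink actually issues. The function names are hypothetical, only a single range field is handled, and string values are assumed to contain no quote characters.

```python
def literal(value) -> str:
    """Numbers are emitted as-is; anything else is single-quoted (no escaping)."""
    if isinstance(value, (int, float)):
        return str(value)
    return f"'{value}'"


def kudu_create_table(table, columns, primary_key, hash_parts=(),
                      range_field=None, range_parts=()):
    """columns: [(name, impala_type)]; hash_parts: [(fields, n)];
    range_parts: ("interval", (low, high)) meaning low <= VALUES < high, or ("value", v)."""
    types = dict(columns)
    # Kudu requires the primary key columns to be listed first, in key order.
    ordered = [(c, types[c]) for c in primary_key]
    ordered += [(c, t) for c, t in columns if c not in primary_key]
    lines = [f"  {name} {ctype}" for name, ctype in ordered]
    lines.append(f"  PRIMARY KEY ({', '.join(primary_key)})")
    clauses = [f"HASH ({', '.join(fields)}) PARTITIONS {n}" for fields, n in hash_parts]
    if range_field and range_parts:
        specs = []
        for kind, spec in range_parts:
            if kind == "interval":
                low, high = spec
                specs.append(f"PARTITION {literal(low)} <= VALUES < {literal(high)}")
            else:
                specs.append(f"PARTITION VALUE = {literal(spec)}")
        clauses.append(f"RANGE ({range_field}) ({', '.join(specs)})")
    sql = f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"
    if clauses:
        sql += "\nPARTITION BY " + ", ".join(clauses)
    return sql + "\nSTORED AS KUDU"
```

For example, a table hash-partitioned on id into 4 partitions and range-partitioned on yr with one interval and one specific value produces a PARTITION BY HASH (id) PARTITIONS 4, RANGE (yr) (...) clause. Both partition fields are part of the primary key, matching the rule that partition fields must be primary key fields.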

Data Service

Starting from FineDataLink 4.2.0.2, Data Service supports the publication of Kudu data.
