Overview
Version
FineDataLink Version | Function Description |
---|---|
3.5 | Scheduled Task supported data reading from Apache Impala. |
4.0.1.1 | Scheduled Task supported data writing into Apache Impala. |
4.1 | Data Service supported Apache Impala. |
4.1.8.3 | Supported Apache Impala 3.4 and 4.1. |
4.2.0.2 | Added Kudu Setting to the data connection setting items. |
Function Description
FineDataLink supports connecting to Apache Impala to read and write data using scheduled tasks, write data using pipeline tasks, and publish data services.
Starting from FineDataLink 4.2.0.2, Kudu Setting is added to the data connection setting items.
Scheduled Task supports data reading from and writing into the Kudu table.
Pipeline Task supports data writing into the Kudu table.
Data Service supports the publication of Kudu data.
Preparation
Prerequisite
1. For details about the general prerequisites, see Overview of Data Connection.
2. In FineDataLink 4.2.0.2 and later versions, reading data from and writing data into Kudu tables requires Kudu 1.7.0 or later.
3. To write data into Apache Impala using scheduled and pipeline tasks, you must configure Kudu Address.
Version and Driver
Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.
Supported Database Version | Driver Package |
---|---|
Impala 2.2, Impala 2.3, Impala 2.8, Impala 2.9, Impala 2.10, Impala 3.4, Impala 4.1 | ImpalaJDBC41.zip |
Impala 2.10, Kudu 1.5 | ClouderaImpalaJDBC41_2.5.43.rar |
Connection Information Collection
Collect the following information before connecting FineDataLink to the database.
IP address and port number of the database server
Database name
Username and password (if username and password authentication is used); client principal and keytab file path (if Kerberos authentication is used)
Kudu address (if you need to read data from or write data into Kudu tables using FineDataLink 4.2.0.2 or later)
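To show how the collected host, port, and database name fit together, the following minimal sketch assembles a JDBC URL in the `jdbc:impala://host:port/database` form commonly used with the Cloudera Impala JDBC driver. The class name `ImpalaConnectionInfo`, the sample address, and the default port 21050 are assumptions for illustration; FineDataLink may build the URL for you on the configuration page, and authentication properties are not covered here.

```python
from dataclasses import dataclass


@dataclass
class ImpalaConnectionInfo:
    """Hypothetical container for the connection details collected above."""
    host: str
    database: str
    port: int = 21050  # assumption: Impala's usual port for JDBC clients

    def jdbc_url(self) -> str:
        # URL form used by the Cloudera Impala JDBC driver;
        # authentication properties are deliberately left out of this sketch.
        return f"jdbc:impala://{self.host}:{self.port}/{self.database}"


info = ImpalaConnectionInfo(host="192.0.2.10", database="default")
print(info.jdbc_url())
```

Check the resulting URL against the value displayed on the data connection page before relying on it.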
Procedure
1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, select a folder, and create a data connection, as shown in the following figure.
2. Set the data connection name. You can also modify the directory of the data connection.
3. Find the data source by searching the data source name or filtering the data source by Data Source Type, Supported Form, and Compatible Module, as shown in the following figure.
4. Click Custom, select the driver uploaded in the "Version and Driver" section, and enter the connection information collected in the "Connection Information Collection" section.
Setting items are described in the following table.
Setting Item | Description |
---|---|
Authentication Method | Apache Impala supports Kerberos authentication. For details about the Kerberos authentication method, see Kerberos Authentication in Data Connection. |
Kudu Address | Specify the Kudu Master address in the format of IP address:Port number, with multiple addresses separated by commas (,). |
5. Click Test Connection. If the connection is successful, click Save, as shown in the following figure.
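The Kudu Address format described in the table above (IP address:Port number, comma-separated) can be checked before you paste the value into the setting item. The following is a minimal sketch; the function name `parse_kudu_masters` and the sample addresses are hypothetical, and port 7051 is used only because it is the usual Kudu Master RPC port.

```python
def parse_kudu_masters(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) pairs."""
    masters = []
    for item in value.split(","):
        item = item.strip()
        # rpartition splits on the last colon, so the port is always the tail.
        host, sep, port = item.rpartition(":")
        if not sep or not host or not (port.isascii() and port.isdigit()):
            raise ValueError(f"expected 'host:port', got {item!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {item!r}")
        masters.append((host, port_number))
    return masters


print(parse_kudu_masters("192.0.2.11:7051, 192.0.2.12:7051"))
```

An entry without a port, such as `192.0.2.11`, raises `ValueError` in this sketch, which mirrors the requirement that every address include a port number.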
Data Source Usage
Data Development - Scheduled Task
1. Scheduled Task supports data reading from and writing into Apache Impala. To write data into Apache Impala, Kudu Address on the data connection setting page must be configured.
2. When you write data into Apache Impala using scheduled tasks and set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.
3. When you write data into Apache Impala using scheduled tasks, setting a logical primary key is not supported, and you must specify a physical primary key.
4. Scheduled Task supports data reading from and writing into the Kudu partition table.
If you set Target Table to Existing Table, you can click View Partition Key Setting to view the partition key settings.
If you set Target Table to Auto Created Table, you can configure the partition key after configuring the physical primary key.
Setting items are described in the following table.
Setting Item | Description |
---|---|
Partitioning Method | Supported methods include Range Partitioning and Hash Partitioning. You can use both methods at the same time. |
Range Partitioning | |
Hash Partitioning | You can configure multiple partitions. |
After you configure the partition key, the selected partition field is marked as the partition key in the field mapping area.
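For orientation, the primary key and partition key settings described above correspond to clauses of an Impala `CREATE TABLE ... STORED AS KUDU` statement. The sketch below builds such a statement by hand; it is an illustration under assumptions, not the statement FineDataLink generates for Auto Created Table. The function name `build_kudu_ddl`, the table `demo_orders`, and its columns are hypothetical.

```python
def build_kudu_ddl(table, columns, primary_key, hash_columns,
                   hash_partitions, range_column, range_splits):
    """Return an Impala CREATE TABLE statement for a Kudu table.

    columns: list of (name, type) pairs, primary-key columns first.
    range_splits: ascending split values; n splits give n + 1 range partitions.
    """
    # Kudu only allows primary-key columns to be used as partition columns.
    missing = [c for c in hash_columns + [range_column] if c not in primary_key]
    if missing:
        raise ValueError(f"partition columns must be primary-key columns: {missing}")

    column_lines = [f"  {name} {type_}" for name, type_ in columns]
    column_lines.append(f"  PRIMARY KEY ({', '.join(primary_key)})")

    ranges = [f"  PARTITION VALUES < {range_splits[0]}"]
    for low, high in zip(range_splits, range_splits[1:]):
        ranges.append(f"  PARTITION {low} <= VALUES < {high}")
    ranges.append(f"  PARTITION {range_splits[-1]} <= VALUES")

    lines = [
        f"CREATE TABLE {table} (",
        ",\n".join(column_lines),
        ")",
        f"PARTITION BY HASH ({', '.join(hash_columns)}) PARTITIONS {hash_partitions},",
        f"RANGE ({range_column}) (",
        ",\n".join(ranges),
        ")",
        "STORED AS KUDU",
    ]
    return "\n".join(lines)


ddl = build_kudu_ddl(
    table="demo_orders",
    columns=[("order_id", "BIGINT"), ("order_year", "INT"), ("amount", "DOUBLE")],
    primary_key=["order_id", "order_year"],
    hash_columns=["order_id"],
    hash_partitions=4,
    range_column="order_year",
    range_splits=[2023, 2025],
)
print(ddl)
```

The example combines hash and range partitioning in one statement, matching the note that both methods can be used at the same time, and it rejects partition columns that are not part of the primary key.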
Pipeline Task
Starting from FineDataLink 4.2.0.2, Pipeline Task supports data writing into Kudu tables in the Impala database. If you set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.
Synchronization Without Primary Key is not supported when the Kudu table is the target data source of a pipeline task.
Partition key setting is supported when you choose Auto Created Table as the target table of a pipeline task. The configuration page and method are the same as those described in the "Data Development - Scheduled Task" section of this document.
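If you want to confirm in advance that an existing table will pass the Kudu table check mentioned above, one option is to run `SHOW CREATE TABLE` in Impala and look for the `STORED AS KUDU` clause in the output. The sketch below applies that heuristic to the statement text; it is an assumption about how you might check manually, not a description of how FineDataLink performs its check, and the function name `is_kudu_table` and the sample statements are hypothetical.

```python
import re

# Matches the storage clause that Impala prints for Kudu tables.
_KUDU_CLAUSE = re.compile(r"\bSTORED\s+AS\s+KUDU\b", re.IGNORECASE)


def is_kudu_table(show_create_output: str) -> bool:
    """Heuristic: True if the SHOW CREATE TABLE text declares Kudu storage."""
    return _KUDU_CLAUSE.search(show_create_output) is not None


kudu_statement = (
    "CREATE TABLE demo.orders (id BIGINT, PRIMARY KEY (id)) "
    "PARTITION BY HASH (id) PARTITIONS 4 STORED AS KUDU"
)
parquet_statement = "CREATE TABLE demo.logs (id BIGINT) STORED AS PARQUET"

print(is_kudu_table(kudu_statement))
print(is_kudu_table(parquet_statement))
```

Because this is a plain text match, a table comment that happens to contain the same words would also match, so treat the result as a quick pre-check only.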
Data Service
Starting from FineDataLink 4.2.0.2, Data Service supports the publication of Kudu data.