Apache Impala Data Connection

  • Last update: November 25, 2024
  • Overview 

    Version 

    FineDataLink Version    Function Description
    3.5                     Scheduled Task supported data reading from Apache Impala.
    4.0.1.1                 Scheduled Task supported data writing into Apache Impala.
    4.1                     Data Service supported Apache Impala.
    4.1.8.3                 Supported Apache Impala 3.4 and 4.1.
    4.2.0.2                 Added Kudu Setting to the data connection setting items:
                            • Scheduled Task supported data reading from and writing into the Kudu table.
                            • Pipeline Task supported data writing into the Kudu table.
                            • Data Service supported the publication of Kudu data.

    Function Description

    FineDataLink supports connection to Apache Impala for data reading and writing using scheduled tasks, data writing using pipeline tasks, and releasing data services.

    Starting from FineDataLink 4.2.0.2, Kudu Setting is added to the data connection setting items.   

    • Scheduled Task supports data reading from and writing into the Kudu table.

    • Pipeline Task supports data writing into the Kudu table.

    • Data Service supports the publication of Kudu data.
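
    The version requirements above can be expressed as a small lookup, which may help when checking whether an installed FineDataLink version covers a given capability. This is only a sketch: the capability labels and function names are made up for illustration and are not part of FineDataLink.

```python
from typing import Tuple

# Minimum FineDataLink version per capability, copied from the Version table.
# The dictionary keys are hypothetical labels chosen for this sketch.
MIN_VERSION = {
    "scheduled_task_read": "3.5",
    "scheduled_task_write": "4.0.1.1",
    "data_service": "4.1",
    "impala_3.4_and_4.1": "4.1.8.3",
    "kudu_setting": "4.2.0.2",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.1.8.3' into (4, 1, 8, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_supported(capability: str, installed: str) -> bool:
    """Return True if the installed version is at or above the minimum."""
    have = parse_version(installed)
    need = parse_version(MIN_VERSION[capability])
    # Pad the shorter version with zeros so that '4.1' equals '4.1.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

    Comparing integer tuples rather than strings avoids ordering mistakes such as treating "4.10" as older than "4.2".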

    Preparation 

    Prerequisite 

    1. Meet the general prerequisites for data connections. For details, see Overview of Data Connection.

    2. In FineDataLink 4.2.0.2 and later versions, reading data from and writing data into the Kudu table requires Kudu 1.7.0 or later.

    3. To write data into Apache Impala using scheduled and pipeline tasks, you must configure Kudu Address.

    Version and Driver 

    Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.

    Supported Database Version                        Driver Package
    Impala 2.2, 2.3, 2.8, 2.9, 2.10, 3.4, and 4.1     ImpalaJDBC41.zip
    Impala 2.10 Kudu 1.5                              ClouderaImpalaJDBC41_2.5.43.rar

    Connection Information Collection 

    Collect the following information before connecting FineDataLink to the database.

    • IP address and port number of the database server   

    • Database name

    • Username and password (if username and password authentication is used); client principal and keytab file path (if Kerberos authentication is used)   

    • Kudu address (if you need to read data from or write data into the Kudu table using FineDataLink 4.2.0.2 or later)
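
    The checklist above depends on the authentication method, so it can be handy to record the values in one place and see what is still missing. The following is a minimal sketch; the class, field, and function names are illustrative assumptions and do not correspond to any FineDataLink API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImpalaConnectionInfo:
    """Values to collect before creating the data connection (names are illustrative)."""
    host: str
    port: int
    database: str
    use_kerberos: bool = False
    username: Optional[str] = None
    password: Optional[str] = None
    client_principal: Optional[str] = None
    keytab_path: Optional[str] = None
    kudu_address: Optional[str] = None  # only needed when working with Kudu tables


def missing_items(info: ImpalaConnectionInfo, needs_kudu: bool = False) -> List[str]:
    """List the checklist items that have not been filled in yet."""
    missing = []
    if not info.host:
        missing.append("host")
    if not 0 < info.port < 65536:
        missing.append("port")
    if not info.database:
        missing.append("database")
    # Kerberos needs principal and keytab; otherwise username and password.
    if info.use_kerberos:
        required = ["client_principal", "keytab_path"]
    else:
        required = ["username", "password"]
    if needs_kudu:
        required.append("kudu_address")
    missing.extend(name for name in required if not getattr(info, name))
    return missing
```

    Pass needs_kudu=True when the connection will be used to read from or write into Kudu tables; the host, port, and credentials you supply are whatever your own environment uses.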

    Procedure 

    1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, select a folder, and create a data connection, as shown in the following figure. 

    2. Set the data connection name. You can also modify the directory of the data connection.

    3. Find the data source by searching for the data source name or by filtering data sources by Data Source Type, Supported Form, and Compatible Module, as shown in the following figure. 

    4. Click Custom, select the driver uploaded in the "Version and Driver" section, and enter the connection information collected in the "Connection Information Collection" section. 

    Setting items are described in the following table.

    Setting Item             Description
    Authentication Method    Apache Impala supports Kerberos authentication. For details about the Kerberos authentication method, see Kerberos Authentication in Data Connection.
    Kudu Address             Note: This setting item is available in FineDataLink 4.2.0.2 and later versions. To write data into Apache Impala using scheduled and pipeline tasks, you must configure Kudu Address.
                             Specify the Kudu Master address in the format of IP address:Port number, with multiple addresses separated by commas (,).

    5. Click Test Connection. If the connection is successful, click Save, as shown in the following figure.
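
    The Kudu Address format described in step 4 (IP address:Port number, with multiple addresses separated by commas) can be sanity-checked before it is entered on the setting page. The following is a minimal sketch; the function name parse_kudu_address is hypothetical and not part of FineDataLink. Whether FineDataLink tolerates spaces around the commas is not stated here, so the sketch simply strips them.

```python
from typing import List, Tuple


def parse_kudu_address(value: str) -> List[Tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into [(host1, port1), (host2, port2)]."""
    masters = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port_text = entry.rpartition(":")
        if not sep or not host or not port_text.isdigit():
            raise ValueError(f"expected 'IP address:Port number', got {entry!r}")
        port = int(port_text)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        masters.append((host, port))
    return masters
```

    For example, parse_kudu_address("192.0.2.11:7051,192.0.2.12:7051") returns two (host, port) pairs, while an entry without a port raises ValueError. The addresses and port shown are placeholder values; use the ones configured for your Kudu Masters.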

    Data Source Usage 

    Data Development - Scheduled Task 

    1. Scheduled Task supports data reading from and writing into Apache Impala. To write data into Apache Impala, you must configure Kudu Address on the data connection setting page.

    2. When you write data into Apache Impala using scheduled tasks and set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.

    3. When you write data into Apache Impala using scheduled tasks, setting a logical primary key is not supported, and you must specify a physical primary key.

    4. Scheduled Task supports data reading from and writing into the Kudu partition table.

    • If you set Target Table to Existing Table, you can click View Partition Key Setting to view the partition key settings.    

    • If you set Target Table to Auto Created Table, you can configure the partition key after configuring the physical primary key. 

    Setting items are described in the following table.

    Setting Item           Description
    Partitioning Method    Supported methods include Range Partitioning and Hash Partitioning.
                           You can use the two methods to specify the partition at the same time.

    Range Partitioning

    Setting Item               Description
    Partition Field            The dropdown list displays the mapped field that has been set as the primary key. The field chosen as the partition field in Hash Partitioning cannot be selected, and a prompt appears when you click the field.
                               You can select multiple partition fields, in which case the value retrieval method in Partition Configuration only supports Specific Value.
    Partition Configuration    You can retrieve the value by specifying the interval or the specific value.
                               You can add multiple partitions and specify the interval and the specific value at the same time.

    Hash Partitioning

    You can configure multiple partitions.

    Setting Item               Description
    Partition Field            The dropdown list displays the mapped field that has been set as the primary key. The field chosen as the partition field in Range Partitioning cannot be selected, and a prompt appears when you click the field.
    Partition Configuration    You can set the number of partitions for all partition fields, entering integers not less than 2.
                               Number of Partitions (Hash) of each partition field only supports one value.
                               You can create multiple partition fields using different fields.


    After you configure the partition key, the selected partition field is marked as the partition key in the field mapping area.
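
    The partition key rules in this section (partition fields come from the primary key, a field cannot be used in both Range and Hash Partitioning, multiple range fields only support Specific Value, and the number of hash partitions is an integer not less than 2) can be summarized as a checker. This is a sketch of the documented rules only, not FineDataLink's own validation; the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def check_partition_key(
    primary_key: Sequence[str],
    range_fields: Sequence[str] = (),
    range_uses_interval: bool = False,
    hash_partitions: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return a list of rule violations; an empty list means the plan looks valid."""
    problems = []
    hash_partitions = hash_partitions or {}
    # Partition fields can only be chosen from the primary key fields.
    for field in dict.fromkeys(list(range_fields) + list(hash_partitions)):
        if field not in primary_key:
            problems.append(f"{field}: not part of the physical primary key")
    # A field used by one partitioning method cannot be reused by the other.
    for field in range_fields:
        if field in hash_partitions:
            problems.append(f"{field}: used in both Range and Hash Partitioning")
    # With several range fields, only Specific Value is supported.
    if len(range_fields) > 1 and range_uses_interval:
        problems.append("multiple range fields only support Specific Value")
    # Each hash partition field takes a single integer count of at least 2.
    for field, count in hash_partitions.items():
        if not isinstance(count, int) or count < 2:
            problems.append(f"{field}: number of hash partitions must be an integer >= 2")
    return problems
```

    A plan that range-partitions on one primary key field and hash-partitions on another into 4 partitions passes with no problems reported.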

    Pipeline Task

    Starting from FineDataLink 4.2.0.2, Pipeline Task supports data writing into Kudu tables in the Impala database. If you set Target Table to Existing Table, the system will check whether the selected table is a Kudu table. Writing is disallowed if it is not a Kudu table.
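
    To find out in advance whether an existing table will pass the Kudu table check, you can inspect its definition yourself. In Impala, the output of SHOW CREATE TABLE for a Kudu table contains a STORED AS KUDU clause. The sketch below tests for that clause in the statement text; it is an assumption-based helper with a hypothetical name, and how FineDataLink performs its own check is not described in this document.

```python
import re

# Matches the storage clause that Impala prints for Kudu tables.
_KUDU_CLAUSE = re.compile(r"\bSTORED\s+AS\s+KUDU\b", re.IGNORECASE)


def is_kudu_table(show_create_table_output: str) -> bool:
    """Return True if the DDL text declares the table as STORED AS KUDU."""
    return _KUDU_CLAUSE.search(show_create_table_output) is not None
```

    Run SHOW CREATE TABLE in your usual Impala client and pass the returned statement text to is_kudu_table.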

    Synchronization Without Primary Key is not supported when the Kudu table is the target data source of a pipeline task.

    Partition key setting is supported when you choose Auto Created Table as the target table of a pipeline task. The configuration page and method are the same as those described in the "Data Development - Scheduled Task" section of this document.

    Data Service 

    Starting from FineDataLink 4.2.0.2, Data Service supports the publication of Kudu data. 

