Hadoop Hive Data Connection

  • Last update: September 19, 2024
  • Overview

    Version

    FineDataLink Version | Functional Change
    4.0.4.2 / 4.0.29 | Hadoop Hive (HDFS) and Hadoop Hive data sources were merged into Hadoop Hive at New Data Connection.
    4.1.3 | You can create a partition table in the Hadoop Hive database when writing data into it.

    Application Scenario 

    Hadoop is a widely used distributed computing solution, and Hive is a data warehouse framework built on top of Hadoop.

    FineDataLink supports connecting to Hadoop Hive to read and write data in scheduled tasks.

    Usage Instruction

    1. For FineDataLink versions earlier than 4.0.29, you are advised to use the Hadoop Hive data connection to read database data and the Hadoop Hive (HDFS) data connection to write data, which ensures write performance.

    2. For FineDataLink 4.0.29 and later releases, you can skip HDFS Setting (required for writing data into the Hadoop Hive database) when creating a data connection if you only want to read data from the Hadoop Hive database. For details, see the Procedure section of this article.

    Preparation

    Version and Driver 

    Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.

    Supported Database Version | Driver Package Download | Log Package Download
    hive_1.1 | Hive1.1.zip | Log JAR.rar (decompress the Log JAR file and upload the extracted files to FineDataLink together with the driver)
    Hadoop Hive 1.2, Hive 2.3, Hive 2.1.2, Hive 2.1.1, and Hive 3.3.1 | Hadoop Hive.rar | /
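    If a connection attempt later fails with a missing-driver error, you can confirm that the uploaded package actually contains the Hive JDBC driver class. The following is a minimal, standalone sketch (run it with the downloaded driver JARs on the classpath); org.apache.hive.jdbc.HiveDriver is the standard Hive JDBC driver class.

        // Minimal check that the Hive JDBC driver class is present on the
        // classpath (run with the downloaded driver JARs added via -cp).
        public class DriverCheck {
            public static void main(String[] args) {
                try {
                    Class.forName("org.apache.hive.jdbc.HiveDriver");
                    System.out.println("Hive JDBC driver found.");
                } catch (ClassNotFoundException e) {
                    System.out.println("Hive JDBC driver missing: " + e.getMessage());
                }
            }
        }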

    Connection Information Collection

    Collect the following information before connecting FineDataLink to the database. A minimal connectivity check is sketched after the notes below.

    • IP address and port number of the database server

    • Database name

    • If the authentication method is Username & Password, collect the username and the password. If the authentication method is Kerberos, collect the keytab key path and the client principal information.

    • HDFS file system address (IP address and port number), which is required for writing data into the Hadoop Hive database to ensure write performance and can be skipped if you only want to read data from the database

    Note: Ensure that the FineDataLink server can access the HDFS file system port. For example, if the default port number of the HDFS file system is 8020 and the server has a firewall enabled, refer to the following content to open Port 8020.

    For details about the steps to open ports on the Windows system, see Setting Inbound and Outbound Rules on Windows Server.

    For details about the steps to open ports on the Linux system, see Linux Firewall Usage and Configuration.
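    With the information above collected, you can verify reachability and credentials outside FineDataLink. The following is a minimal sketch that assumes HiveServer2 listens on its default port 10000 and uses Username & Password authentication; the IP address, database name, and credentials are placeholders.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        // Minimal HiveServer2 connectivity check, run from the FineDataLink
        // server. The host, port (default 10000), database, and credentials
        // are placeholders; replace them with the collected values.
        public class HiveConnectionCheck {
            public static void main(String[] args) throws Exception {
                String url = "jdbc:hive2://192.168.101.119:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));  // list visible tables
                    }
                }
            }
        }

    If this check passes but writes later fail or are slow, verify the HDFS file system port (8020 by default) separately, as described in the note above.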

    Procedure

    1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, and click New Data Connection.

    Note: If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.

    2. Click the Hadoop Hive icon.

    3. Set Driver to Custom, select the uploaded driver mentioned in the Version and Driver section, and fill in the connection information. 

    The following table describes the setting items.

    Setting Item | Description
    Authentication Method

    There are two options, namely Username & Password and Kerberos.

    For details of Kerberos authentication, see Kerberos Authentication in Data Connection. A standalone Kerberos connection sketch is provided after this table.

    Note the following items when using Kerberos authentication.

    • Before connecting, check if the IP address corresponding to the machine name in the hosts file in the /etc directory is a LAN address.

    • Check if the machine name in the hostname file (/etc/hostname) is consistent with the one in the hosts file (/etc/hosts).

    • Check if the IP address and machine name configured in the hosts file of the machine where FineDataLink is located are correct.

    • Configure the hosts file in the /etc directory for establishing a local connection. Add the remote mapping information, including the IP address and machine name. For example, 192.168.5.206 centos-phoenix.

    HDFS Setting

    1. This setting item is not required if you only want to read data from the Hadoop Hive database.

    2. If you need to write data into the Hadoop Hive database, set the value to the address of the active node in the Hadoop HDFS file system to ensure write performance, in the format hdfs://IP address:Port number, for example, hdfs://192.168.101.119:8020.
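    As referenced in the Authentication Method row above, the following sketch shows what a Kerberos-authenticated connection looks like outside FineDataLink, assuming the Hadoop client libraries are on the classpath. The realm EXAMPLE.COM, both principals, the keytab path, and the host name centos-phoenix are illustrative placeholders; adapt them to your cluster.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.security.UserGroupInformation;

        // Sketch of a Kerberos-authenticated HiveServer2 connection.
        // All principals, the keytab path, and the host are placeholders.
        public class HiveKerberosCheck {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                conf.set("hadoop.security.authentication", "kerberos");
                UserGroupInformation.setConfiguration(conf);
                // Log in with the client principal and keytab key path
                // collected during preparation.
                UserGroupInformation.loginUserFromKeytab(
                        "fdl_user@EXAMPLE.COM", "/opt/keytabs/fdl_user.keytab");
                // The principal in the URL is the HiveServer2 service principal.
                String url = "jdbc:hive2://centos-phoenix:10000/default;"
                        + "principal=hive/centos-phoenix@EXAMPLE.COM";
                try (Connection conn = DriverManager.getConnection(url)) {
                    System.out.println("Kerberos connection OK: " + !conn.isClosed());
                }
            }
        }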

    4. Click Test Connection. If the connection is successful, click Save to save the configuration.

     Data Source Usage 

    The data source can be used for data reading and writing in Data Synchronization and Data Transformation nodes.

    For FineDataLink 4.1.3 and later releases, you can create a partition table in the Hive database when writing data into it.
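    For reference, the sketch below shows at the HiveQL level what creating and writing a partitioned table involves. The table name sales, its columns, and the partition key dt are illustrative placeholders, and the connection details reuse the placeholder values from the earlier examples.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        // Creates a partitioned Hive table and writes one row into a
        // static partition. Table, columns, and partition key are
        // illustrative placeholders.
        public class HivePartitionExample {
            public static void main(String[] args) throws Exception {
                String url = "jdbc:hive2://192.168.101.119:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
                     Statement stmt = conn.createStatement()) {
                    stmt.execute("CREATE TABLE IF NOT EXISTS sales ("
                            + "order_id BIGINT, amount DOUBLE) "
                            + "PARTITIONED BY (dt STRING) STORED AS ORC");
                    // Write into the partition dt='2024-09-19'.
                    stmt.execute("INSERT INTO sales PARTITION (dt='2024-09-19') "
                            + "VALUES (1001, 99.9)");
                }
            }
        }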

