Hadoop Hive Data Connection

  • Last update: September 19, 2024
  • Overview

    Version

    FineDataLink Version | Functional Change
    4.0.4.2 / 4.0.29 | Hadoop Hive (HDFS) and Hadoop Hive data sources were merged into Hadoop Hive at New Data Connection.
    4.1.3 | You can create a partition table in the Hadoop Hive database when writing data into it.

    Application Scenario 

    Hadoop is a widely used distributed computing solution, and Hive is a data warehouse framework built on top of Hadoop.

    FineDataLink supports connecting to Hadoop Hive to read and write data in scheduled tasks.

    Usage Instruction

    1. For FineDataLink versions earlier than 4.0.29, you are advised to use the Hadoop Hive data connection to read database data and the Hadoop Hive (HDFS) data connection to write data, which ensures write performance.

    2. For FineDataLink 4.0.29 and later releases, you can skip HDFS Setting (required for writing data into the Hadoop Hive database) when creating a data connection if you only want to read data from the Hadoop Hive database. For details, see the Procedure section of this article.

    Preparation

    Version and Driver 

    Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.

    Supported Database Version | Driver Package Download | Log Package Download
    hive_1.1 | Hive1.1.zip | Log JAR.rar (decompress the Log JAR file and upload the extracted files to FineDataLink together with the driver)
    Hadoop Hive 1.2, Hive 2.3, Hive 2.1.2, Hive 2.1.1, and Hive 3.3.1 | Hadoop Hive.rar | /
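    If a connection attempt later fails with a missing-driver error, you can confirm that the uploaded package actually contains the Hive JDBC driver class. The following is a minimal, standalone sketch (run it with the downloaded driver JARs on the classpath); org.apache.hive.jdbc.HiveDriver is the standard Hive JDBC driver class.

        // Minimal check that the Hive JDBC driver class is present on the
        // classpath (run with the downloaded driver JARs added via -cp).
        public class DriverCheck {
            public static void main(String[] args) {
                try {
                    Class.forName("org.apache.hive.jdbc.HiveDriver");
                    System.out.println("Hive JDBC driver found.");
                } catch (ClassNotFoundException e) {
                    System.out.println("Hive JDBC driver missing: " + e.getMessage());
                }
            }
        }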

    Connection Information Collection

    Collect the following information before connecting FineDataLink to the database. A minimal connectivity check is sketched after the notes below.

    • IP address and port number of the database server

    • Database name

    • If the authentication method is Username & Password, collect the username and the password. If the authentication method is Kerberos, collect the keytab key path and the client principal information.

    • HDFS file system address (IP address and port number), which is required for writing data into the Hadoop Hive database to ensure write performance and can be skipped if you only want to read data from the database

    Note: Ensure that the FineDataLink server can access the HDFS file system port. For example, if the default port number of the HDFS file system is 8020 and the server has a firewall enabled, refer to the following content to open Port 8020.

    For details about the steps to open ports on the Windows system, see Setting Inbound and Outbound Rules on Windows Server.

    For details about the steps to open ports on the Linux system, see Linux Firewall Usage and Configuration.
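    With the information above collected, you can verify reachability and credentials outside FineDataLink. The following is a minimal sketch that assumes HiveServer2 listens on its default port 10000 and uses Username & Password authentication; the IP address, database name, and credentials are placeholders.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        // Minimal HiveServer2 connectivity check, run from the FineDataLink
        // server. The host, port (default 10000), database, and credentials
        // are placeholders; replace them with the collected values.
        public class HiveConnectionCheck {
            public static void main(String[] args) throws Exception {
                String url = "jdbc:hive2://192.168.101.119:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));  // list visible tables
                    }
                }
            }
        }

    If this check passes but writes later fail or are slow, verify the HDFS file system port (8020 by default) separately, as described in the note above.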

    Procedure

    1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, and click New Data Connection.

    Note: If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.

    2. Click the Hadoop Hive icon.

    3. Set Driver to Custom, select the uploaded driver mentioned in the Version and Driver section, and fill in the connection information. 

    The following table describes the setting items.

    Setting Item | Description
    Authentication Method

    There are two options, namely Username & Password and Kerberos.

    For details of Kerberos authentication, see Kerberos Authentication in Data Connection. A standalone Kerberos connection sketch is provided after this table.

    Note the following items when using Kerberos authentication.

    • Before connecting, check if the IP address corresponding to the machine name in the hosts file in the /etc directory is a LAN address.

    • Check if the machine name in the hostname file (/etc/hostname) is consistent with the one in the hosts file (/etc/hosts).

    • Check if the IP address and machine name configured in the hosts file of the machine where FineDataLink is located are correct.

    • Configure the hosts file in the /etc directory for establishing a local connection. Add the remote mapping information, including the IP address and machine name. For example, 192.168.5.206 centos-phoenix.

    HDFS Setting

    1. This setting item is not required if you only want to read data from the Hadoop Hive database.

    2. If you need to write data into the Hadoop Hive database, set the value to the address of the active node in the Hadoop HDFS file system to ensure write performance, in the format hdfs://IP address:Port number, for example, hdfs://192.168.101.119:8020.
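    As referenced in the Authentication Method row above, the following sketch shows what a Kerberos-authenticated connection looks like outside FineDataLink, assuming the Hadoop client libraries are on the classpath. The realm EXAMPLE.COM, both principals, the keytab path, and the host name centos-phoenix are illustrative placeholders; adapt them to your cluster.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.security.UserGroupInformation;

        // Sketch of a Kerberos-authenticated HiveServer2 connection.
        // All principals, the keytab path, and the host are placeholders.
        public class HiveKerberosCheck {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                conf.set("hadoop.security.authentication", "kerberos");
                UserGroupInformation.setConfiguration(conf);
                // Log in with the client principal and keytab key path
                // collected during preparation.
                UserGroupInformation.loginUserFromKeytab(
                        "fdl_user@EXAMPLE.COM", "/opt/keytabs/fdl_user.keytab");
                // The principal in the URL is the HiveServer2 service principal.
                String url = "jdbc:hive2://centos-phoenix:10000/default;"
                        + "principal=hive/centos-phoenix@EXAMPLE.COM";
                try (Connection conn = DriverManager.getConnection(url)) {
                    System.out.println("Kerberos connection OK: " + !conn.isClosed());
                }
            }
        }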

    4. Click Test Connection. If the connection is successful, click Save to save the configuration.

     Data Source Usage 

    The data source can be used for data reading and writing in Data Synchronization and Data Transformation nodes.

    For FineDataLink 4.1.3 and later releases, you can create a partition table in the Hive database when writing data into it.
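    For reference, the sketch below shows at the HiveQL level what creating and writing a partitioned table involves. The table name sales, its columns, and the partition key dt are illustrative placeholders, and the connection details reuse the placeholder values from the earlier examples.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        // Creates a partitioned Hive table and writes one row into a
        // static partition. Table, columns, and partition key are
        // illustrative placeholders.
        public class HivePartitionExample {
            public static void main(String[] args) throws Exception {
                String url = "jdbc:hive2://192.168.101.119:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
                     Statement stmt = conn.createStatement()) {
                    stmt.execute("CREATE TABLE IF NOT EXISTS sales ("
                            + "order_id BIGINT, amount DOUBLE) "
                            + "PARTITIONED BY (dt STRING) STORED AS ORC");
                    // Write into the partition dt='2024-09-19'.
                    stmt.execute("INSERT INTO sales PARTITION (dt='2024-09-19') "
                            + "VALUES (1001, 99.9)");
                }
            }
        }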

