Hadoop is a widely used distributed computing solution, and Hive is a data warehouse framework built on top of Hadoop.
FineDataLink supports connection to Hadoop Hive for data reading/writing using scheduled tasks.
1. For FineDataLink versions earlier than 4.0.29, you are advised to use the Hadoop Hive data connection to read data and the Hadoop Hive (HDFS) data connection to write data, which ensures write performance.
2. For FineDataLink 4.0.29 and later releases, you can skip HDFS Setting (required for writing data into the Hadoop Hive database) when creating a data connection if you only want to read data from the Hadoop Hive database. For details, see the Procedure section of this article.
Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.
Hive1.1.zip
Log JAR.rar
Decompress the Log JAR file and upload the extracted files to FineDataLink together with the driver.
Hadoop Hive.rar
Collect the following information before connecting FineDataLink to the database.
IP address and port number of the database server
Database name
If the authentication method is Username & Password, collect the username and the password. If the authentication method is Kerberos, collect the keytab key path and the client principal information.
HDFS file system address (IP address and port number). This address is required for writing data into the Hadoop Hive database, where it ensures write performance. You can skip it if you only want to read data from the Hadoop Hive database.
Note: Ensure that the FineDataLink server can access the HDFS file system port. For example, if the HDFS file system uses the default port 8020 and a firewall is enabled on the server, open port 8020 as described in the following articles.
For details about the steps to open ports on the Windows system, see Setting Inbound and Outbound Rules on Windows Server.
For details about the steps to open ports on the Linux system, see Linux Firewall Usage and Configuration.
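Once the port is open, you may want to confirm that the HDFS port is reachable from the FineDataLink server. The following is a minimal sketch using only the Python standard library. The function name can_reach is hypothetical and not part of FineDataLink, and a successful TCP connection only shows that the port accepts connections, not that HDFS itself is healthy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running can_reach("192.168.101.119", 8020) on the FineDataLink server (using the sample address from this article) should return True when the firewall rule is in place and the HDFS service is listening.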
1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, and click New Data Connection.
Note: If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.
2. Click the Hadoop Hive icon.
3. Set Driver to Custom, select the uploaded driver mentioned in the Version and Driver section, and fill in the connection information.
The following table describes the setting items.
Authentication Method: There are two options, namely Username & Password and Kerberos.
For details of Kerberos authentication, see Kerberos Authentication in Data Connection.
Note the following items when using Kerberos authentication.
Before connecting, check whether the IP address mapped to the machine name in the hosts file (/etc/hosts) is a LAN address.
Check whether the machine name in the hostname file (/etc/hostname) is consistent with the one in the hosts file (/etc/hosts).
Check whether the IP address and machine name configured in the hosts file of the machine where FineDataLink is located are correct.
To establish the connection from the local machine, add the mapping of the remote server (IP address and machine name) to the hosts file in the /etc directory. For example, 192.168.5.206 centos-phoenix.
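The hosts file checks above can be partly automated. The following is a rough sketch, not a FineDataLink utility: the function names parse_hosts and check_host_entry are hypothetical, and "LAN address" is assumed here to mean a private (non-loopback) IP address as classified by Python's ipaddress module.

```python
import ipaddress


def parse_hosts(text: str) -> dict:
    """Map each machine name found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first mapping seen for a name
    return mapping


def check_host_entry(hosts_text: str, hostname: str) -> str:
    """Classify the hosts entry for hostname: missing, loopback, lan, or public."""
    ip = parse_hosts(hosts_text).get(hostname)
    if ip is None:
        return "missing"
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return "loopback"
    return "lan" if addr.is_private else "public"
```

To use it, read the contents of /etc/hosts into a string and pass the machine name from /etc/hostname. A result other than "lan" suggests that the mapping should be reviewed before you test the Kerberos connection.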
HDFS Setting:
1. This setting item is not required if you only want to read data from the Hadoop Hive database.
2. If you need to write data into the Hadoop Hive database, set the value to the address of the active node in the Hadoop HDFS file system to ensure write performance. The format is hdfs://IP address:Port number. For example, hdfs://192.168.101.119:8020.
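A malformed address (for example, a stray space before the port number) is an easy mistake to make in this field. The following sketch checks a value against the hdfs://IP address:Port number format described above. The function name parse_hdfs_address is hypothetical, and the check covers only the format of the value, not whether the address points to the active node.

```python
from urllib.parse import urlsplit


def parse_hdfs_address(value: str) -> tuple:
    """Return (host, port) if value looks like hdfs://host:port, else raise ValueError."""
    value = value.strip()
    if any(ch.isspace() for ch in value):
        raise ValueError("address must not contain spaces")
    parts = urlsplit(value)
    if parts.scheme != "hdfs":
        raise ValueError("address must start with hdfs://")
    # parts.port raises ValueError itself if the port is not a valid number.
    if not parts.hostname or parts.port is None:
        raise ValueError("address must include both host and port")
    return parts.hostname, parts.port
```

With the example from this article, parse_hdfs_address("hdfs://192.168.101.119:8020") returns the host "192.168.101.119" and the port 8020, while a value such as "hdfs://192.168.101.119: 8020" is rejected.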
4. Click Test Connection. If the connection is successful, click Save to save the configuration.
The data source can be used for data reading and writing in Data Synchronization and Data Transformation nodes.
For FineDataLink 4.1.3 and later releases, you can create a partition table in the Hive database when writing data into it.