Overview
An HDFS URL setting item appears when you configure a Transwarp Inceptor data connection or a Hadoop Hive data connection, as shown in the following figure.
The HDFS URL is described as follows.
Enter the address of an active node in the Hadoop HDFS file system.
Fill in the address in the format hdfs://IP address:Port number. For example, hdfs://192.168.101.119:8020.
This document describes how to determine the IP address and the port number in the HDFS address.
Procedure
Determining the Port Number
Execute the following SQL statement in the database: desc formatted Database name.Table name. Then check the Location row in the query result.
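For example, for a table named sales in a database named mydb (hypothetical names used for illustration), you would execute:
desc formatted mydb.sales;
The Location row in the result contains an HDFS path of the form hdfs://Hostname:Port number/warehouse path, from which you can read the hostname and the port number.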
Case One
You can get the port number and the hostname from the query result, as shown in the following figure, where the port number is 9000 and the hostname is hive1.
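For reference, a Location row of this form reads as follows (the warehouse path is a hypothetical example):
Location: hdfs://hive1:9000/user/hive/warehouse/mydb.db/sales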
Case Two
In some scenarios, the Location row in the query result does not contain a port number, as shown in the following figure, where the hostname is HDFS-HA. In this case, you can use the default port number 8020.
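A Location row without a port number looks similar to the following (the warehouse path is a hypothetical example):
Location: hdfs://HDFS-HA/user/hive/warehouse/mydb.db/sales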
Case Three
Sometimes, you cannot confirm the port number from the Location row.
This occurs in a high availability (HA) HDFS cluster. An HA HDFS cluster contains two NameNodes of equal status: at any given time, one is in the Active state and the other is in the Standby state. The Active NameNode serves all client requests, while the Standby NameNode maintains enough state to provide a fast failover if necessary. You can confirm the HDFS node address through the hdfs-site.xml file. A typical configuration looks as follows, and you can obtain the NameNode's RPC port from it.
hdfs-site.xml
<!-- Configure the cluster identifier for the HA setup (mycluster). -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<!-- NameNode identifiers -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC communication port number for NameNode1 -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node00:8020</value>
</property>
<!-- RPC communication port number for NameNode2 -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node01:8020</value>
</property>
<!-- HTTP communication port number for NameNode1 -->
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node00:50070</value>
</property>
<!-- HTTP communication port number for NameNode2 -->
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node01:50070</value>
</property>
<!-- Configure the shared edits storage for the JournalNodes (JN). -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node00:8485;node01:8485;node02:8485/mycluster</value>
</property>
<!-- Configure the failover proxy provider class. -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Configure a fencing method to kill the previous Active NameNode during failover. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<!-- Configure the SSH private key for fencing. -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<!-- Directory for JN metadata storage -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/software/hadoop/hdfs/journalnode/data</value>
</property>
<!-- Enable automatic failover when the Active NameNode fails. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
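If you have command-line access to a cluster node, you can also read these keys with the hdfs getconf tool instead of opening the file by hand (a sketch based on the mycluster configuration above):
# List the NameNode hosts of the cluster.
hdfs getconf -namenodes
# Read the RPC address (hostname:port) of a specific NameNode.
hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1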
Obtaining the IP Address
You can use the ping command on the database server to check connectivity to the corresponding hostname and resolve it to an IP address.
As the hostname in Case One is hive1, you can perform the ping test from the server where the Hive database is located, as shown in the following figure.
As shown in the above figure, the IP address of HDFS is 192.168.101.243.
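For reference, the ping output typically takes a form like this (the reply lines are illustrative):
$ ping hive1
PING hive1 (192.168.101.243) 56(84) bytes of data.
64 bytes from hive1 (192.168.101.243): icmp_seq=1 ttl=64 time=0.21 ms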
For Case Two, you can query the active IP address on the CDH platform, which is 192.168.9.188, as shown in the following figure.
In Case Three, an HA HDFS cluster contains one Active NameNode and one Standby NameNode, and only the Active NameNode can be connected to. You need to use the HA connection method to ensure that your connection is always directed to the Active NameNode, even if a failover occurs. The configuration steps are described below.
Note: High availability configuration is not supported in FineDataLink versions earlier than 4.1.13.2. In these versions, you need to confirm the address of the Active NameNode.
Use the following command to query the Active and Standby NameNode status.
Commonly used services provide their own command-line tools (such as the hdfs CLI), through which you can determine whether a NameNode is Active or Standby by its node ID.
HDFS NameNode status is queried with the hdfs haadmin tool:
hdfs haadmin
You can check the HDFS parameters on the CDH platform. The default path of the HDFS configuration file is /etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml, in which you can find the high-availability-related configuration.
High availability configuration:
<property>
  <name>dfs.ha.namenodes.sdg</name>
  <value>namenode25,namenode30</value>
</property>
In the above example, there are two NameNodes: one is the Active NameNode, and the other is the Standby NameNode.
You can use the hdfs haadmin -getServiceState command to check the node status.
$ hdfs haadmin -getServiceState namenode30
active
$ hdfs haadmin -getServiceState namenode25
standby
You can then confirm the IP address of the Active NameNode (namenode30 in this example).
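A sketch of how to resolve the Active NameNode to an IP address, assuming the sdg nameservice above (the key name follows the standard dfs.namenode.rpc-address.Nameservice.Node ID pattern; verify it against your hdfs-site.xml):
# Read the RPC address (hostname:port) of the Active NameNode namenode30.
hdfs getconf -confKey dfs.namenode.rpc-address.sdg.namenode30
# Resolve the returned hostname to an IP address, for example with ping.
ping <hostname returned by the previous command>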
Note: FineDataLink of 4.1.13.2 and later versions supports high availability configuration.
| Configuration Item | Description |
| --- | --- |
| HDFS Address | FineDataLink of 4.1.13.2 and later versions supports the configuration of multiple HDFS addresses. Separate the addresses with commas (,). For example, hdfs://IP address 1:Port number 1,hdfs://IP address 2:Port number 2,hdfs://IP address 3:Port number 3. The system uses the multiple HDFS addresses you configured to construct the configuration file for connecting to HDFS. |
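For example, an HA cluster with two NameNodes could be configured as follows (hypothetical IP addresses, both nodes on the default RPC port 8020):
hdfs://192.168.9.25:8020,hdfs://192.168.9.30:8020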