Overview
Version
FineDataLink Version | Functional Change |
---|---|
4.0.4 |
|
4.0.14 | The deployment package includes a built-in gpfdist file for Greenplum (Parallel Loading). |
4.0.29 | Greenplum (Parallel Loading) and Pivotal Greenplum Database were merged into a single Pivotal Greenplum Database option under New Data Connection. |
4.1.2 |
Ignore Source Data If Same Primary Key Exists; Record as Dirty Data If Same Primary Key Exists; Overwrite Data in Target Table If Same Primary Key Exists |
Function Description
Scheduled Task supports reading data from and writing data into Greenplum databases.
Pipeline Task supports data writing into the Greenplum database.
Data Service supports the Greenplum database.
Configuration Instruction
Pipeline Task
If Greenplum is selected as the target data source in a pipeline task, the COPY loading mode is used.
Using COPY loading requires specific database privileges.
1. Assign users who need to use a Greenplum data connection the privilege to create schemas in the corresponding database.
2. Create a fdl_temp schema in the target database to store temporary tables, and assign users the privilege to create tables in this schema.
The example command is as follows:
GRANT USAGE,CREATE ON SCHEMA fdl_temp TO trans_user ;
ALTER DEFAULT PRIVILEGES IN SCHEMA fdl_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO trans_user ;
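The example commands above cover only the grants on the fdl_temp schema. The two preparatory steps (granting the privilege to create schemas, and creating the schema itself) might look like the following minimal sketch. It assumes a hypothetical database named target_db and the trans_user role from the example, and is meant to be run by a superuser or the database owner while connected to that database. Adapt the names to your environment.

```sql
-- Step 1: allow trans_user to create schemas in the target database.
-- target_db is a hypothetical database name used for illustration.
GRANT CREATE ON DATABASE target_db TO trans_user;

-- Step 2: create the schema that stores temporary tables.
-- Run this while connected to target_db.
CREATE SCHEMA fdl_temp;

-- Optional check after the grants on fdl_temp have been applied:
-- both columns should return true.
SELECT has_database_privilege('trans_user', 'target_db', 'CREATE') AS can_create_schema,
       has_schema_privilege('trans_user', 'fdl_temp', 'CREATE')    AS can_create_table;
```

Running the check before creating the connection makes a missing privilege visible up front instead of during the first task run.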
Scheduled Task
When Greenplum is selected as the data source in a scheduled task, three load methods are supported: Parallel Loading, COPY Loading, and Common Loading. The differences among them are shown in the following table.
Load Method | Difference |
---|---|
Common Loading | 1. This method is not suitable for writing data into Greenplum databases. 2. If you only need to read data from a Greenplum database, configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document. |
Parallel Loading | 1. FineDataLink 4.1.2 and later releases support the writing of JSON fields, but do not support the writing of binary fields. 2. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters. 3. Configure the data connection following the steps in the section "Configuration with Parallel Loading Setting" of this document. Note: Using Parallel Loading requires specific privileges. |
COPY Loading (New in V4.1.2) | 1. It supports the writing of binary fields and JSON fields. 2. Configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document. |
Assigning the Privilege for Parallel Loading
Using Greenplum as the target data source in the parallel loading mode requires specific privileges.
1. Assign privileges to create tables and read existing tables in the gpfdist_temp schema.

GRANT USAGE,CREATE ON SCHEMA gpfdist_temp TO trans_user ;
2. Assign privileges to create external tables.
ALTER ROLE trans_user WITH CREATEEXTTABLE;
3. Assign read privileges on the target table. Using Auto Table Creation requires table creation privilege on corresponding databases.
ALTER DEFAULT PRIVILEGES IN SCHEMA gpfdist_temp GRANT SELECT, INSERT, UPDATE, DELETE,
REFERENCES, TRIGGER ON TABLES TO trans_user ;
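After running the three grants above, the result can be checked with a few catalog queries. The following is a sketch rather than an official verification procedure. It assumes that the gpfdist_temp schema already exists, that the role is trans_user as in the examples, and that the target table is the hypothetical public.target_table. The rolcreaterextgpfd column is the Greenplum role attribute for readable gpfdist external tables. Confirm the column name against the catalog reference for your Greenplum version.

```sql
-- Step 1 check: schema-level privileges on gpfdist_temp.
SELECT has_schema_privilege('trans_user', 'gpfdist_temp', 'USAGE')  AS can_use_schema,
       has_schema_privilege('trans_user', 'gpfdist_temp', 'CREATE') AS can_create_table;

-- Step 2 check: privilege to create readable gpfdist external tables.
SELECT rolname, rolcreaterextgpfd
FROM pg_roles
WHERE rolname = 'trans_user';

-- Step 3 check: read privilege on the target table.
-- public.target_table is a hypothetical table name.
SELECT has_table_privilege('trans_user', 'public.target_table', 'SELECT') AS can_read_target;
```

Each query should return true in every boolean column. A false value points to the grant that still needs to be applied.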
Assigning the Privilege for COPY Loading
For details, see the Pipeline Task section of this article.
Data Service
Using a Greenplum database in Data Service requires configuring Parallel Loading Setting. For details on data services, see Overview of Data Service.
Configuration with Parallel Loading Setting
Prerequisite
Confirming the Database Version
Greenplum Database (Parallel Loading) 5.x and 6.x are supported.
Confirming the Data Type
Binary fields cannot be synchronized in the parallel loading mode. Attempting to load them triggers an error message. Binary fields can only be loaded via JDBC. For details, see the data connection procedure in this section.
Placing the gpfdist File
The related operations and storage location of the gpfdist file are shown in the following table.
FineDataLink Project | Operation | File Location |
---|---|---|
1. Projects before 4.0.55 2. Projects upgraded from a version before 4.0.14 to a version before 4.0.21 | See the following content of this section. | \webapps\webroot\WEB-INF |
Projects upgraded from a version before 4.0.14 to 4.0.21 and later versions | See the following content of this section. | \webapps\webroot\WEB-INF\assist |
Projects deployed with 4.0.14 and later installation packages | The driver is built in. Ignore this section. | |
Linux System:
You can download the package for Linux systems: gpfdist_linux.tar.gz
1. Upload the downloaded package to the Linux server, and then extract it to the \webapps\webroot\WEB-INF\assist directory.

2. Place the gpfdist file (in the bin folder) at the same level as the lib folder, and then delete the bin folder.
3. Rename the gpfdist_linux folder to gpfdist.
The effect is shown in the following figure.
Windows System:
1. Obtain the installation package.
Create a gpfdist folder in the \webapps\webroot\WEB-INF\assist directory, change the obtained package to an EXE file, and place it in the folder.
2. Check that the server where the database is located can access port 15500 of the FineDataLink project server, because the database needs to read the CSV file generated by FineDataLink for loading.
3. Check that the account used to create the Greenplum data connection has the privilege to create schemas and tables.

1. The gpfdist file is pre-compiled for Linux systems only. For Windows systems, it must be compiled into an EXE file from the source code. The integration of gpfdist-related components is supported on Linux systems but not on Windows systems.
2. On Windows systems, the maximum data size of a single row is 1 MB, and this limit cannot be modified.
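The port check matters because, in gpfdist-based loading, the Greenplum segments pull the data file from the gpfdist process over the network instead of receiving it through the database connection. One way to test this path from the database side is a throwaway external table, as in the sketch below. The names are hypothetical: fdl-host stands for the FineDataLink server, probe.csv for a one-column CSV file in the directory served by gpfdist, and gpfdist_temp.port_probe for the test table. The sketch also assumes that a gpfdist process is listening on port 15500 and that the current role has the external table privilege. It does not show the statements FineDataLink itself issues.

```sql
-- Hypothetical names: fdl-host, probe.csv, gpfdist_temp.port_probe.
CREATE EXTERNAL TABLE gpfdist_temp.port_probe (line text)
LOCATION ('gpfdist://fdl-host:15500/probe.csv')
FORMAT 'CSV';

-- The segments contact gpfdist only when the table is scanned, so a
-- connectivity problem surfaces here as an error, not at creation time.
SELECT count(*) FROM gpfdist_temp.port_probe;

-- Clean up the test table.
DROP EXTERNAL TABLE gpfdist_temp.port_probe;
```

If the SELECT fails with a connection error, check firewall rules between the segment hosts and the FineDataLink server before configuring Parallel Loading Setting.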
Data Connection Procedure
Uploading the Driver
Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.
Driver Package Download |
---|
Download the latest version of the PostgreSQL driver. |
Data Connection Configuration
1. Log in to FineDataLink, choose System Management > Data Connection > Data Connection Management > New Data Connection, and click Pivotal Greenplum Database.

1. If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.
2. For FineDataLink before 4.0.29, select Greenplum (Parallel Loading) when creating the data connection.
2. Fill in the connection information. Select Custom, and select the uploaded driver mentioned in the Uploading the Driver section.
You cannot set Pattern unless the database is connected. Click Click to Connect Database and then click Pattern, as shown in the following figure.
3. Configure Parallel Loading Setting if you need to write data into Greenplum databases.
Configuration Item | Description |
---|---|
Server Address - Node 1 | Enter the path of the gpfdist file mentioned in the Placing the gpfdist File section, ensuring that the Greenplum segments (SEG) can access it on the FineDataLink server. If the project is deployed in a clustered environment, multiple configuration items will be displayed in the format of Server Address - Node x. Type the path in the drop-down box. |
Temporary Table Reuse | Determine whether to reuse temporary tables. (Reusing temporary tables can effectively reduce the table growth rate during high-frequency loading.) If it is set to Yes, the gpfdist_temp schema will be automatically created and used during runtime. |
Limit on Temporary File Quantity | Set the maximum number of temporary files that can be written to disk. Adjust the value according to the disk size and the network speed. Default value: 100,000. Range: 10,000 to 100,000,000. Required. |
Limit on Temporary File Size (MB) | Set the maximum size of the file that can be written to disk. When either Limit on Temporary File Quantity or Limit on Temporary File Size (MB) is reached, data file writing stops, and file loading starts immediately. Default value: 1,024. Range: 10 to 102,400. Required. |
4. Click Test Connection. If the connection is successful, click Save to save the configuration.
Configuration Without Parallel Loading Setting
Read the Configuration Instruction section of this document carefully.
Database Version
Greenplum 5.x and 6.x are supported.
Data Connection Procedure
The procedure is the same as that in the Configuration with Parallel Loading Setting section, except that you do not need to configure Parallel Loading Setting.
Data Source Usage
For details on using Greenplum data sources in FineDataLink, see Instruction on Greenplum Data Sources.
Scheduled Task supports reading data from and writing data into Greenplum databases. For details, see Overview of Data Development.
Pipeline Task supports data writing into the Greenplum database. For details, see Overview of Data Pipeline.
Using a Greenplum database in Data Service requires configuring Parallel Loading Setting. For details on data services, see Overview of Data Service.