1. Regardless of whether the Parallel Loading Setting is configured, adhere to consistent case conventions during Data Development, for example by using a uniform uppercase format; otherwise, errors may occur. When performing a data query, either keep the casing of the query content consistent or enclose the identifiers in quotation marks.
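As background for why casing matters: Greenplum (like PostgreSQL) folds unquoted identifiers to lowercase, while a quoted mixed-case identifier keeps its casing and must be quoted in every subsequent query. A minimal sketch (the table and column names below are hypothetical):

```sql
-- Unquoted identifiers are folded to lowercase:
-- these two statements refer to the same column.
CREATE TABLE demo_orders (ORDER_ID int);
SELECT order_id FROM demo_orders;

-- A quoted mixed-case identifier keeps its casing
-- and must be quoted consistently afterward.
CREATE TABLE demo_items ("ItemId" int);
SELECT "ItemId" FROM demo_items;   -- works
-- SELECT ItemId FROM demo_items;  -- fails: resolves to lowercase "itemid"
```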
2. In Greenplum databases, the length limit for field names is as follows.
The maximum length is 64 characters; this applies to field names composed of English letters, numbers, or underscores.
It is recommended that field names consist of lowercase letters, numbers, and underscores, and neither start with a number nor contain special characters.
You can read partition tables from the Greenplum database in scheduled tasks, as shown in the following figure.
You can select partition tables as data destinations in scheduled tasks, as shown in the following figure.
For FineDataLink of V4.1.9.3 and later versions, if you set the target table in a scheduled task to Auto Created Table, you can specify partition keys and distribution keys, as shown in the following figure.
For details about the configuration method, see Partition Table Creation and Data Reading/Writing. The differences are as follows.
For RANGE and LIST partitions, you can leave Partition Name empty; the database then assigns a default name based on the partition position, and FineDataLink does not need to process it.
RANGE supports two methods to define boundaries. (Inclusive or exclusive ranges are supported.)
1. Method One: Set the start value and end value. You can set the Automatic Partition Interval for automatic partitioning only when both the start and end values are valid. For example, set the start value to 2015-01-01, the end value to 2020-12-31, and the interval to 1 Year.
When the field data is of the date type, you can select Year/Month/Day as the interval unit in Automatic Partition Interval.
When the field data is of the numeric type, you can input a positive integer as the interval.
2. Method Two: You can set the conditions to Greater than or equal to XXX or Less than or equal to XXX separately.
You can set a default partition.
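As a sketch of what Method One produces, the example above (start value 2015-01-01, end value 2020-12-31, interval 1 Year, plus a default partition) corresponds roughly to the following Greenplum DDL; the table, column, and partition names are hypothetical:

```sql
CREATE TABLE sales (
    id        int,
    sale_date date,
    amount    numeric
)
DISTRIBUTED BY (id)            -- distribution key
PARTITION BY RANGE (sale_date) -- partition key
(
    START (date '2015-01-01') INCLUSIVE
    END   (date '2020-12-31') EXCLUSIVE
    EVERY (INTERVAL '1 year'),
    DEFAULT PARTITION other    -- catches rows outside the defined ranges
);
```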
When the data destination in a scheduled task is a Greenplum database, the Write Method tab page is shown in the following figure.
The following table introduces the Load Method.
1. By default, gpfdist uses port 15500 to provide services.
2. Binary fields cannot be synchronized when you select Parallel Loading.
3. FineDataLink 4.1.2 and later releases support the writing of JSON fields, but do not support the writing of binary fields.
4. For FineDataLink 4.1.2 and later releases, the following strategies for the primary key conflict are supported: Ignore Source Data If Same Primary Key Value Exists, Record as Dirty Data If Same Primary Key Value Exists, and Overwrite Data in Target Table If Same Primary Key Value Exists.
5. After you enable Dirty Data Tolerance, if the Parallel Loading process fails, FineDataLink leverages the built-in error table logic of GPLOAD to obtain and correctly record the dirty data information. If Dirty Data Tolerance is disabled, the node reports an error.
6. When you use the gpfdist protocol for parallel loading, FineDataLink of versions prior to 4.1.2 does not support the insert/update/delete data writing methods. For FineDataLink 4.1.2 and later releases, these writing methods are supported.
7. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters.
8. When configuring data connections, you need to follow the steps in the "Configuration with Parallel Loading Setting" section.
This method supports the writing of binary fields and JSON fields.
1. Use COPY Loading when the target table has no primary key and Primary Key Mapping is not configured.
2. When the target table has a primary key or Primary Key Mapping is configured, three primary key conflict strategies are available: Ignore Source Data If Same Primary Key Value Exists, Record as Dirty Data If Same Primary Key Value Exists, and Overwrite Data in Target Table If Same Primary Key Value Exists. After selecting one of them as the Strategy for Primary Key Conflict, COPY Loading and Common Loading are used.
When COPY Loading and Common Loading are used:
If the COPY Loading process fails, you can try to write the batch of data using the Common Loading method. Any data that fails to be written will be recorded as dirty data. Once the writing of this batch is completed, the next batch will again prioritize COPY Loading.
This method performs JDBC-based serial loading.
1. This method is not suitable for writing data into Greenplum databases.
2. If you only need to read data from a Greenplum database, configure the data connection following the steps in the "Configuration Without Parallel Loading Setting" section.
The following table describes special usage scenarios of the Scheduled Task:
You have set Load Method to COPY Loading, and there are N records (N > 1) with the same primary key value in a single batch of data to be loaded from the source.
The loading exception occurs because multiple source records in a batch can be used to update the same target record, and there is no clear rule to determine which source record should take precedence.
You have set Load Method to Parallel Loading, and there are N records (N > 1) with the same primary key value in a single batch of data to be loaded from the source.
Scenario One: If the fields to be synchronized contain no binary fields, consider using GPLOAD.
Scenario Two: If the fields to be synchronized contain binary fields, there is currently no efficient solution.
FineDataLink of V4.2.7.4 and later versions supports Data Synchronization - Write Method when the target end is a Greenplum database.
For FineDataLink of V4.1.1 and later versions, when the Greenplum database is used as the target end, Setting Synchronization Without a Primary Key is supported.
1. Pipeline Task supports the writing of data into the Greenplum database. For details, see Overview of Data Pipeline.
2. For FineDataLink of V4.1.9.3 and later versions, if you set the target table in a pipeline task to Auto Created Table, you can specify partition keys and distribution keys, as shown in the following figure.
3. When the target end of a pipeline task is a Greenplum database and the target table is a manually created table containing a timestamp field (which records the actual time when data is added or updated in the database), you need to set a default value for the timestamp field: ((EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000)::bigint). Otherwise, the timestamp field will be empty after incremental synchronization.
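For a manually created target table, the default value can be applied as in the following sketch; the table and column names are hypothetical, and the column is assumed to be of type bigint so it can store the epoch value in milliseconds:

```sql
-- update_ts is assumed to be a bigint column storing epoch milliseconds.
-- The default records the current time whenever a row is inserted
-- without an explicit value for update_ts.
ALTER TABLE target_table
    ALTER COLUMN update_ts
    SET DEFAULT ((EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000)::bigint);
```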
Using a Greenplum database in Data Service requires configuring the Parallel Loading Setting. For details about data services, see Overview of Data Service.
In FineDataLink of V4.1.9.3 and later versions, you can select partition tables in Greenplum databases as data sources in Data Service, as shown in the following figure.