1. Regardless of whether the Parallel Loading Setting is configured, adhere to consistent case conventions during Data Development, for example by using a uniform uppercase format; otherwise, errors may occur. When performing a data query, either keep the casing of the query content consistent or enclose the identifiers in quotation marks.
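As background for why casing matters: Greenplum (like PostgreSQL) folds unquoted identifiers to lowercase, while a quoted mixed-case identifier keeps its casing and must be quoted in every subsequent query. A minimal sketch (the table and column names below are hypothetical):

```sql
-- Unquoted identifiers are folded to lowercase:
-- these two statements refer to the same column.
CREATE TABLE demo_orders (ORDER_ID int);
SELECT order_id FROM demo_orders;

-- A quoted mixed-case identifier keeps its casing
-- and must be quoted consistently afterward.
CREATE TABLE demo_items ("ItemId" int);
SELECT "ItemId" FROM demo_items;   -- works
-- SELECT ItemId FROM demo_items;  -- fails: resolves to lowercase "itemid"
```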
2. In Greenplum databases, the length limit for field names is as follows.
The maximum length is 64 characters; this applies to field names composed of English letters, numbers, or underscores.
It is recommended that field names consist of lowercase letters, numbers, and underscores, and neither start with a number nor contain special characters.
You can read partition tables from the Greenplum database in scheduled tasks, as shown in the following figure.
You can select partition tables as data destinations in scheduled tasks, as shown in the following figure.
For FineDataLink of V4.1.9.3 and later versions, if you set the target table in a scheduled task to Auto Created Table, you can specify partition keys and distribution keys, as shown in the following figure.
For details about the configuration method, see Partition Table Creation and Data Reading/Writing. The differences are as follows.
For RANGE and LIST partitions, you can leave Partition Name empty; the database then assigns a default name based on the partition position, and FineDataLink does not need to process it.
RANGE supports two methods to define boundaries. (Inclusive or exclusive ranges are supported.)
1. Method One: Set the start value and end value. You can set the Automatic Partition Interval for automatic partitioning only when both the start and end values are valid. For example, set the start value to 2015-01-01, the end value to 2020-12-31, and the interval to 1 Year.
When the field data is of the date type, you can select Year/Month/Day as the interval unit in Automatic Partition Interval.
When the field data is of the numeric type, you can input a positive integer as the interval.
2. Method Two: You can set the conditions to Greater than or equal to XXX or Less than or equal to XXX separately.
You can set a default partition.
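As a sketch of what Method One produces, the example above (start value 2015-01-01, end value 2020-12-31, interval 1 Year, plus a default partition) corresponds roughly to the following Greenplum DDL; the table, column, and partition names are hypothetical:

```sql
CREATE TABLE sales (
    id        int,
    sale_date date,
    amount    numeric
)
DISTRIBUTED BY (id)            -- distribution key
PARTITION BY RANGE (sale_date) -- partition key
(
    START (date '2015-01-01') INCLUSIVE
    END   (date '2020-12-31') EXCLUSIVE
    EVERY (INTERVAL '1 year'),
    DEFAULT PARTITION other    -- catches rows outside the defined ranges
);
```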
When the data destination in a scheduled task is a Greenplum database, the Write Method tab page is shown in the following figure.
The following table introduces the Load Method.
1. By default, gpfdist uses port 15500 to provide services.
2. Binary fields cannot be synchronized when you select Parallel Loading.
3. FineDataLink 4.1.2 and later releases support the writing of JSON fields, but do not support the writing of binary fields.
4. For FineDataLink 4.1.2 and later releases, the following strategies for the primary key conflict are supported: Ignore Source Data If Same Primary Key Value Exists, Record as Dirty Data If Same Primary Key Value Exists, and Overwrite Data in Target Table If Same Primary Key Value Exists.
5. After you enable Dirty Data Tolerance, if the Parallel Loading process fails, FineDataLink leverages the built-in error table logic of GPLOAD to obtain and correctly record the dirty data information. If Dirty Data Tolerance is disabled, the node reports an error.
6. When you use the gpfdist protocol for parallel loading, FineDataLink of versions prior to 4.1.2 does not support the insert/update/delete data writing methods. For FineDataLink 4.1.2 and later releases, these writing methods are supported.
7. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters.
8. When configuring data connections, you need to follow the steps in the "Configuration with Parallel Loading Setting" section.
This method supports the writing of binary fields and JSON fields.
1. Use COPY Loading when the target table has no primary key and Primary Key Mapping is not configured.
2. When the target table has a primary key or Primary Key Mapping is configured, three primary key conflict strategies are available: Ignore Source Data If Same Primary Key Value Exists, Record as Dirty Data If Same Primary Key Value Exists, and Overwrite Data in Target Table If Same Primary Key Value Exists. After selecting one of them as the Strategy for Primary Key Conflict, COPY Loading and Common Loading are used.
When COPY Loading and Common Loading are used:
If the COPY Loading process fails, you can try to write the batch of data using the Common Loading method. Any data that fails to be written will be recorded as dirty data. Once the writing of this batch is completed, the next batch will again prioritize COPY Loading.
This method performs JDBC-based serial loading.
1. This method is not suitable for writing data into Greenplum databases.
2. If you only need to read data from a Greenplum database, configure the data connection following the steps in the "Configuration Without Parallel Loading Setting" section.
The following table describes special usage scenarios of the Scheduled Task:
You have set Load Method to COPY Loading, and there are N records (N > 1) with the same primary key value in a single batch of data to be loaded from the source.
The loading exception occurs because multiple source records in a batch can be used to update the same target record, and there is no clear rule to determine which source record should take precedence.
You have set Load Method to Parallel Loading, and there are N records (N > 1) with the same primary key value in a single batch of data to be loaded from the source.
Scenario One: If the fields to be synchronized contain no binary fields, consider using GPLOAD.
Scenario Two: If the fields to be synchronized contain binary fields, there is currently no efficient solution.
FineDataLink of V4.2.7.4 and later versions supports Data Synchronization - Write Method when the target end is a Greenplum database.
For FineDataLink of V4.1.1 and later versions, when the Greenplum database is used as the target end, Setting Synchronization Without a Primary Key is supported.
1. Pipeline Task supports the writing of data into the Greenplum database. For details, see Overview of Data Pipeline.
2. For FineDataLink of V4.1.9.3 and later versions, if you set the target table in a pipeline task to Auto Created Table, you can specify partition keys and distribution keys, as shown in the following figure.
3. When the target end of a pipeline task is a Greenplum database and the target table is a manually created table containing a timestamp field (which records the actual time when data is added or updated in the database), you need to set a default value for the timestamp field: ((EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000)::bigint). Otherwise, the timestamp field will be empty after incremental synchronization.
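For a manually created target table, the default value can be applied as in the following sketch; the table and column names are hypothetical, and the column is assumed to be of type bigint so it can store the epoch value in milliseconds:

```sql
-- update_ts is assumed to be a bigint column storing epoch milliseconds.
-- The default records the current time whenever a row is inserted
-- without an explicit value for update_ts.
ALTER TABLE target_table
    ALTER COLUMN update_ts
    SET DEFAULT ((EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000)::bigint);
```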
Using a Greenplum database in Data Service requires configuring the Parallel Loading Setting. For details about data services, see Overview of Data Service.
In FineDataLink of V4.1.9.3 and later versions, you can select partition tables in Greenplum databases as data sources in Data Service, as shown in the following figure.