Supported connection to Greenplum (Parallel Loading) under System Management > Data Connection > Data Connection Management.
Data Input supported Greenplum and Greenplum (Parallel Loading).
Data Output supported Greenplum and Greenplum (Parallel Loading).
Scheduled Task supported the use of the COPY command to write data (including binary fields and JSON fields) into Greenplum databases.
In the Parallel Loading mode, the writing of JSON fields was supported.
In the Parallel Loading mode, the write method Append/Update/Delete Data Based on Identifier Field and the following strategies for primary key conflict were supported:
Ignore Source Data If Same Primary Key Exists
Record as Dirty Data If Same Primary Key Exists
Overwrite Data in Target Table If Same Primary Key Exists
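To make the three conflict strategies concrete, the following Python sketch simulates them on in-memory rows. It illustrates the behavior the option names suggest and is not FineDataLink's implementation. The function name apply_rows, the strategy labels "ignore", "dirty", and "overwrite", and the dictionary layout are all assumptions made for this example.

```python
def apply_rows(target, source_rows, key, strategy):
    """Merge source_rows into target, a dict keyed by primary key.

    strategy is one of "ignore", "dirty", "overwrite" (hypothetical labels
    standing in for the three options listed above).
    Returns the source rows that were recorded as dirty data.
    """
    dirty = []
    for row in source_rows:
        pk = row[key]
        if pk not in target:
            target[pk] = row      # no conflict: plain insert
        elif strategy == "ignore":
            continue              # keep the target row, drop the source row
        elif strategy == "dirty":
            dirty.append(row)     # keep the target row, log the source row
        elif strategy == "overwrite":
            target[pk] = row      # replace the target row
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return dirty


# Example: id 1 already exists in the target, id 2 does not.
target = {1: {"id": 1, "name": "old"}}
rows = [{"id": 1, "name": "new"}, {"id": 2, "name": "added"}]
dirty = apply_rows(target, rows, "id", "dirty")
# target[1] keeps name "old", id 2 is inserted,
# and dirty holds the conflicting source row for id 1.
```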
Scheduled Task supports reading from and writing to Greenplum databases.
Pipeline Task supports writing data to Greenplum databases.
Data Service supports the Greenplum database.
If Greenplum is selected as the target data source in a pipeline task, the COPY loading mode is used.
Using parallel loading requires specific database privileges.
1. Assign users who need to use a Greenplum data connection the privilege to create schemas in the corresponding database.
2. Create a fdl_temp schema in the target database to store temporary tables, and assign users the privilege to create tables in this schema.
The example command is as follows:
GRANT USAGE, CREATE ON SCHEMA fdl_temp TO trans_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA fdl_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO trans_user;
When Greenplum is selected as the data source in a scheduled task, three loading methods are supported: Parallel Loading, COPY Loading, and Common Loading. The differences among them are as follows.
Common Loading
1. This method is not suitable for writing data into Greenplum databases.
2. If you only need to read data from a Greenplum database, configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document.
Parallel Loading
1. FineDataLink 4.1.2 and later releases support the writing of JSON fields, but do not support the writing of binary fields.
2. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters.
3. Configure the data connection following the steps in the section "Configuration with Parallel Loading Setting" of this document.
Using Parallel Loading requires specific privileges.
COPY Loading
1. It supports the writing of binary fields and JSON fields.
2. Configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document.
Assigning the Privilege for Parallel Loading
Using Greenplum as the target data source in parallel loading mode requires specific privileges.
1. Assign privileges to create tables and read existing tables in the gpfdist_temp schema.
GRANT USAGE,CREATE ON SCHEMA gpfdist_temp TO trans_user ;
2. Assign privileges to create external tables.
ALTER ROLE trans_user WITH CREATEEXTTABLE;
3. Assign read privileges on the target table. Using Auto Table Creation also requires the privilege to create tables in the corresponding database.
ALTER DEFAULT PRIVILEGES IN SCHEMA gpfdist_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO trans_user ;
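When several accounts need these privileges, the three statements above can be generated per role instead of edited by hand. The Python sketch below only builds the SQL text and does not connect to a database. The helper name parallel_loading_grants is hypothetical. It double-quotes identifiers, which makes them case-sensitive, so pass the role name exactly as it is stored in the database.

```python
def parallel_loading_grants(role, schema="gpfdist_temp"):
    """Return the three statements from the steps above for one role."""

    def quote(identifier):
        # Double-quote the identifier and escape embedded quotes so that
        # an unusual role name cannot break out of the statement.
        return '"' + identifier.replace('"', '""') + '"'

    r, s = quote(role), quote(schema)
    return [
        f"GRANT USAGE, CREATE ON SCHEMA {s} TO {r};",
        f"ALTER ROLE {r} WITH CREATEEXTTABLE;",
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT, INSERT, "
        f"UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO {r};",
    ]


# Print the statements for the example user from this section.
for statement in parallel_loading_grants("trans_user"):
    print(statement)
```

Review the generated statements before running them with an account that is allowed to grant these privileges.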
Assigning the Privilege for COPY Loading
For details, see the Pipeline Task section of this article.
Using a Greenplum database in Data Service requires configuring Parallel Loading Setting. For details on data services, see Overview of Data Service.
Confirming the Database Version
Greenplum Database (Parallel Loading) 5.x and 6.x are supported.
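One way to confirm the version is to run SELECT version(); on the database and inspect the result. The Python sketch below parses such a string and checks for major version 5 or 6. It assumes the output contains the text "Greenplum Database" followed by the version number. The sample string is illustrative, not captured from a real server, and the function names are hypothetical.

```python
import re


def greenplum_major_version(version_string):
    """Extract the Greenplum major version, or None if it is not found."""
    match = re.search(r"Greenplum Database (\d+)\.", version_string)
    return int(match.group(1)) if match else None


def is_supported(version_string):
    """True when the major version is 5 or 6, as stated above."""
    return greenplum_major_version(version_string) in (5, 6)


# Illustrative sample of what SELECT version(); might return.
sample = "PostgreSQL 9.4.24 (Greenplum Database 6.13.0 build example) on x86_64"
print(greenplum_major_version(sample), is_supported(sample))
```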
Confirming the Data Type
Binary fields cannot be synchronized in the parallel loading mode and trigger an error message during loading. Binary fields can only be loaded via JDBC. For details, see the data connection procedure in this section.
Placing the gpfdist File
The related operations and storage location of the gpfdist file are shown in the following table.
Operation
1. Projects before 4.0.55
2. Projects upgraded from a version before 4.0.14 to a version before 4.0.21
\webapps\webroot\WEB-INF
Projects upgraded from a version before 4.0.14 to 4.0.21 and later versions
\webapps\webroot\WEB-INF\assist
Projects deployed with 4.0.14 and later installation packages
Linux System:
You can download the package for Linux systems: gpfdist_linux.tar.gz
1. Upload the downloaded package to the Linux server, and then extract it to the \webapps\webroot\WEB-INF\assist directory.
2. Place the gpfdist file (in the bin folder) at the same level as the lib folder, and then delete the bin folder.
3. Rename the gpfdist_linux folder to gpfdist.
The effect is shown in the following figure.
Windows System:
1. Obtain the installation package.
Create a gpfdist folder in the \webapps\webroot\WEB-INF\assist directory, change the obtained package to an EXE file, and place it in the folder.
2. Check that the database server can access port 15500 of the FineDataLink project server, because the database needs to read the CSV files generated by FineDataLink during loading.
3. Check if the account that needs to create the Greenplum data connection has the privilege to create schemas and tables.
1. For Windows systems, the gpfdist file (which has been pre-compiled for Linux systems) must be compiled into an EXE file based on the source code. Windows systems do not support the integration of gpfdist-related components (which can be integrated into Linux systems).
2. On Windows systems, the maximum size of a single row is 1 MB, and this limit cannot be modified.
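The port check in step 2 above can be scripted. The following Python sketch, meant to be run on the server that hosts the database, attempts a TCP connection to port 15500 of the FineDataLink project server. The function name port_reachable and the timeout value are assumptions for this example. A successful TCP connection shows only that the port is reachable, not that gpfdist is serving files correctly.

```python
import socket


def port_reachable(host, port=15500, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, port_reachable("fdl-server.example.com") returns True when the port is open, where the host name is a placeholder for your own FineDataLink server address.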
Uploading the Driver
Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.
Data Connection Configuration
1. Log in to FineDataLink, choose System Management > Data Connection > Data Connection Management > New Data Connection, and click Pivotal Greenplum Database.
1. If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.
2. For FineDataLink before 4.0.29, select Greenplum (Parallel Loading) when creating the data connection.
2. Fill in the connection information. Select Custom, and select the uploaded driver mentioned in the Uploading the Driver section.
Pattern can be set only after the database is connected. Click the Click to Connect Database button and then click Pattern, as shown in the following figure.
3. Configure Parallel Loading Setting if you need to write data into Greenplum databases.
Server Address - Node 1
If the project is deployed in a clustered environment, multiple configuration items will be displayed in the format of Server Address - Node x. Type the path in the drop-down box.
Determine whether to reuse temporary tables. (Reusing temporary tables can effectively reduce the table growth rate during high-frequency loading.)
If it is set to Yes, the gpfdist_temp schema will be automatically created and used during runtime.
Limit on Temporary File Quantity
Set the maximum number of temporary files that can be written to the disk. Adjust the value according to the disk size and the network speed.
Default value: 100,000. Range: 10,000 to 100,000,000. Required.
Limit on Temporary File Size (MB)
Set the maximum size of the file that can be written to the disk. When either Limit on Temporary File Quantity or Limit on Temporary File Size (MB) is reached, data file writing stops, and file loading starts immediately.
Default value: 1024. Range: 10 to 102,400. Required.
4. Click Test Connection. If the connection is successful, click Save to save the configuration.
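The interaction of the two temporary-file limits in step 3 can be sketched in a few lines of Python. This is an illustration of the documented rule (loading starts as soon as either limit is reached), not FineDataLink's code. The function and parameter names are hypothetical, "reached" is interpreted as greater than or equal to, and the sketch does not settle whether the size limit applies per file or cumulatively.

```python
QUANTITY_RANGE = (10_000, 100_000_000)  # Limit on Temporary File Quantity
SIZE_MB_RANGE = (10, 102_400)           # Limit on Temporary File Size (MB)


def validate_limits(quantity=100_000, size_mb=1024):
    """Check both settings against the documented ranges."""
    if not QUANTITY_RANGE[0] <= quantity <= QUANTITY_RANGE[1]:
        raise ValueError(f"quantity limit out of range: {quantity}")
    if not SIZE_MB_RANGE[0] <= size_mb <= SIZE_MB_RANGE[1]:
        raise ValueError(f"size limit out of range: {size_mb}")
    return quantity, size_mb


def should_start_loading(written_quantity, written_size_mb,
                         quantity_limit, size_limit_mb):
    """True once either limit is reached, whichever comes first."""
    return (written_quantity >= quantity_limit
            or written_size_mb >= size_limit_mb)


# With the default limits, reaching 1024 MB triggers loading even though
# the quantity limit is still far away.
quantity_limit, size_limit_mb = validate_limits()
print(should_start_loading(500, 1024, quantity_limit, size_limit_mb))
```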
See the Configuration Instruction section of this article carefully.
Greenplum 5.x and 6.x are supported.
The procedure is the same as that in the Configuration with Parallel Loading Setting section, except that you do not need to configure Parallel Loading Setting.
For details on using Greenplum data sources in FineDataLink, see Instruction on Greenplum Data Sources.
Scheduled Task supports reading from and writing to Greenplum databases. For details, see Overview of Data Development.
Pipeline Task supports writing data to Greenplum databases. For details, see Overview of Data Pipeline.