Overview
Version
FineDataLink Version | Functional Change |
---|---|
4.1.11.2 | Scheduled Task supported data reading from and writing into the YMatrix database. Data Pipeline supported data writing into the YMatrix database. Data Service supported data reading from the YMatrix database. Database Table Management supported the YMatrix database. |
Function Description
FineDataLink can connect to the YMatrix database to read and write data in scheduled tasks, write data in pipeline tasks, and release APIs based on data from it.
Configuration Instruction
Pipeline Task
The COPY loading mode is used when you write data into a YMatrix database via Data Pipeline.
Using COPY loading requires specific database privileges.
1. Assign the privilege to create schemas in the database to users who need to use the data connection.
2. Create an fdl_temp schema in the target database to store temporary tables and assign users the privilege to create tables in this schema.
The example command is as follows:
GRANT USAGE,CREATE ON SCHEMA fdl_temp TO trans_user ;
ALTER DEFAULT PRIVILEGES IN SCHEMA fdl_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO trans_user ;
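The two steps above can be scripted when several users need the same privileges. The following is a minimal sketch, not part of FineDataLink: the function names and the `demo_db` database name are hypothetical, and the `GRANT CREATE ON DATABASE` and `CREATE SCHEMA IF NOT EXISTS` statements are standard PostgreSQL syntax that YMatrix is assumed to accept. It only builds the statements as text; review them before running them as a database administrator.

```python
import re
from typing import List

# Accept only plain, unquoted identifiers so that a name cannot smuggle in extra SQL.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_name(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise ValueError."""
    if not _PLAIN_NAME.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def pipeline_privilege_sql(database: str, user: str, schema: str = "fdl_temp") -> List[str]:
    """Build the statements for steps 1 and 2 (hypothetical helper)."""
    db, role, sch = check_name(database), check_name(user), check_name(schema)
    return [
        # Step 1 (assumed standard PostgreSQL syntax): allow the user to create schemas.
        f"GRANT CREATE ON DATABASE {db} TO {role};",
        # Step 2 (assumed standard PostgreSQL syntax): create the schema for temporary tables.
        f"CREATE SCHEMA IF NOT EXISTS {sch};",
        # The remaining statements follow the example commands in this section.
        f"GRANT USAGE, CREATE ON SCHEMA {sch} TO {role};",
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {sch} "
        f"GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO {role};",
    ]


if __name__ == "__main__":
    for statement in pipeline_privilege_sql("demo_db", "trans_user"):
        print(statement)
```

Rejecting anything other than a plain identifier keeps the sketch simple; names that need quoting would have to be handled separately.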
Scheduled Task
When writing data into a YMatrix database, you can choose from three load methods, namely, Parallel Loading, COPY Loading, and Common Loading. The differences among the three load methods are described in the following table.
Load Method | Difference |
---|---|
Common Loading | 1. This method is not recommended when you write data to a YMatrix database. 2. To read data from a YMatrix database, you are advised to configure the data connection by referring to the section "Configuration Without Parallel Loading Setting" of this document. |
Parallel Loading | 1. It supports the writing of JSON fields, not binary fields. 2. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters. 3. Configure the data connection following the steps in the section "Configuration with Parallel Loading Setting" of this document. |
COPY Loading | 1. It supports the writing of binary fields and JSON fields. 2. Configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document. |
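The table can be read as a small decision rule. The sketch below is only an illustration of that reading: the function name and its two parameters are hypothetical, and falling back to COPY Loading for small workloads is an assumption (the table only says that Common Loading is not recommended and that Parallel Loading is faster at scale).

```python
def choose_load_method(has_binary_fields: bool, large_volume_or_cluster: bool) -> str:
    """Pick a load method for writing into YMatrix, following the table above."""
    # Parallel Loading cannot write binary fields, so COPY Loading is required here.
    if has_binary_fields:
        return "COPY Loading"
    # Parallel Loading outperforms COPY Loading with large volumes and large clusters.
    if large_volume_or_cluster:
        return "Parallel Loading"
    # Assumption: for smaller workloads COPY Loading is chosen because it needs
    # no Parallel Loading Setting on the data connection.
    return "COPY Loading"


if __name__ == "__main__":
    print(choose_load_method(has_binary_fields=False, large_volume_or_cluster=True))
```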
Assigning the Privilege for Parallel Loading
Using parallel loading to write data into a YMatrix database requires specific database privileges.
1. Assign privileges to create tables and read existing tables in the gpfdist_temp schema.
GRANT USAGE,CREATE ON SCHEMA gpfdist_temp TO Username ;
2. Assign privileges to create external tables.
ALTER ROLE Username WITH CREATEEXTTABLE;
3. Assign the privilege to read the target table. Using Auto Table Creation requires the privilege to create tables in corresponding databases.
ALTER DEFAULT PRIVILEGES IN SCHEMA gpfdist_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO Username ;
Assigning the Privilege for COPY Loading
For details, see the section "Pipeline Task" of this document.
Data Service
Data Service supports the YMatrix database provided that you have configured Parallel Loading Setting. For details, see Overview of Data Service.
Configuration with Parallel Loading Setting
Version and Driver
Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.
Supported Database Version | Driver |
---|---|
5.X | Download the latest version of the PostgreSQL driver from the official website. |
Procedure
1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, and click New Data Connection.
2. Find the YMatrix icon, as shown in the following figure.
3. Fill in the connection information. Click Custom, and select the uploaded driver mentioned in the section "Version and Driver."
You cannot set Pattern unless the database is connected. Click the Click to Connect Database button and then click Pattern, as shown in the following figure.
Note:
1. Specify the database and the schema when configuring the data connection. Otherwise, the default database and schema are read.
2. If a YMatrix data connection is used in a pipeline task and the username of the data connection is changed afterward, you need to grant the new database user the privilege on the fdl_temp schema and internal tables.
4. Configure Parallel Loading Setting to write data into a YMatrix database.
The setting items of Parallel Loading Setting are described in the following table.
Setting Item | Description |
---|---|
Server Address - Node 1 | Required. Enter the path of the gpfdist file on the FineDataLink server and ensure that the database segments (SEG) can access it. If the project is deployed in a clustered environment, multiple configuration items will be displayed in the format of Server Address - Node x. Type the path in the drop-down box. |
Temporary Table Reuse | Determine whether to reuse temporary tables. (Reusing temporary tables can effectively reduce the table growth rate during high-frequency loading.) If it is set to Yes, the gpfdist_temp schema will be automatically created and used during runtime. |
Limit on Temporary File Quantity | Default value: 100000. Range: 10000 to 100000000. Required. Set the maximum number of temporary files that can be written into the disk. Adjust the value according to the disk size and the network speed. |
Limit on Temporary File Size (MB) | Default value: 1024. Range: 10 to 102400. Required. Set the maximum size of the file that can be written into the disk. When either Limit on Temporary File Quantity or Limit on Temporary File Size (MB) is reached, data file writing stops, and file loading starts immediately. |
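The two limits act as an either-or trigger. The sketch below illustrates that rule and the documented defaults and ranges; it is not FineDataLink's implementation. The class and method names are hypothetical, and treating the size limit as a running total in MB is an assumption about how the setting is measured.

```python
from dataclasses import dataclass


@dataclass
class ParallelLoadingLimits:
    """Hypothetical model of the two temporary-file limits described above."""

    max_files: int = 100_000   # Limit on Temporary File Quantity (default)
    max_size_mb: int = 1024    # Limit on Temporary File Size (MB) (default)

    def __post_init__(self) -> None:
        # Ranges taken from the table above.
        if not 10_000 <= self.max_files <= 100_000_000:
            raise ValueError("file quantity limit must be between 10000 and 100000000")
        if not 10 <= self.max_size_mb <= 102_400:
            raise ValueError("file size limit must be between 10 and 102400 MB")

    def should_start_loading(self, files_written: int, size_written_mb: float) -> bool:
        """Return True once either limit is reached: writing stops and loading starts."""
        return files_written >= self.max_files or size_written_mb >= self.max_size_mb


if __name__ == "__main__":
    limits = ParallelLoadingLimits()
    print(limits.should_start_loading(files_written=500, size_written_mb=1024))
```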
5. Click Test Connection. If the connection is successful, click Save to save the configuration.
Configuration Without Parallel Loading Setting
The procedure is the same as that in the section "Configuration with Parallel Loading Setting," except that you do not need to configure Parallel Loading Setting.
Data Source Usage
Scheduled Task supports data reading from and writing into the YMatrix database. For details, see Overview of Data Development.
Data Pipeline supports data writing into the YMatrix database. For details, see Overview of Data Pipeline.
Data Service supports the YMatrix database. For details, see Overview of Data Service.