YMatrix Data Connection - FineDataLink Help Document

Last update: December 13, 2024

Overview

Version

FineDataLink Version	Functional Change
4.1.11.2	Scheduled Task supports data reading from and writing into the YMatrix database. Data Pipeline supports data writing into the YMatrix database. Data Service supports data reading from the YMatrix database. Database Table Management supports the YMatrix database.

Function Description

FineDataLink can connect to the YMatrix database to read and write data in scheduled tasks, write data in pipeline tasks, and release APIs based on its data.

Configuration Instruction

Note:

You need to register related function points to use this data source. For details, see Registration Introduction.

Pipeline Task

The COPY loading mode is used when you write data into a YMatrix database via Data Pipeline.

Using COPY loading requires specific database privileges.

1. Assign the privilege to create schemas in the database to users who need to use the data connection.

2. Create an fdl_temp schema in the target database to store temporary tables, and assign users the privilege to create tables in this schema.

The example command is as follows:

GRANT USAGE,CREATE ON SCHEMA fdl_temp TO trans_user ;
ALTER DEFAULT PRIVILEGES IN SCHEMA fdl_temp GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER ON TABLES TO trans_user ;
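
The example commands above cover only the grants on an existing fdl_temp schema. The statements that would typically precede them can be sketched as follows. This is a minimal sketch, assuming standard PostgreSQL-compatible syntax; target_db is a hypothetical database name, and trans_user is the example user from the commands above.

```sql
-- Step 1: allow the user to create schemas in the database.
-- target_db is a hypothetical name; replace it with the actual target database.
GRANT CREATE ON DATABASE target_db TO trans_user;

-- Step 2: create the schema that stores temporary tables.
-- IF NOT EXISTS makes the statement safe to rerun.
CREATE SCHEMA IF NOT EXISTS fdl_temp;
```

Run these as a database administrator or schema owner, then apply the GRANT and ALTER DEFAULT PRIVILEGES commands shown above.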

Scheduled Task

When writing data into a YMatrix database, you can choose from three load methods, namely, Parallel Loading, COPY Loading, and Common Loading. The differences among the three load methods are described in the following table.

Load Method	Difference
Common Loading	1. This method is not recommended when you write data to a YMatrix database. 2. To read data from a YMatrix database, you are advised to configure the data connection by referring to the section "Configuration Without Parallel Loading Setting" of this document.
Parallel Loading	1. It supports the writing of JSON fields, but not binary fields. 2. Parallel loading outperforms COPY loading in scenarios with large data volumes and large-scale clusters. 3. Configure the data connection following the steps in the section "Configuration with Parallel Loading Setting" of this document. Note: Using Parallel Loading requires specific privileges.
COPY Loading	1. It supports the writing of binary fields and JSON fields. 2. Configure the data connection following the steps in the section "Configuration Without Parallel Loading Setting" of this document. Note: To use COPY Loading, you need to create an fdl_temp schema in the target database to store temporary tables and assign users the privilege to create tables in that schema. (If the database administrator has created the schema and granted users the privilege on it, database users do not need the privilege to create schemas.)

Assigning the Privilege for Parallel Loading

Using parallel loading to write data into a YMatrix database requires specific database privileges.

1. Assign privileges to create tables and read existing tables in the gpfdist_temp schema.

Note:

If you don't want to assign the privilege to read existing tables, stop the task that uses parallel loading and delete the ext_gpload_* and staging_gpload_* tables in the gpfdist_temp schema. After that, you only need to assign the privilege to create tables in the schema.

GRANT USAGE,CREATE ON SCHEMA gpfdist_temp TO Username ;

2. Assign the privilege to create external tables.

ALTER ROLE Username WITH CREATEEXTTABLE ;

3. Assign the privilege to read the target table. Using Auto Table Creation requires the privilege to create tables in corresponding databases.

ALTER DEFAULT PRIVILEGES IN SCHEMA gpfdist_temp GRANT SELECT, INSERT, UPDATE, DELETE, 
REFERENCES, TRIGGER ON TABLES TO Username ;
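
After running the three commands above, the result can be checked with catalog queries. The following is a minimal sketch, assuming YMatrix exposes the standard PostgreSQL function has_schema_privilege and the pg_class and pg_namespace catalogs; username is the placeholder role from the commands above, written in lowercase because unquoted role names are folded to lowercase.

```sql
-- Check the schema-level privileges granted in step 1.
-- 'username' is a placeholder; replace it with the actual role name.
SELECT has_schema_privilege('username', 'gpfdist_temp', 'USAGE')  AS can_use,
       has_schema_privilege('username', 'gpfdist_temp', 'CREATE') AS can_create;

-- List leftover ext_gpload_* and staging_gpload_* objects in gpfdist_temp.
-- '!' is declared as the escape character so that '_' matches literally.
SELECT c.relname
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'gpfdist_temp'
  AND (c.relname LIKE 'ext!_gpload!_%' ESCAPE '!'
       OR c.relname LIKE 'staging!_gpload!_%' ESCAPE '!');
```

The second query is useful for the note under step 1: if it returns no rows after the tasks are stopped and the tables are deleted, only the privilege to create tables in the schema is needed.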

Assigning the Privilege for COPY Loading

For details, see the section "Pipeline Task" of this document.

Data Service

Data Service supports the YMatrix database provided that you have configured Parallel Loading Setting. For details, see Overview of Data Service.

Configuration with Parallel Loading Setting

Version and Driver

Download the driver package and upload it to FineDataLink. For the specific steps of uploading the driver package, see Driver Management.

Supported Database Version	Driver
5.X	Download the latest version of the PostgreSQL driver from the official website.

Procedure

1. Log in to FineDataLink as the admin, choose System Management > Data Connection > Data Connection Management, and click New Data Connection.

Note:

If you are not the admin, you can configure data connections only after the admin assigns you permission on Data Connection under Permission Management > System Management. For details, see Data Connection Management Permission.

2. Find the YMatrix icon, as shown in the following figure.

3. Fill in the connection information. Click Custom, and select the uploaded driver mentioned in the section "Version and Driver."

You can set Pattern only after the database is connected. Click the "Click to Connect Database" button and then click Pattern, as shown in the following figure.

Note:

1. Specify the database and the schema when configuring the data connection. Otherwise, the default database and schema are read.

2. If a YMatrix data connection is used in a pipeline task and the username of the data connection is changed afterward, you need to grant the new database user the privilege on the fdl_temp schema and internal tables.

4. Configure Parallel Loading Setting to write data into a YMatrix database.

The setting items of Parallel Loading Setting are described in the following table.

Setting Item	Description
Server Address - Node 1	Required. Enter the path of the gpfdist file and ensure it can be accessed by the SEG on the FineDataLink server. If the project is deployed in a clustered environment, multiple configuration items will be displayed in the format of Server Address - Node x. Type the path in the drop-down box.
Temporary Table Reuse	Determine whether to reuse temporary tables. (Reusing temporary tables can effectively reduce the table growth rate during high-frequency loading.) If it is set to Yes, the gpfdist_temp schema will be automatically created and used during runtime.
Limit on Temporary File Quantity	Default value: 100000. Range: 10000 to 100000000. Required. Set the maximum number of temporary files that can be written into the disk. Adjust the value according to the disk size and the network speed.
Limit on Temporary File Size (MB)	Default value: 1024. Range: 10 to 102400. Required. Set the maximum size of the file that can be written into the disk. When either Limit on Temporary File Quantity or Limit on Temporary File Size (MB) is reached, data file writing stops, and file loading starts immediately.

5. Click Test Connection. If the connection is successful, click Save to save the configuration.
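
Note 2 under step 3 states that after the username of a data connection used in a pipeline task is changed, the new database user needs privileges on the fdl_temp schema and its tables. The following is a minimal sketch of such grants, assuming standard PostgreSQL-compatible syntax. new_user is a hypothetical role name, and the list of table privileges mirrors the one used in the section "Pipeline Task"; the exact set that FineDataLink requires is not stated in this document.

```sql
-- Allow the new user to use the schema and create tables in it.
-- new_user is a hypothetical name; replace it with the new database user.
GRANT USAGE, CREATE ON SCHEMA fdl_temp TO new_user;

-- Grant access to the tables that already exist in the schema.
GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES, TRIGGER
    ON ALL TABLES IN SCHEMA fdl_temp TO new_user;
```

GRANT ... ON ALL TABLES IN SCHEMA affects only tables that exist when it runs; tables created later are covered by ALTER DEFAULT PRIVILEGES, as in the section "Pipeline Task".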

Configuration Without Parallel Loading Setting

Note:

Read the section "Configuration Instruction" of this document carefully before you proceed.

The procedure is the same as that in the section "Configuration with Parallel Loading Setting," except that you do not need to configure Parallel Loading Setting.

Data Source Usage

Note:

For details about using YMatrix in FineDataLink, see YMatrix Instruction.

Scheduled Task supports data reading from and writing into the YMatrix database. For details, see Overview of Data Development.
Data Pipeline supports data writing into the YMatrix database. For details, see Overview of Data Pipeline.
Data Service supports data reading from the YMatrix database. For details, see Overview of Data Service.