FineDataLink Version
Functional Change
4.1.4
Supported the reading of data from JSON and XML files.
Supported the reading of files of CSV-like types such as TSV, LOG, and DT (a mix of CSV and XML formats) when File Type was set to CSV.
4.2.6.1
Allowed adding built-in fields, including fileName (whose value is the file name), filePath (whose value is the file path), and lastModifiedTime (whose value is the file modification time) in Output Field, which would be processed as output fields.
Removed mandatory input validation for the Filename Extension setting.
You want to read data from Excel and CSV files on the local FineDataLink server or in the shared directory to build the data warehouse.
You want to use FineDataLink to upload Excel and CSV files to the local FineDataLink server.
FineDataLink provides a File Input function that can be used to read file data, as shown in the following figure.
To read data from local Excel and CSV files in FineDataLink or upload these files to FineDataLink, you must have created a data connection to the local server directory, placed the files to be read in the /webroot/WEB-INF/assets/local_files path of the FineDataLink project, and obtained permission to use the data connection. For details, see Data Connection to Local Server Directory.
To read data from Excel and CSV files on a remote FTP/SFTP server, you must have configured the FTP/SFTP data connection and obtained permission to use the data connection. For details, see FTP/SFTP Data Connection.
1. The number of data rows displayed in the frontend log does not include the header row read by the File Input operator.
For example, for an Excel file containing one header row and 999,999 data rows (one million rows in total), the frontend log shows 999,999 rows read after you read the file using the File Input operator.
2. You can reference parameters when configuring File Address, Folder Address, File Filtering, and Sheet Name of the File Input operator. For details about how to use parameters, see Parameter Configuration and Use. For specific examples of using the File Input operator together with parameters, see the "Application Example" section of this document.
File Source
The option in the drop-down list will only appear if you have previously configured a data connection to the local server directory or the FTP/SFTP server and have permission to use that data connection.
File Type
Reading Excel files is supported. When you set File Type to Excel:
Supported Excel file extensions include .xls, .xlsx, .XLS, and .XLSX.
The Excel file should be created with Microsoft Excel 97 or a later version.
Read Mode
Choose between Read File and Read Folder.
1. File Reading
Use this mode to read a single file.
File Address: Select the file to be read. You can use parameters set in FineDataLink. For details, see How to Configure and Use Parameters. If File Source is set to Local Server Directory, enter the path relative to /webroot/WEB-INF/assets/local_files of the FineDataLink project, for example, /home/ftpuser/test/${time}.xlsx. (The /home path is in the /webroot/WEB-INF/assets/local_files path of the FineDataLink project.)
When File Source is set to Local Server Directory and Read Mode is set to Read File, you can manually upload the file to the directory on the local FineDataLink server on which you have permission. The name of the uploaded file cannot contain ? *:"<>\/| and cannot start with a space. The maximum file size is 100 MB.
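For reference only, the following Python sketch applies the same naming and size rules before an upload. It is not FineDataLink's own validation logic, and the file path in the example is hypothetical.

import os

FORBIDDEN_CHARS = set('?*:"<>\\/|')        # characters that are not allowed in the file name
MAX_SIZE_BYTES = 100 * 1024 * 1024         # 100 MB upload limit

def can_upload(path):
    name = os.path.basename(path)
    if name.startswith(' '):               # the name cannot start with a space
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return os.path.getsize(path) <= MAX_SIZE_BYTES

# hypothetical file; prints True only if all three rules are satisfied
print(can_upload('/tmp/sales_2024.xlsx'))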
2. Folder Reading
The operator reads files of the same format in the selected folder and merges their data. Fields in all Excel files in the folder must be identical, and the separator used in all CSV files in the folder must be the same.
1. If the folder contains files of multiple formats and you only want to read files of one type, you can add a condition to filter files whose names contain CSV or Excel. If no condition is set, all types of files in the folder will be read by default.
2. Files in the folder are read concurrently by the File Input operator. At most 20 files are read concurrently.
For example, there are two files test1 and test2 in the folder. If you select Read Folder, the files will be concatenated row-wise, as shown in the following figure.
For details, see Scanning a Batch of Files for Data Synchronization.
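As an illustration only (outside FineDataLink), the row-wise merge described above can be sketched with pandas, assuming every CSV file in the folder has identical columns; the folder path below is hypothetical.

import glob
import pandas as pd

# collect every CSV file in the hypothetical folder and stack their rows
files = sorted(glob.glob('/data/inbox/*.csv'))
frames = [pd.read_csv(f) for f in files]
if frames:
    merged = pd.concat(frames, ignore_index=True)   # columns must match across files
    print(merged.shape)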
Setting Item
Description
Folder Address
Select the folder containing the files to be read. You can select the folder or enter the folder address manually. Parameters are supported; after entering a parameter, you need to select it.
Read Subfolder
If Read Subfolder is not selected, files in the subfolders will be skipped during reading. If it is selected, files in the subfolders will also be read.
File Filtering
You can configure filtering conditions for files to be read in the folder. You can filter files by fileName and lastModifiedTime. Filtering through parameters is supported.
A usage example of lastModifiedTime: A folder contains Table A and Table B added yesterday. Today, Table C is added to this folder, and the data in Table B is updated. In File Filtering, if you set lastModifiedTime to today's date, Table C and Table B will be read. (All data in Table B will be read, not only the data updated today.)
1. When you filter files using lastModifiedTime, the precision is limited to minutes, not seconds.
2. In FineDataLink 4.1.1 and later versions, the conditional judgment logic has been standardized. For details about each operator, see Condition Judgment Logic Description.
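As a rough illustration of the same cutoff logic described above (not FineDataLink internals; the folder path and date are hypothetical), the following sketch keeps files whose modification time, truncated to the minute, is on or after a given date:

import os
from datetime import datetime

folder = '/data/inbox'                      # hypothetical folder
cutoff = datetime(2024, 1, 1)               # "today" in the example above

for name in os.listdir(folder):
    path = os.path.join(folder, name)
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    if mtime.replace(second=0, microsecond=0) >= cutoff:
        print(name)                         # this file would be read in full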
Sheet Name
Enter the name of the Sheet to be read. If it is empty, the first sheet will be read. You can use parameters in Sheet Name.
Read Range
Specify the start and end rows/columns in the Excel file for reading.
The values must be positive integers.
First Row As Field Name
If it is selected, the first row of the parsed data will be taken as the field name.
If it is not selected, the first row of content will be parsed as data.
Output Field
1. There are two methods to acquire output fields, as described in the following table.
Acquisition Method
Automatic Acquisition
Data types are obtained automatically through parsing.
Manual Acquisition
You can adjust the data type.
The name and data type of the output field are displayed in the Manual Acquisition area. You can set the type to varchar, int, long, float, double, date, and timestamp.
You can add a field and delete the added field. However, existing fields cannot be deleted or reordered.
When the first N rows of the added field are empty, the corresponding field value is NULL.
2. Starting from FineDataLink V4.2.6.1, you can add built-in fields, including fileName (whose value is the file name), filePath (whose value is the file path), and lastModifiedTime (whose value is the file modification time) in Output Field regardless of the acquisition method.
The configured built-in fields will be displayed on the Data Preview tab page and participate in actual execution. The following figure shows the preview effect.
3. If the file/folder timestamp (the lastModifiedTime value) retrieved via FTP is eight hours later than the actual server time or otherwise incorrect, you need to modify the configuration file. Such a timestamp deviation affects the time-based filtering configured in File Filtering when Read Mode is set to Read Folder.
Modify the vsftpd.conf configuration file in /etc/vsftpd by adding the following line at the bottom:
use_localtime=YES
Restart the FTP service.
service vsftpd restart
You need to configure the following content when you set File Type to CSV.
For details about File Source, Read Mode, First Row As Field Name, and Output Field, see the "Excel File Reading" section of this document. The following table describes other setting items.
This operator can read CSV files when you set File Type to CSV.
Supported file extensions include .csv, .CSV, .txt, .TXT, .tsv, .log, and .dt (a mixed format of CSV and XML).
Filename Extension
The filename extension is case-insensitive.
Files in File Address with the specified extension will be read, and files of other formats will be skipped.
You can enter multiple filename extensions, separated by commas. Reading files of CSV-like types, such as TXT/TSV/LOG/DT (a mixed format of CSV and XML), is supported.
Starting from FineDataLink V4.2.6.1, this setting item is no longer mandatory.
Column Separator
You can specify the separator to split data into multiple columns. The set separator must be consistent with the actual separator in the file. Otherwise, the file cannot be parsed correctly.
You can select the separator from Comma (,), Tab (\t), Semicolon (;), Vertical Bar (|), Space, ASCII Character, or select Custom to customize one.
You can customize the separator. Multiple characters, including Chinese characters, are supported.
You can also enter the decimal ASCII code to specify an ASCII character as a special column separator. The supported decimal ASCII codes range from 0 to 32.
The values of Text Qualifier, Column Separator, and Line Separator must be distinct.
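For illustration only: decimal ASCII code 31 is the unit-separator control character, and a file that uses it as a column separator could be split as sketched below (the file name is hypothetical; this is not FineDataLink's parser).

sep = chr(31)                                    # column separator given as decimal ASCII code 31
with open('data.txt', encoding='utf-8') as f:    # hypothetical file
    for line in f:
        columns = line.rstrip('\r\n').split(sep)
        print(columns)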
Line Separator
You can specify the line separator of the CSV file.
CR + LF: Windows systems
LF: Unix, Linux, and other systems
CR: Early Mac OS systems
Text Qualifier
A text qualifier marks the start and end of a column value to prevent special characters in the value from being recognized as separators, which would otherwise affect CSV file parsing.
You can select the text qualifier from Double Quotes, Single Quote, and ASCII Character. (The supported decimal ASCII codes range from 0 to 32.)
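For example, in the sample line below the comma inside the quoted value is part of the data, not a column break. The sketch uses Python's csv module with a double-quote qualifier purely as an illustration.

import csv
import io

raw = '"Shanghai, Pudong",2024-01-01,100\n'     # the first value contains a comma
reader = csv.reader(io.StringIO(raw), delimiter=',', quotechar='"')
print(next(reader))                             # ['Shanghai, Pudong', '2024-01-01', '100']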
Encoding
You can specify the encoding for the CSV file.
Supported encoding types include GBK, BIG5, ISO-8859-1, UTF-8, UTF-16, EUC_JP, EUC_KR, CP850, and GB2312.
Start Row
It is used to specify the start row for reading in the CSV file.
It must be set. The default value is 1. You are not allowed to enter a number smaller than 1.
The actual reading starts from the set row.
This configuration takes effect on all files to be read during folder reading.
If you select First Row As Field Name, the reading will start from the set start row, and data on this row will be taken as the field name.
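A minimal sketch of the same start-row behavior (illustration only; the file name is hypothetical): rows before the start row are discarded, and if First Row As Field Name is selected, the start row itself becomes the header.

start_row = 3                                    # rows 1 and 2 are skipped
with open('report.csv', encoding='utf-8') as f:  # hypothetical file
    lines = f.read().splitlines()[start_row - 1:]
header = lines[0].split(',')                     # First Row As Field Name selected
data_rows = [line.split(',') for line in lines[1:]]
print(header, len(data_rows))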
You can choose Custom to read files of types such as XML and JSON.
By default, the data is obtained as text data, with the content of a file per line. The column name is column by default. You can parse data using XML Parsing and JSON Parsing operators, as shown in the following figure.
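Conceptually, the Custom type passes each file downstream as one text value in a column named column, which an XML Parsing or JSON Parsing operator then expands into fields. A rough illustration outside FineDataLink (the file name is hypothetical):

import json

# step 1: the whole file is read as a single text value (one file per row, column name "column")
with open('orders.json', encoding='utf-8') as f:    # hypothetical JSON file
    column = f.read()

# step 2: the text is parsed into structured fields (the role of a JSON Parsing operator)
record = json.loads(column)
print(record)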
Filename Extension
This option appears when you set Read Mode to Read File.
The filename extension is case-insensitive.
You can enter multiple filename extensions, separated by commas.
Duplicate filename extensions are not allowed.
Encoding
The Data Preview page is shown in the following figure.
The details are as follows:
By default, a maximum of 20 rows of data is displayed on the Data Preview page. The displayed fields are the union of the fields in the first 5000 rows of data and the column names in the first row (if the first row is specified as the field name); however, all the data will be output during actual execution.
If the first row contains merged cells, these cells will be split and named in the format of Field name + Number. For example, if the first-row cell containing the Name field is the merging result of three cells, the fields after splitting will be named Name, Name1, and Name2.
If the first row of the CSV file contains fields with the same name, these fields will be renamed in the format of Field Name + Number. For example, if there are two Name fields, they will be renamed Name1 and Name2.
If First Row As Field Name is not selected, the fields will be named sequentially in the format of column + Number. For example, column, column1, and column2.
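The naming rules above amount to appending a running number; a small sketch of the same idea (not FineDataLink's code):

def split_merged(name, span):
    # a merged first-row cell spanning several columns becomes Name, Name1, Name2, ...
    return [name if i == 0 else f'{name}{i}' for i in range(span)]

def default_names(count):
    # without a header row, fields are named column, column1, column2, ...
    return ['column' if i == 0 else f'column{i}' for i in range(count)]

print(split_merged('Name', 3))   # ['Name', 'Name1', 'Name2']
print(default_names(3))          # ['column', 'column1', 'column2']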
Assume that you have used a File Input operator in a scheduled task with File Type set to CSV and Read Mode set to Read Folder:
If you have set a file filtering condition using fileName and then upgrade FineDataLink to V4.1.4 or a later version, a fileName filtering condition matching files whose names contain .txt, .TXT, .csv, or .CSV will be added automatically and will form a logical AND relationship with the original filtering condition.
If you have not set a file filtering condition using fileName, the filtering condition using fileName you set in FineDataLink of V4.1.4 or later versions will apply to files whose names contain .txt, .TXT, .csv, or .CSV.
Scenario
Handling Strategy
In the files to be read in batches, the data format of some files is inconsistent with the template file.
When you select a folder, the first file that is sorted alphabetically by filename among those meeting the filtering conditions will be used as the template.
If the number and data type of fields of the file with a mismatched format are consistent with the template, no error will be reported, and the file will be read normally.
If the number or type of fields of the file with a mismatched format are different from those of the template, an error will be reported during operator execution. The log records the name of the file being read when the error occurs.
In the file being read, the last field has no data in its first 5000 rows.
The field is appended at the end of the output fields with the configured field name and data type.
Rows with no data are read as NULL, and rows with data are read normally.
The custom data type of the output field is incompatible with the actual data type.
An error will be reported.
The correspondence between the cell format and the data type of the field after parsing is described in the following table.
Numeric
Currency
String
Accounting
Date and Time
Date or Timestamp
The cell containing the date only is parsed as date-type data, and the cell containing the date and time is parsed as timestamp-type data.
Percentage
Fraction
Scientific Notation
Text
1. CSV field type issues
The content in a CSV file can only be output as string or numeric data; precision is preserved when the data type of the output field is numeric. Time-type data, such as 2022-11-10, will be parsed as string data, and timestamp-type data will be parsed as numeric data.
2. Merged cell reading issues
A merged cell will be split, and its value will be copied into each of the small cells (the cells generated after the splitting) before reading.
3. Empty fields
If a header field is empty, its field name will be filled in as column + Number.
If a numeric field contains empty values, NULL will be filled in.
4. Reading multiple sheets of an Excel file is not supported.
5. Filenames containing wildcard characters are not supported.
6. Reading password-protected Excel files is not supported.
Document
Read a single file.
Using the File Input Operator to Read a Single File
Read Excel and CSV files with the same format in a folder, and save multiple files after reading them at one time.
Scanning a Batch of Files for Data Synchronization
Reading Data from a Multi-Sheet Excel File (by using a parameter in Sheet Name)
Scenarios:
Read files with the same format in batches. Files are frequently added.
Data is updated through file adding, instead of updating data in the existing file.
Methods for determining whether a file is a new file:
Method one: Judge by the file name. This method applies to scenarios where the file name contains the time when the file was added, for example, 2023/05/29 Inventory Data.
Method two: Judge by the file modification time.
Reading Files Based on Time Increments (by using a parameter in File Filtering)
Looping Through Files in the Linux Environment for Data Synchronization (by using a parameter in File Address)