Function Description of File Input

  • Last update: January 15, 2025
  • Overview

    Version

    FineDataLink Version

    Functional Change

    4.1.4

    • Supported the reading of data from JSON and XML files.

    • Supported the reading of files of CSV-like types such as TSV, LOG, DT (a mix of CSV and XML formats), and DBF when File Type was set to CSV.

    Application Scenario

    You want to read data from Excel and CSV files on the local FineDataLink server or in the shared directory to build the data warehouse.

    You want to use FineDataLink to upload Excel and CSV files to the local FineDataLink server.

    Function Description

    FineDataLink provides a File Input function that can be used to read file data, as shown in the following figure.

    1.png


    Prerequisite

    iconNote:
    You can assign data connection permission to control the local server directories accessible to each person, ensuring data security.
    • To read data from local Excel and CSV files in FineDataLink or upload these files to FineDataLink, you must have created a data connection to the local server directory, placed the files to be read in the /webroot/WEB-INF/assets/local_files path of the FineDataLink project, and had permission to use the data connection. For details, see Data Connection to Local Server Directory.

    • To read data from Excel and CSV files on a remote FTP/SFTP server, you must have configured the FTP/SFTP data connection and had permission to use the data connection. For details, see FTP/SFTP Data Connection.

    Function Description

    The number of read data displayed in the frontend log does not include the header row read by the File Input operator.

    For example, for an Excel file containing one header row and 999,999 data rows (one million rows in total), the number of read data rows displayed in the frontend log is 999,999 after you read the file using the File Input operator.

    Excel File Reading

    2.png

    File Source

    The option in the drop-down list will only appear if you have previously configured a data connection to the local server directory or the FTP/SFTP server and have permission to use that data connection.

    File Type

    Reading Excel files is supported.

    iconNote:

    1. When Excel is selected as File Type, the supported extensions are .xls, .xlsx, .XLS, and .XLSX.

    2. Microsoft Excel 97 and later versions are supported.

    Read Mode

    Choose between Read File and Read Folder.

    3.png

    1. File Reading

    • Use it to read a single file.

    • File Address: Select the file to be read. You can use parameters set in FineDataLink. For details, see How to Configure and Use Parameters. If the File Source is Local Server Directory, enter the path relative to /webroot/WEB-INF/assets/local_files of the FineDataLink project, for example, /home/ftpuser/test/${time}. xlsx. (The /home path is in the /webroot/WEB-INF/assets/local_files path of the FineDataLink project.)

    You can also use Parameter Assignment to set parameters.

    iconNote:
    Paste the complete file address or manually enter it in File Address, and press Enter to save it.

    Note: Paste the complete file address or manually enter it in File Address, and press Enter to save it.

    • When File Source is set to Local Server Directory and Read Mode is set to Read File, you can manually upload the file to the directory on the local FineDataLink server on which you have permission.  The name of the uploaded file cannot contain ? *:"<>\/| and cannot start with a space. The maximum file size is 100 MB.

    iconNote:
    The maximum file size for manual uploading is 100 MB.

    4.png

    2. Folder Reading

    Use it to read multiple files of the same format in a folder.

    iconNote:

    1. If the folder contains files of multiple formats and you only want to read files of one type, you can add a condition to filter files whose names contain CSV or Excel. If no condition is set, all types of files in the folder will be read by default. 

    2. Files in the folder are read concurrently by the File Input operator. At most 20 files are read concurrently.

    5.png

    For example, there are two files test1 and test2 in the folder. If you select Read Folder, the files will be concatenated row-wise, as shown in the following figure.

    For details, see Scanning a Batch of Files for Data Synchronization.

    6.png

    iconNote:
    Excel files in a folder should have the same fields. CSV files in a folder should have the same separator.

    Setting Item

    Description

    Folder Address

    Select the folder containing the files to be read. You can select the folder or manually enter the folder address. (You can use the parameter, where you need to select the parameter after entering it.)

    For details about parameters set in FineDataLink, see How to Configure and Use Parameters.

    iconNote:
    Paste the complete file address or manually enter it in File Address, and press Enter to save it.

    Read Subfolder

    If Read Subfolder is not selected, files in the subfolders will be skipped during reading. If it is selected, files in the subfolders will also be read.

    File Filtering

    You can configure filtering conditions for files to be read in the folder. You can filter files by fileName and lastModifiedTime. Filtering through parameters is supported.

    For details about parameters set in FineDataLink, see How to Configure and Use Parameters.

    A usage example of lastModifiedTime: A folder contains Table A and Table B added yesterday. Today, table C is added to this folder, and the data in Table B is updated. In File Filtering, if you set lastModifiedTime to today's date, Table C and Table B will be read. (All data in Table B will be read, not only the data updated today.)

    iconNote:

    1. When you filter files using lastModifiedTime, the precision is limited to minutes, not seconds.

    2. In FineDataLink 4.1.1 and later versions, the conditional judgment logic has been standardized. For details about each operator, see Judgment Logic Description.

    7.png

    Sheet Name

    Enter the name of the Sheet to be read. If it is empty, the first sheet will be read. You can use parameters in Sheet Name.

    For details about parameters set in FineDataLink, see How to Configure and Use Parameters.

    Read Range

    Specify the start and end rows/columns in the Excel file for reading.

    The values must be positive integers.

    First Row As Field Name

    If it is selected, the first row of the parsed data will be taken as the field name.

    If it is not selected, the first row of content will be parsed as data.

    Output Field

    1. Automatic Acquisition:

    If you choose Automatic Acquisition, the data type of the field is read based on the parsing result, and the output field is not displayed in the Output Field area.

    8.png

    2. Manual Acquisition:

    • The name and data type of the output field are displayed in the Manual Acquisition area. You can set the type to varchar, int, long, float, double, date, and timestamp.

    iconNote:
    When reading the CSV file, you cannot set the type to date and timestamp.

    9.png

    • You can add a field and delete the added field. However, existing fields cannot be deleted or reordered.

    iconNote:
    The name of the added field cannot be empty, cannot contain spaces, and cannot be the same as another valid field name.

    10.png

    • When the first N rows of the added field are empty, the corresponding field value is NULL.

    CSV File Reading

    11.png

    icon

    For details about File SourceRead Mode, First Row As Field Name, and Output Field, see the "Excel File Reading" section of this document.

    File Type

    When File Type is set to CSV, files of CSV-like types can be read.

    Supported file extensions include csv, CSV, txt, TXT, tsv, log, dt (a mixed format of CSV and XML), and dbf.

    Filename Extension

    This option appears when you select Read File as the Read Mode. The details are as follows:

    • The username is case-insensitive.

    • Files in File Address with the specified extension will be read, and files of other formats will be skipped.

    • You can enter multiple filename extensions, which should be separated by comma. Reading files of CSV-like types such as TXT/TSV/LOG/DT (a mixed format of CSV and XML)/DBF is supported.

    Column Separator

    Specify the separator to split data into multiple columns. The set separator must be consistent with the actual separator in the file. Otherwise, the file cannot be parsed correctly.

    You can specify or customize the separator.

    12.png

    You can select the separator from Comma (,), Tab (\t), Semicolon (;), Vertical Bar (|), Space, ASCII Character, or select Custom to customize one.

    • You can customize the separator. Multiple characters, including Chinese characters, are supported.

    • You can also enter the decimal ASCII code to specify an ASCII character as a special column separator. The supported decimal ASCII codes range from 0 to 32.

    • The values of Text Qualifier, Column Separator,and Line Separator must be distinct.

    Line Separator

    You can specify the line separator of the CSV file.

    13.png

    CR + LF: Windows system

    LF: Unix, Linux, and other systems

    CR: Early Mac OS system

    The values of Text Qualifier, Column Separator, and Line Separator must be distinct.

    Text Qualifier

    It marks the start and end of a column field to prevent the special character in the field value from being recognized as a separator, thus affecting the CSV file parsing.

    14.png

    You can select the text qualifier from Double Quotes, Single Quote, and ASCII Character. (The supported decimal ASCII codes range from 0 to 32.)

    The values of Text Qualifier, Column Separator, and Line Separator must be distinct.

    Encoding

    Specify the encoding for the CSV file. Supported encoding types include GBK, BIG5, ISO-8859-1, UTF-8, UTF-16, EUC_JP, EUC_KR, CP850, and GB2312.

    15.png

    Start Row to Read

    Specify the start row for reading in the CSV file.

    • This is a required field. The default value is 1. You are not allowed to enter a number smaller than 1.

    • The actual reading starts from the set row.

    • This configuration takes effect on all files to be read during folder reading.

    • If you select First Row As Field Name, the reading will start from the set start row, and data      on this row will be taken as the field name.

    Custom

    Use it to read file of types such as XML and JSON.

    By default, the data is obtained as text data, with the content of a file per line. The column name is Column by default. You can parse data using XML Parsing and JSON Parsing operators.

    16.png

    icon
    For details about File SourceRead Mode, and File Address, see the "Excel File Reading" section of this document.

    Filename Extension

    This option appears when you set Read Mode to Read File.

    • The username is case-insensitive.

    • You can enter multiple filename extensions, which should be separated by comma.

    • Duplicate filename extensions are not allowed.

    Encoding

    Supported encoding types include GBK, BIG5, ISO-8859-1, UTF-8, UTF-16, EUC_JP, EUC_KR, CP850, and GB2312.

    17.png

    Data Preview

    The Data Preview page is shown in the following figure.

    18.png

    The details are as follows:

    • By default, a maximum of 20 rows of data is displayed on the Data Preview page. Although the displayed fields are a union of fields of the first 5000 rows of data and column names on the first row (if the first row is specified as the field name), all the data will be output.

    • If the first row contains merged cells, these cells will be split and named in the format of Field name + Number. For example, if the first-row cell containing the Name field is the merging result of three cells, the fields after splitting will be named Name, Name1, and Name2.

    • If the first row of the CSV file contains fields with the same name, these fields will be renamed in the format of Field Name + Number. For example. If there are two Name fields, these fields will be renamed Name1 and Name2.

    • If First Row As Field Name is not selected, the fields will be named sequentially in the format of column + Number. For example, column, column1, and column2.

    Upgrade Compatibility Instruction

    Assume the File Input operator is used in a scheduled task of the FineDataLink project with File Type set to CSV and Read Mode set to Read Folder.

    Special Scenario Handling Strategy

    Scenario

    Handling Strategy

    In the files to be read in batches, the data format of some files is inconsistent with the template file.

    When you select a folder, the first file that is sorted alphabetically by filename among those meeting the filtering conditions will be used as the template.

     

    If the number and data type of fields of the file with a mismatched format are consistent with the template, no error will be reported, and the file will be read normally.

    If the number or type of fields of the file with a mismatched format are different from those of the template, an error will be reported during operator execution. The log records the name of the file being read when the error occurs.

    In the read file, the first 5000 rows of the field at the end have no data.

    The field is appended at the end of fields during the field output, with the field name and the data type configured.

    The rows with no data will be read as NULL, and the rows with data will be read normally.

    The custom data type of the output field is incompatible with the actual data type.

    An error will be reported.

    Instruction of Excel Data Reading

    The correspondence between the cell format and the data type of the field after parsing is described in the following table.

    Excel Cell Format
    Field Data Type After Parsing
    GeneralIf it can be parsed into a type supported by FineDataLink based on the original value, it will be parsed automatically. Otherwise, it will be treated as a string.

    Numeric

    Numeric

    Currency

    String

    Numeric

    Accounting

    String

    Numeric

    Date and Time

    Date or Timestamp

    The cell containing the date only is parsed as date-type data, and the cell containing the date and time is parsed as timestamp-type data.

    Percentage

    Numeric

    Fraction

    Numeric

    Scientific Notation

    Numeric

    Text

    String

    1. CSV field type issues

    The content in a CSV file can only be output as string and numeric data. Ensure the precision is preserved when the data type of the output field is numeric. Time-type data such as 2022-11-10 will be parsed as string data. Timestamp-type data will be parsed as numeric data.

    2. Merged cell reading issues

    The merged cell will be split and the value will be copied and pasted to each small cell (the cells generated after the splitting) for reading.

    3. Empty fields

    If a header field contains empty values, the empty values will be filled in as column + number.

    If a numeric field contains empty values, NULL will be filled in.

    4. Reading multiple sheets of an Excel file is not supported.

    5. The filename containing wildcard characters is not supported.

    6. Reading password-protected Excel files is not supported.

    Application Example

    Description

    Document

    Read a single file.

    Using the File Input Operator to Read a Single File

    Read Excel and CSV files with the same format in a folder, and save multiple files after reading them at one time.

    Scanning a Batch of Files for Data Synchronization

    Read files with the same format regularly in batches. Files are seldom added.

    For 4.0.24 and later versions, see Looping Through Files in the Linux Environment for Data Synchronization.

    Versions before 4.0.24: Protocol for Looping Through Files in the Linux Environment for Data Synchronization

    Excel spreadsheets (such as financial statements, sales data, and transcripts) often contain multiple sheets. You want to use FineDataLink to read data from a multi-sheet Excel file.

    Reading Data from a Multi-Sheet Excel File (by using a parameter in Sheet Name)

    Scenarios:

    Read files with the same format in batches. Files are frequently added.

    Data is updated through file adding, instead of updating data in the existing file.

    Methods for determining whether a file is a new file:

    Method one: Judge by the file name. This method applies to scenarios where the file name contains the time when the file was added, for example, 2023/05/29 Inventory Data.

    Method two: Judge by the file modification time.

    Reading Files Based on Time Increments (by using a parameter in File FIltering)

    Read Excel or CSV files with the same format in a folder. When reading a file, you want to process the data first and then save it.

    Looping Through Files in the Windows Environment for Data Synchronization

    Looping Through Files in the Linux Environment for Data Synchronization

    (by using a parameter in File Address)

    When a file is distributed in your business workflow, a specific file is generated to indicate that the distribution is complete. For example, after the file is distributed, an. ok file (used to determine whether the file is distributed), a. del file (used   to determine the writing method according to different values), and an. sql file with the same name as the file will be generated in the directory.

    The CSV file that has been distributed (generated into. ok file) needs to be read and put into storage in the way of. del record. After the file is read, the file that has been read needs to be recorded, so as to avoid repeated writing when the task is restarted abnormally.

    Acquire the files in the directory that meet the conditions, read the files into the warehouse in the Writing Method, and record the status.

     

     


    附件列表


    主题: Data Development - Scheduled Task
    • Helpful
    • Not helpful
    • Only read

    滑鼠選中內容,快速回饋問題

    滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。

    不再提示

    9s后關閉

    Get
    Help
    Online Support
    Professional technical support is provided to quickly help you solve problems.
    Online support is available from 9:00-12:00 and 13:30-17:30 on weekdays.
    Page Feedback
    You can provide suggestions and feedback for the current web page.
    Pre-Sales Consultation
    Business Consultation
    Business: international@fanruan.com
    Support: support@fanruan.com
    Page Feedback
    *Problem Type
    Cannot be empty
    Problem Description
    0/1000
    Cannot be empty

    Submitted successfully

    Network busy