Filtering Out Invalid Data Before JSON Parsing

  • Last update: January 14, 2026
  • Overview

    Application Scenario

    The following issue may occur when you parse JSON data at scale:

    Parsing invalid JSON data (if any) terminates the entire scheduled task. Because JSON parsing is a data processing step, the dirty data tolerance mechanism of scheduled tasks cannot shield the task from the impact of invalid JSON data.

    You may want to:

    • Filter out invalid JSON data to prevent it from affecting scheduled task execution.

    • Quickly identify invalid JSON data in large-volume data scenarios.
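
    For reference, the failure described above can be reproduced in plain Python: json.loads raises json.JSONDecodeError as soon as it meets a malformed string, which is why a single bad record stops the whole parsing step. The sample records below are hypothetical and for illustration only.

    import json

    records = [
        '{"id": 1, "name": "Alice"}',   # valid JSON
        '{"id": 2, "name": "Bob"',      # invalid JSON: missing closing brace
    ]

    for record in records:
        # The second record raises json.JSONDecodeError here, aborting the loop,
        # which mirrors how one invalid row terminates the scheduled task.
        print(json.loads(record))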

    Implementation Method

    You can define an is_valid_json function in Python to check whether each piece of JSON data is valid and then parse only the valid data.

    Procedure

    Preparation

    1. This solution requires the Python operator. Refer to the Python Operator document to prepare the environment and learn how to use the operator.

    2. In this solution, the JSON data is stored in a TXT file. Therefore, you need to prepare either of the following data connections: an FTP/SFTP Data Connection or a Data Connection to a Local Server Directory.

    The JSON data to be parsed is provided in the example file, which you can download: test_1.txt

    Reading JSON Data

    1. Create a scheduled task, drag a Data Transformation node onto the page, and enter the Data Transformation editing page.

    2. Drag in a File Input operator and configure it to read the JSON data. In this solution, the JSON data is stored in a TXT file; configure the operator according to actual conditions, as shown in the following figure.

    Set Filename Extension to TXT, set Column Separator to None, untick First Row As Field Name, select Manual Acquisition in Output Field, name the output field column, and set its data type to varchar.

    Click Data Preview, as shown in the following figure.
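
    Outside the platform, the File Input configuration above is roughly equivalent to reading the TXT file line by line into a single text column. The file name and column name below follow the example in this document; adjust them to your actual setup.

    import pandas as pd

    # Read test_1.txt line by line into one text column named "column",
    # with no separator and no header row, matching the File Input settings above.
    with open("test_1.txt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    input = pd.DataFrame({"column": lines}, dtype="string")
    print(input.head())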

    Filtering Out Invalid JSON Data

    1. Drag in a Python operator and define an is_valid_json function to determine whether the JSON data is valid, as shown in the following figure.

    Note:
    The value of input in the code below needs to be modified according to actual conditions.

    import pandas as pd

    # You must use pandas.
    # If there is a connected data source, you can click the data source above to use it. Data from the input source exists in a pandas DataFrame and can be processed through DataFrame methods.

    # ----------------------------------------

    import json

    def is_valid_json(json_string):
        # Return True if the string parses as JSON; otherwise return False.
        try:
            json.loads(json_string)
            return True
        except json.JSONDecodeError:
            return False

    # Example
    input = File Input  # Reference to the upstream File Input operator; modify it according to actual conditions.
    a = []
    for row in input.index:
        json_string = input.loc[row]['column']
        a.append(is_valid_json(json_string))
    input['isvalid'] = a

    # ----------------------------------------
    output = input
    # Assign the data to be output to the downstream operator to the output variable. If the data is a DataFrame, it is output as a two-dimensional table; other data types are output as strings.

    Code explanation:

    The script iterates through each row of the input data, checks whether the JSON string in the specified column (column) is valid JSON, and records the result in a new column named isvalid. Finally, it passes the input data, including the new column, to the downstream operator as output.

    2. Drag a Data Filtering operator onto the page and configure it to obtain data where the isvalid field value is true (indicating valid JSON data), as shown in the following figure.
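
    Taken together, steps 1 and 2 behave like the standalone sketch below. The sample rows are hypothetical; the column names column and isvalid match the configuration in this document.

    import json
    import pandas as pd

    def is_valid_json(json_string):
        try:
            json.loads(json_string)
            return True
        except json.JSONDecodeError:
            return False

    # Hypothetical sample rows standing in for the File Input data.
    df = pd.DataFrame({"column": [
        '{"id": 1, "name": "Alice"}',   # valid
        '{"id": 2, "name": "Bob"',      # invalid: missing closing brace
    ]})

    df["isvalid"] = df["column"].apply(is_valid_json)

    # Equivalent of the Data Filtering operator: keep only rows where isvalid is true.
    valid_rows = df[df["isvalid"]]
    print(valid_rows)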

    Parsing JSON Data

    1. Drag in a JSON Parsing operator and configure it to parse the correctly formatted data, as shown in the following figure.


    2. Click Data Preview, as shown in the following figure.
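
    Conceptually, the JSON Parsing operator expands each valid JSON string into its own columns, similar to the rough pandas sketch below. pd.json_normalize is used here only as an illustration, not as the operator's actual implementation, and the sample rows are hypothetical.

    import json
    import pandas as pd

    # Hypothetical rows that passed the validity filter.
    valid_rows = pd.DataFrame({"column": [
        '{"id": 1, "name": "Alice"}',
        '{"id": 3, "name": "Carol"}',
    ]})

    # Expand each JSON string into one column per key.
    parsed = pd.json_normalize(valid_rows["column"].apply(json.loads).tolist())
    print(parsed)
    #    id   name
    # 0   1  Alice
    # 1   3  Carol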

    Data Output

    1. Drag a DB Table Output operator onto the page and configure it, as shown in the following figure. Select Append Data to Existing File as the write method, and adjust the configuration according to actual conditions.
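
    For reference, writing the parsed data to a database table in append mode looks roughly like the sketch below. The connection string and table name are hypothetical placeholders, not the actual configuration of the DB Table Output operator.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string and table name; replace with your target database.
    engine = create_engine("mysql+pymysql://user:password@host:3306/demo_db")

    parsed = pd.DataFrame({"id": [1, 3], "name": ["Alice", "Carol"]})

    # if_exists="append" adds new rows to the existing table instead of replacing it,
    # matching an append-style write method.
    parsed.to_sql("json_output", engine, if_exists="append", index=False)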

    Effect Display

    Click Run to execute the task. After a successful execution, the log is displayed, as shown in the following figure.

    Data in the database table is shown in the following figure.

    Subsequent Operation

    Click the Publish button to publish the scheduled task to Production Mode, as shown in the following figure.

    In Production Mode, click the icon to configure the scheduling plan, where you can set the task execution frequency.

     


    Topic: Data Development - Scheduled Task