
Deleting Duplicate Data

  • Last update:  2024-04-11
  • Overview

    Version

    FineBI Version    Functional Change
    6.0.7             /

    Application Scenarios

    Scenario One: Handling Dirty Data

    The Delete Duplicate Row function helps you clean dirty data by removing duplicate rows.

    For example, dirty data appears when the same order row is triggered, and therefore recorded, twice in a table. In this case, you can perform Delete Duplicate Row to retain only one row for that order.
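    To illustrate the idea outside FineBI, here is a minimal plain-Python sketch (not FineBI's implementation) that removes rows that exactly repeat an earlier row and keeps the first occurrence. The field names and order values are hypothetical.

```python
# Minimal sketch (not FineBI's implementation): drop rows that are exact
# duplicates of an earlier row, keeping the first occurrence.
# The field names and sample orders below are hypothetical.

def drop_exact_duplicates(rows):
    """Return rows with exact duplicates removed, first occurrence kept."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row.items()))  # the whole row acts as the key
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


orders = [
    {"Date": "2024-01-02", "Name": "Alice", "Volume": 3},
    {"Date": "2024-01-02", "Name": "Alice", "Volume": 3},  # recorded twice
    {"Date": "2024-01-03", "Name": "Bob", "Volume": 5},
]

print(len(drop_exact_duplicates(orders)))  # 2
```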


    Scenario Two: Retaining Partial Data

    Suppose you collect machine status data. Because the data is collected at random intervals, it is unevenly distributed, with 10 to 20 rows collected per minute. In this case, you can perform Delete Duplicate Row to retain only one row of data per minute.
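    A minimal plain-Python sketch of this scenario, assuming each reading carries a timestamp: truncating the timestamp to the minute and deduplicating on that value keeps one reading per minute. This is not FineBI's implementation; the `Time` and `Status` fields and the readings are hypothetical, and "first" means first in list order.

```python
# Minimal sketch (not FineBI's implementation): keep one reading per minute
# by deduplicating on a derived minute value. Sample readings are hypothetical.
from datetime import datetime


def one_row_per_minute(readings):
    """Keep the first reading (in list order) for each calendar minute."""
    seen_minutes = set()
    kept = []
    for reading in readings:
        # Truncate the timestamp to the minute; this is the deduplication key.
        minute = reading["Time"].replace(second=0, microsecond=0)
        if minute not in seen_minutes:
            seen_minutes.add(minute)
            kept.append(reading)
    return kept


readings = [
    {"Time": datetime(2024, 4, 11, 9, 0, 5), "Status": "OK"},
    {"Time": datetime(2024, 4, 11, 9, 0, 40), "Status": "OK"},
    {"Time": datetime(2024, 4, 11, 9, 1, 12), "Status": "WARN"},
]

print([r["Status"] for r in one_row_per_minute(readings)])  # ['OK', 'WARN']
```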

    Scenario Three: Deleting Duplicate Rows

    For example, suppose you only need to analyze the user data in the following wide table, which also contains other fields.

    [Figure: wide table containing user data and other fields]

    In this case, you can first click Field Settings to delete the other fields, and then perform Delete Duplicate Row to deduplicate the user data.
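    The two steps can be sketched in plain Python as follows. This is an illustration under assumptions, not FineBI's implementation: it keeps only the user fields, then drops rows that repeat. The field names are hypothetical.

```python
# Minimal sketch (not FineBI's implementation): keep only the user fields of a
# wide table, then drop duplicate rows. Field names are hypothetical.

def project_and_deduplicate(rows, fields):
    """Keep only `fields` from each row, then remove duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        projected = {f: row[f] for f in fields}  # like deleting other fields
        key = tuple(projected[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(projected)
    return result


wide_table = [
    {"User ID": "U01", "User Name": "Alice", "Order ID": "A1", "Amount": 10},
    {"User ID": "U01", "User Name": "Alice", "Order ID": "A2", "Amount": 25},
    {"User ID": "U02", "User Name": "Bob", "Order ID": "A3", "Amount": 7},
]

users = project_and_deduplicate(wide_table, ["User ID", "User Name"])
print(users)
# [{'User ID': 'U01', 'User Name': 'Alice'}, {'User ID': 'U02', 'User Name': 'Bob'}]
```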


    Function Description

    The system treats rows as duplicates when they have identical values in all of the deduplication fields you select. If you tick Select All in the Select Deduplication Field drop-down list, the system compares rows across all fields.

    When duplicate rows are found, the system retains only the first of them.
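    The rule above can be expressed as a short plain-Python sketch. It is an assumption-based illustration, not FineBI's code: rows are dicts, `fields` plays the role of the selected deduplication fields, and `fields=None` stands in for Select All. The function name `delete_duplicate_rows` is hypothetical.

```python
# Minimal sketch of the behavior described above (not FineBI's code):
# rows are duplicates when they match on every selected field, and the first
# matching row is kept. fields=None mimics ticking "Select All".
from typing import Optional


def delete_duplicate_rows(rows: list[dict], fields: Optional[list[str]] = None) -> list[dict]:
    """Keep the first row of each group of rows that match on `fields`."""
    seen = set()
    kept = []
    for row in rows:
        key_fields = sorted(row) if fields is None else fields
        key = tuple((f, row[f]) for f in key_fields)
        if key not in seen:  # first time this combination of values appears
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"ID": 1, "Name": "Alice", "Volume": 3},
    {"ID": 2, "Name": "Alice", "Volume": 3},
    {"ID": 3, "Name": "Bob", "Volume": 3},
]

print([r["ID"] for r in delete_duplicate_rows(rows, ["Name", "Volume"])])  # [1, 3]
print([r["ID"] for r in delete_duplicate_rows(rows)])  # [1, 2, 3]
```

    If you work in pandas, `DataFrame.drop_duplicates(subset=[...], keep="first")` applies the same keep-the-first rule.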


    Example

    You can download the sample data: Order Information.xlsx.

    1. Upload the sample data to an analysis subject, as shown in the following figure.

    [Figure: sample data uploaded to the analysis subject]

    Some orders are recorded twice. The duplicate rows are identical except for the values in the ID field.

    2. Click More and select Delete Duplicate Row from the drop-down list.


    3. The system identifies duplicate rows based on the deduplication fields you select. In this data, if two rows have the same values in the Date, Name, and Volume fields, you can infer that they record the same order.

    Therefore, select Date, Name, and Volume from the Select Deduplication Field drop-down list as the basis for identifying duplicates.

    Note:
    By default, the system retains only the first of the duplicate rows. For example, if the row with ID A1000005 duplicates the row with ID A1000006, only the row with ID A1000005 is retained.


    4. Click Save and Update to obtain data without duplicate values.
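    Steps 3 and 4 can be mimicked in plain Python as a rough sketch (not FineBI's implementation). Only the IDs A1000005 and A1000006 come from the note above; the dates, names, volumes, and the extra ID A1000007 are hypothetical.

```python
# Minimal sketch (not FineBI's implementation) of steps 3 and 4: deduplicate on
# Date, Name, and Volume and keep the first row. Apart from the two IDs taken
# from the note, all values are hypothetical.

DEDUP_FIELDS = ("Date", "Name", "Volume")


def keep_first_per_order(rows):
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in DEDUP_FIELDS)  # ID is deliberately ignored
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"ID": "A1000005", "Date": "2024-03-01", "Name": "Alice", "Volume": 2},
    {"ID": "A1000006", "Date": "2024-03-01", "Name": "Alice", "Volume": 2},
    {"ID": "A1000007", "Date": "2024-03-02", "Name": "Bob", "Volume": 4},
]

print([o["ID"] for o in keep_first_per_order(orders)])  # ['A1000005', 'A1000007']
```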

    The following table shows how the result differs depending on the deduplication field you select.

    Deduplication Field    Result
    Region only            Only one row of order data is retained for each region.
    Name only              Only one row of order data is retained for each user.
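    The effect of the deduplication field can be sketched in plain Python (hypothetical data, not FineBI's implementation): the same four orders yield one row per region or one row per user, depending on which field is used as the key.

```python
# Minimal sketch (not FineBI's implementation): the same orders deduplicated on
# different fields. Regions, names, and IDs are hypothetical.

def dedupe(rows, fields):
    """Keep the first row for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"ID": 1, "Region": "East", "Name": "Alice"},
    {"ID": 2, "Region": "East", "Name": "Bob"},
    {"ID": 3, "Region": "West", "Name": "Alice"},
    {"ID": 4, "Region": "West", "Name": "Carol"},
]

print([o["ID"] for o in dedupe(orders, ["Region"])])  # [1, 3]
print([o["ID"] for o in dedupe(orders, ["Name"])])    # [1, 2, 4]
```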

    Usage Recommendation

    By default, after identifying duplicate rows, the system retains the row that appears first (at the top) among them.

    Therefore, the retained row depends on the row order at the step where you perform Delete Duplicate Row, and it may differ if you perform the operation at a different step. We recommend performing Delete Duplicate Row as the last step of your data analysis.
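    The following plain-Python sketch (hypothetical data, not FineBI's implementation) shows why the step matters: because the first row is kept, sorting the data before deduplication changes which row survives.

```python
# Minimal sketch (not FineBI's implementation): "keep the first row" depends on
# row order, so a sort applied before deduplication changes which row survives.
# The orders below are hypothetical.

def dedupe_keep_first(rows, fields):
    """Keep the first row for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"ID": "A1", "Name": "Alice", "Amount": 10},
    {"ID": "A2", "Name": "Alice", "Amount": 30},
]

# Deduplicate in the original order: A1 comes first, so it is kept.
print([o["ID"] for o in dedupe_keep_first(orders, ["Name"])])  # ['A1']

# Sort by Amount in descending order first: A2 now comes first and is kept.
by_amount = sorted(orders, key=lambda o: o["Amount"], reverse=True)
print([o["ID"] for o in dedupe_keep_first(by_amount, ["Name"])])  # ['A2']
```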
