Deleting Duplicate Data - FineBI Help Document

Last update: April 11, 2024

Overview

Version

FineBI Version	Functional Change
6.0.7	/

Application Scenario

Scenario One: Handling Dirty Data

The Delete Duplicate Row function helps you clean dirty data by removing duplicate rows.

For example, dirty data appears when the same order is triggered twice and is therefore recorded as two rows in a table. In this case, you can perform Delete Duplicate Row to retain only one row for that order.

Scenario Two: Retaining Partial Data

Suppose you collect machine status data. Because the data is collected at random intervals, it is unevenly distributed, with 10 to 20 rows collected per minute. In this case, you can perform Delete Duplicate Row to retain only one row of data per minute.
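The per-minute case can be sketched in Python as follows. This is a minimal illustration outside FineBI, assuming each reading is a (timestamp, status) pair listed in collection order; the function name `one_row_per_minute` and the readings are hypothetical. The timestamp truncated to the minute plays the role of the deduplication field, and the first reading in each minute is kept.

```python
from datetime import datetime


def one_row_per_minute(readings):
    """Keep the first reading collected in each minute.

    Hypothetical sketch, not FineBI code: `readings` is assumed to be a list
    of (timestamp, status) tuples in collection order.
    """
    seen_minutes = set()
    kept = []
    for ts, status in readings:
        # Truncate to the minute; this value acts as the deduplication field.
        minute = ts.replace(second=0, microsecond=0)
        if minute not in seen_minutes:
            seen_minutes.add(minute)
            kept.append((ts, status))
    return kept


# Hypothetical readings: three in 09:00, one in 09:01.
readings = [
    (datetime(2024, 4, 11, 9, 0, 5), "running"),
    (datetime(2024, 4, 11, 9, 0, 17), "running"),
    (datetime(2024, 4, 11, 9, 0, 42), "idle"),
    (datetime(2024, 4, 11, 9, 1, 3), "idle"),
]
kept = one_row_per_minute(readings)
for ts, status in kept:
    print(ts, status)
```

Only the 09:00:05 and 09:01:03 readings survive, one per minute.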

Scenario Three: Deleting Duplicate Rows

For example, suppose you only need to analyze the user data (the required data) in the following wide table.

In this case, first click Field Settings to delete the other fields, and then perform Delete Duplicate Row to deduplicate the user data.
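A minimal Python sketch of this two-step approach is shown below, assuming the wide table is a list of dictionaries. The field names (`User`, `City`, `Order`, `Amount`) and the helper `select_then_deduplicate` are hypothetical; they only mirror the idea of deleting unneeded fields first and deduplicating second.

```python
def select_then_deduplicate(rows, fields):
    """Keep only `fields` in each row, then drop repeated projected rows.

    Hypothetical sketch, not FineBI code. The projection step corresponds to
    deleting other fields in Field Settings; the set lookup corresponds to
    Delete Duplicate Row keeping the first occurrence.
    """
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[f] for f in fields)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(zip(fields, projected)))
    return result


# Hypothetical wide table: Alice appears once per order.
wide_table = [
    {"User": "Alice", "City": "Boston", "Order": "O1", "Amount": 30},
    {"User": "Alice", "City": "Boston", "Order": "O2", "Amount": 12},
    {"User": "Bob", "City": "Denver", "Order": "O3", "Amount": 45},
]
print(select_then_deduplicate(wide_table, ["User", "City"]))
# [{'User': 'Alice', 'City': 'Boston'}, {'User': 'Bob', 'City': 'Denver'}]
```

If the order fields were kept, every row would stay distinct and no user rows would be removed, which is why the other fields are deleted first.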

Function Description

The system checks whether rows contain duplicate values in the deduplication fields you select. If you tick Select All in the Select Deduplication Field drop-down list, the system compares the values of all fields.

If duplicate rows are found, the system retains only the first one.
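The rule above can be approximated in Python as a single keep-first pass over the rows. This is a sketch under stated assumptions, not FineBI's implementation: rows are dictionaries sharing the same keys, `delete_duplicate_rows` is a hypothetical name, and passing `dedup_fields=None` stands in for ticking Select All.

```python
def delete_duplicate_rows(rows, dedup_fields=None):
    """Keep the first row for each distinct combination of dedup_fields.

    Hypothetical sketch: dedup_fields=None compares every field, similar to
    ticking Select All. Values must be hashable.
    """
    seen = set()
    kept = []
    for row in rows:
        fields = dedup_fields if dedup_fields is not None else sorted(row)
        key = tuple(row[f] for f in fields)
        if key not in seen:  # first occurrence wins
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"Order": "O1", "Qty": 2},
    {"Order": "O1", "Qty": 2},  # exact duplicate of the first row
    {"Order": "O1", "Qty": 5},
]
print(len(delete_duplicate_rows(rows)))           # 2: all fields compared
print(len(delete_duplicate_rows(rows, ["Order"])))  # 1: only Order compared
```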

Example

You can download the sample data: Order Information.xlsx.

1. Upload the sample data to an analysis subject, as shown in the following figure.

Some orders are recorded twice. The duplicate rows differ only in the ID field.

2. Click More and select Delete Duplicate Row from the drop-down list.

3. The system identifies duplicate rows based on the deduplication fields you select. If two rows have the same values in the Date, Name, and Volume fields, you can infer that they record the same order.

Select Date, Name, and Volume from the Select Deduplication Field drop-down list as the basis for identifying duplicates.

Note:

By default, the system retains only the first of the duplicate rows. For example, if the row with ID A1000005 duplicates the row with ID A1000006, only the row with ID A1000005 is retained.
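The same keep-first behavior can be checked in Python on a few hypothetical rows. Only the IDs A1000005 and A1000006 come from this example; the dates, names, volumes, and the third row (A1000007) are invented for illustration, and `delete_duplicate_rows` is a hypothetical helper. ID is left out of the key because the duplicate orders differ only in that field.

```python
def delete_duplicate_rows(rows, dedup_fields):
    """Keep the first row for each distinct value combination (hypothetical sketch)."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in dedup_fields)
        if key not in seen:  # first occurrence wins
            seen.add(key)
            kept.append(row)
    return kept


# Invented values; only the two IDs are taken from the example above.
orders = [
    {"ID": "A1000005", "Date": "2024-01-02", "Name": "Lee", "Volume": 3},
    {"ID": "A1000006", "Date": "2024-01-02", "Name": "Lee", "Volume": 3},
    {"ID": "A1000007", "Date": "2024-01-03", "Name": "Lee", "Volume": 3},
]
kept = delete_duplicate_rows(orders, ["Date", "Name", "Volume"])
print([row["ID"] for row in kept])  # ['A1000005', 'A1000007']
```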

4. Click Save and Update to obtain data without duplicate values.

The following table shows how the result changes with the selected deduplication fields.

Deduplication Field	Result
Region only	Only one row of order data in each region will be retained.
Name only	Only one row of order data for each user will be retained.
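To see why the result depends on the fields chosen, the hypothetical Python sketch below runs the same keep-first pass with each field set from the table. The number of retained rows equals the number of distinct values of the deduplication field. The order data and the `retained_rows` helper are made up for illustration.

```python
def retained_rows(rows, dedup_fields):
    """Keep the first row per distinct key (hypothetical sketch, not FineBI code)."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[f] for f in dedup_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Invented orders: 2 regions, 3 distinct names.
orders = [
    {"Region": "East", "Name": "Lee", "Volume": 3},
    {"Region": "East", "Name": "Chen", "Volume": 5},
    {"Region": "West", "Name": "Lee", "Volume": 2},
    {"Region": "West", "Name": "Wang", "Volume": 4},
]
for fields in (["Region"], ["Name"]):
    print(fields, len(retained_rows(orders, fields)))
# ['Region'] 2
# ['Name'] 3
```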

Usage Recommendation

By default, after identifying duplicate rows, the system retains the one that appears first (topmost) among them.

Because the rows may be in a different order at different steps of the analysis, the retained row may differ depending on the step at which you perform Delete Duplicate Row. You are advised to perform Delete Duplicate Row as the last step of data analysis.
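The order dependence can be illustrated with a hypothetical Python sketch: the same two rows yield a different retained row depending on whether deduplication runs before or after a reordering step. A descending sort on `Volume` stands in for any earlier step that changes row order; the data and the `keep_first` helper are illustrative only.

```python
def keep_first(rows, field):
    """Keep the first row for each value of `field` (hypothetical sketch)."""
    seen, kept = set(), []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept


orders = [
    {"ID": "A1", "Name": "Lee", "Volume": 3},
    {"ID": "A2", "Name": "Lee", "Volume": 9},
]

# Deduplicate in the original order: A1 is on top, so A1 is retained.
print(keep_first(orders, "Name")[0]["ID"])  # A1

# Reorder first (sort by Volume, descending), then deduplicate: A2 is retained.
by_volume = sorted(orders, key=lambda r: r["Volume"], reverse=True)
print(keep_first(by_volume, "Name")[0]["ID"])  # A2
```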
