A data warehouse receives massive amounts of data from diverse sources, which may differ in format, statistical definitions, and the way values are represented.
As a result, data with missing values is inevitable. This document describes how to handle such data.
You can click to download the example data: Internet.xls.
When only a small number of records have missing values, deleting those records is the simplest and most effective way to handle them.
For example, the Event_Date field in the Internet table has null values. If records without an Event_Date are invalid for your analysis, you can delete them directly.
Add a Data Transformation node, and drag in a DB Table Input operator to extract data from the Internet table, as shown in the following figure.
Use a Data Filtering operator to remove the records whose Event_Date is null, as shown in the following figure.
The Data Filtering operator has the same effect as the SQL condition Event_Date IS NOT NULL.
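To make the SQL equivalence concrete, the following minimal sketch runs the same condition against an in-memory SQLite database through Python's standard sqlite3 module. It is only an illustration, not how the Data Filtering operator is implemented: the Record_ID column and the sample rows are hypothetical, and only the Internet table name and the Event_Date field come from the example above.

```python
import sqlite3

# Hypothetical sample rows standing in for the Internet table.
# Record_ID is an assumed column; None represents a null Event_Date.
rows = [
    (1, "2023-01-05"),
    (2, None),
    (3, "2023-01-07"),
    (4, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Internet (Record_ID INTEGER, Event_Date TEXT)")
conn.executemany("INSERT INTO Internet VALUES (?, ?)", rows)

# Keep only the records whose Event_Date is present.
filtered = conn.execute(
    "SELECT Record_ID, Event_Date FROM Internet "
    "WHERE Event_Date IS NOT NULL "
    "ORDER BY Record_ID"
).fetchall()
conn.close()

print(filtered)
```

Records 2 and 4 have no Event_Date, so they are dropped from the result, which mirrors what the filter step does to the extracted data.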
When there are too many missing values for deletion to be practical, fill them with a specified value instead.
For example, in the following table, the Session_Duration field has many null values, and you want to replace these null values with 0.
Use a New Calculation Column operator to add a new column. Enter the formula IF(ISNULL(#{Session_Duration}),0.0,#{Session_Duration}). This means that if Session_Duration is null, the function returns 0.0; otherwise, it returns the value of Session_Duration, as shown in the following figure.
Click Data Preview. The null values have been replaced with 0, as shown in the following figure.
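For readers who think in SQL, the logic of the formula can be sketched with a CASE expression. The example below uses Python's standard sqlite3 module with an in-memory database. It is an assumption-laden illustration rather than the operator's actual implementation: the Record_ID column, the Session_Duration_Filled column name, and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; None represents a null Session_Duration.
rows = [
    (1, 12.5),
    (2, None),
    (3, 40.0),
    (4, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Internet (Record_ID INTEGER, Session_Duration REAL)")
conn.executemany("INSERT INTO Internet VALUES (?, ?)", rows)

# Same logic as IF(ISNULL(x), 0.0, x): return 0.0 when the value is null,
# otherwise return the original value. The result goes into a new column.
imputed = conn.execute(
    "SELECT Record_ID, "
    "CASE WHEN Session_Duration IS NULL THEN 0.0 ELSE Session_Duration END "
    "AS Session_Duration_Filled "
    "FROM Internet ORDER BY Record_ID"
).fetchall()
conn.close()

print(imputed)
```

As with the New Calculation Column operator, the original Session_Duration column is left unchanged and the filled values appear in a separate column.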