Concept of Valid Data
1. Reliable data: data that come from trustworthy, credible sources and are based on facts
2. Appropriate data: data with suitable granularity. Granularity that is too coarse can hide valuable insights, while granularity that is too fine significantly increases data volume, which raises processing difficulty and acquisition costs.
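The granularity trade-off can be illustrated with a minimal sketch. The records, the field layout, and the function name aggregate_by_month are hypothetical and chosen only for illustration: daily rows keep more detail but take more storage, while monthly totals are compact but lose the within-month pattern.

```python
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, amount). Illustrative data only.
daily_sales = [
    ("2020-01-03", 120),
    ("2020-01-17", 80),
    ("2020-02-05", 200),
    ("2020-02-20", 50),
    ("2020-02-28", 150),
]


def aggregate_by_month(records):
    """Roll daily records up to monthly totals (a coarser granularity)."""
    totals = defaultdict(int)
    for date, amount in records:
        month = date[:7]  # keep only the "YYYY-MM" part of the date
        totals[month] += amount
    return dict(totals)


monthly_sales = aggregate_by_month(daily_sales)

# Fewer rows to store and process, but the day-level detail is gone.
print(len(daily_sales), "daily rows ->", len(monthly_sales), "monthly rows")
print(monthly_sales)
```

Which level is suitable depends on the question being asked: monthly totals are enough for a yearly trend, but they cannot show on which days sales peaked.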
Data Format Suitable for Analysis
1. The header is a single row, with no merged cells.
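2. Rows and columns have no dependencies (aggregation relationships) on one another, and the table may grow only vertically (by adding rows), not horizontally (by adding columns).
For example, a table with one column per year (Sales in 2018, Sales in 2019, Sales in 2020, ...) is nonstandard for a data table; the year should be a column of its own, so that each new year adds rows rather than columns.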
3. Each value in a data table must be indivisible.
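For example, cell entries such as "100 USD" and "200 units", which are common in Excel, are nonstandard because each one combines a number with a unit; the number and the unit should be stored in separate columns.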
4. Each row of data must be unique, with no duplicates; a unique field (primary key) can therefore be used to distinguish the rows of the data table.
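Rules 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the table contents, the column names product, year, amount, and unit, and the helper names to_long and duplicate_keys are all hypothetical. The sketch turns per-year columns into rows, splits each "number unit" cell into two indivisible values, and checks that a candidate primary key is unique.

```python
import re

# Hypothetical wide table: one column per year, cells with embedded units.
wide_rows = [
    {"product": "A", "Sales in 2018": "100 USD", "Sales in 2019": "150 USD"},
    {"product": "B", "Sales in 2018": "200 USD", "Sales in 2019": "250 USD"},
]

# A number, optional whitespace, then a unit with no spaces, e.g. "100 USD".
VALUE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(\S+)\s*")


def to_long(rows, id_field="product", prefix="Sales in "):
    """Turn each year column into its own row and split value from unit."""
    long_rows = []
    for row in rows:
        for column, cell in row.items():
            if not column.startswith(prefix):
                continue  # skip identifier columns such as "product"
            match = VALUE_PATTERN.fullmatch(cell)
            if match is None:
                raise ValueError(f"cannot split value: {cell!r}")
            long_rows.append({
                id_field: row[id_field],
                "year": int(column[len(prefix):]),
                "amount": float(match.group(1)),
                "unit": match.group(2),
            })
    return long_rows


def duplicate_keys(rows, key_fields):
    """Return the set of key tuples that appear in more than one row."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


long_rows = to_long(wide_rows)
for row in long_rows:
    print(row)

# (product, year) is assumed here to act as the primary key.
print("duplicate keys:", duplicate_keys(long_rows, ("product", "year")))
```

In the long layout, data for a new year is added as new rows, so the set of columns never changes, and an empty result from duplicate_keys suggests that the chosen fields can serve as a primary key.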