ETL (extract, transform, load) is the process of retrieving data from source systems and loading it into a data warehouse. The data must be cleaned during retrieval in order to obtain good data quality. Typical problems include an invalid phone number, a reference to a codebook that no longer exists, null values, and so forth. The traditional approach to ETL takes data from the source, puts it in a staging area, and then transforms it and loads it into the data warehouse. Data quality is the most important thing to consider when building a data warehouse, because the quality of the data affects the ETL process: if the data contains noise, the ETL process may fail. The quality of data can be judged by several parameters, namely:
- Accurate: A consumer address record must contain the city and zip code. If the consumer has a business, the record must also contain the address or location of the business.
- Up to date: Records must always reflect the latest information when something changes.
- Complete: Each record must contain the important information, for example for correspondence: apartment name, apartment number, street, zip code, and, if needed, a map or route to the address.
- No redundancy: There should be only one record per contact for each address used in correspondence.
- Standard: Each record should follow a standard in naming, in how it is read, and in abbreviations.
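Several of the parameters above can be checked on records in the staging area before they are loaded. The following is a minimal sketch, not a definitive implementation. The field names (`name`, `street`, `city`, `zip_code`), the five-digit zip code rule, and the abbreviation table are all assumptions made for illustration. It covers the Standard, Complete, and No redundancy parameters, plus a simple format check as a stand-in for Accurate; the Up to date parameter needs change information from the source and is not covered here.

```python
import re

# Hypothetical field names and rules, chosen only for illustration.
REQUIRED_FIELDS = ("name", "street", "city", "zip_code")
ABBREVIATIONS = {"st.": "street", "st": "street", "ave.": "avenue", "ave": "avenue"}
ZIP_PATTERN = re.compile(r"\d{5}")  # assumes five-digit zip codes


def standardize(record):
    """Return a copy with trimmed, lowercased text and expanded abbreviations."""
    clean = {}
    for key, value in record.items():
        if isinstance(value, str):
            words = value.strip().lower().split()
            value = " ".join(ABBREVIATIONS.get(w, w) for w in words)
        clean[key] = value
    return clean


def find_problems(record):
    """List the quality problems found in one standardized record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):  # missing key, None, or empty string
            problems.append(f"missing {field}")
    zip_code = record.get("zip_code")
    if zip_code and not ZIP_PATTERN.fullmatch(str(zip_code)):
        problems.append("invalid zip_code")
    return problems


def clean_batch(records):
    """Split staged records into rows to load and rows to reject."""
    accepted, rejected, seen = [], [], set()
    for raw in records:
        record = standardize(raw)
        problems = find_problems(record)
        # One record per contact and address: later copies are duplicates.
        key = (record.get("name"), record.get("street"), record.get("zip_code"))
        if not problems and key in seen:
            problems.append("duplicate")
        if problems:
            rejected.append((raw, problems))
        else:
            seen.add(key)
            accepted.append(record)
    return accepted, rejected
```

Calling `clean_batch(rows)` on a list of dictionaries returns the standardized rows that passed every check and, separately, each rejected row together with the reasons it was rejected. Rejecting noisy rows with a reason, rather than letting them reach the load step, is one way to keep a single bad record from failing the whole run.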