A data warehouse is a subject-oriented, integrated, and time-variant collection of data that supports the decision-making process. It serves as a central store of data, which can later be retrieved and processed whenever needed. A data warehouse is usually scoped to a specific organization or company. Extract, Transform and Load (ETL) is the process of retrieving data from the source systems, transforming it, and then loading it into the data warehouse.
Fundamental principles of data extraction include:
- Limit the volume of data that is extracted.
- OLTP source systems are designed around small, fast transactions, so take care not to slow the source system down too much.
- Perform the extraction as quickly as possible.
- Keep each extraction as small as possible.
- Keep the changes required in the source system to a minimum.
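One common way to keep extractions small and quick is incremental extraction: read only the rows that changed since the previous run, in small batches. The sketch below is illustrative only. It assumes a hypothetical source table `orders` whose `updated_at` column is unique and increases on every change, and it uses an in-memory SQLite database as a stand-in for the OLTP source.

```python
import sqlite3

def extract_incremental(conn, last_watermark, batch_size=500):
    """Yield batches of rows changed after last_watermark.

    Assumes a hypothetical table orders(id, amount, updated_at) where
    updated_at is an ISO-8601 string that is unique and grows on every change.
    """
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    )
    while True:
        batch = cur.fetchmany(batch_size)  # small batches limit memory use and source load
        if not batch:
            return
        yield batch

# Stand-in source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T00:00:00"),
     (2, 20.0, "2024-01-02T00:00:00"),
     (3, 30.0, "2024-01-03T00:00:00")],
)

last_run = "2024-01-01T00:00:00"  # watermark saved by the previous run
batches = list(extract_incremental(conn, last_run, batch_size=1))
# The new watermark is the updated_at of the last extracted row.
next_watermark = batches[-1][-1][2] if batches else last_run
```

Only rows 2 and 3 are read here. The watermark approach relies on `updated_at` being trustworthy. If two changes can share a timestamp, a strict `>` comparison may skip rows between runs.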
Types of data transformation:
- Formatting and standardization.
- Conversion to a specific number or date format.
- Translation of values into other forms.
- Aggregation, i.e. summarizing the data at a higher level.
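The four transformation types above can be sketched on a few raw rows as follows. Everything here is hypothetical: the field names, the `COUNTRY_NAMES` lookup table, and the source formats are assumptions chosen for illustration. The source formats are DD/MM/YYYY dates and amounts written with a dot for thousands and a comma for decimals.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical lookup table used for translation (code -> description).
COUNTRY_NAMES = {"ID": "Indonesia", "SG": "Singapore"}

def transform(row):
    """Apply standardization, format conversion and translation to one row."""
    return {
        # Formatting and standardization: trim whitespace, normalize case.
        "customer": row["customer"].strip().title(),
        # Date format conversion: DD/MM/YYYY -> ISO 8601 (YYYY-MM-DD).
        "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
        # Number format conversion: "1.234,50" -> 1234.5 (assumed source format).
        "amount": float(row["amount"].replace(".", "").replace(",", ".")),
        # Translation into another form: country code -> country name.
        "country": COUNTRY_NAMES.get(row["country"], "Unknown"),
    }

def aggregate_by_country(rows):
    """Aggregation: summarize row-level amounts at the higher country level."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)

raw = [
    {"customer": "  budi santoso ", "order_date": "05/01/2024", "amount": "1.234,50", "country": "ID"},
    {"customer": "ALICE TAN", "order_date": "06/01/2024", "amount": "100,00", "country": "SG"},
    {"customer": "siti", "order_date": "07/01/2024", "amount": "65,50", "country": "ID"},
]
clean = [transform(r) for r in raw]
summary = aggregate_by_country(clean)
```

Row-level transformations are kept separate from aggregation so the detailed rows can still be loaded alongside the summary if needed.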
Other important ETL principles:
- Leakage: occurs when the ETL process believes it has downloaded all the data from the source systems, but in fact some records are missing.
- Recoverability: the ETL process must be robust, so that if a failure occurs it can be restored immediately without loss of or damage to data.
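Both principles can be sketched in a few lines. A leakage check compares the keys in the source with the keys that actually arrived in the warehouse. Recoverability can be approached by recording progress in a checkpoint so that a rerun resumes where the failed run stopped. This is a minimal sketch under stated assumptions, not a production design. The checkpoint file, the `load_batch` callback, and the list standing in for the warehouse are all hypothetical. `load_batch` is assumed to be idempotent, because a batch can be loaded twice if the process dies between loading and checkpointing.

```python
import json
import os
import tempfile

def detect_leakage(source_ids, loaded_ids):
    """Return keys present in the source but missing from the warehouse."""
    return sorted(set(source_ids) - set(loaded_ids))

def load_with_checkpoint(batches, load_batch, checkpoint_path):
    """Load batches in order and record progress after each one.

    On a rerun after a failure, loading resumes at the first batch that was
    not recorded as done. Assumes load_batch is idempotent.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_batch"]
    for i in range(start, len(batches)):
        load_batch(batches[i])  # if this raises, the checkpoint still points at batch i
        with open(checkpoint_path, "w") as f:
            json.dump({"next_batch": i + 1}, f)

# Demo: the warehouse is a list, and the second batch fails once.
warehouse = []
state = {"fail_next": True}

def flaky_load(batch):
    if batch == [3, 4] and state["fail_next"]:
        state["fail_next"] = False
        raise RuntimeError("simulated failure")
    warehouse.extend(batch)

batches = [[1, 2], [3, 4], [5]]
checkpoint = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")
try:
    load_with_checkpoint(batches, flaky_load, checkpoint)
except RuntimeError:
    pass  # first run stops at batch 1; batch 0 is already recorded as done
load_with_checkpoint(batches, flaky_load, checkpoint)  # rerun resumes at batch 1
missing = detect_leakage(range(1, 6), warehouse)
```

After the rerun every source key is present in the warehouse, so the leakage check returns an empty list.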
The architectural approach to ETL
Approaches to implementing ETL:
- Retrieve data from the source system, place it in a staging area, then transform it and load it into the data warehouse.
- Retrieve data from the source system, transform it in memory, and then load it directly into the data warehouse.
- Retrieve data from the source system, load it into the data warehouse, and then apply the transformations to the data inside the data warehouse.
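The three approaches differ mainly in where the transformation happens. The sketch below illustrates them with SQLite. One connection has attached in-memory databases that stand in for the source system (`src`), the staging area (`staging`), and the warehouse (`main`). The table names and the cents-to-amount transformation are hypothetical. Real deployments would use separate systems.

```python
import sqlite3

def make_systems():
    """Create stand-ins for the source (src), staging area and warehouse (main)."""
    conn = sqlite3.connect(":memory:")  # main = data warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS src")
    conn.execute("ATTACH DATABASE ':memory:' AS staging")
    conn.execute("CREATE TABLE src.sales (id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO src.sales VALUES (?, ?)", [(1, 1050), (2, 250)])
    conn.execute("CREATE TABLE main.dw_sales (id INTEGER, amount REAL)")
    return conn

def etl_with_staging(conn):
    # Extract the raw rows into the staging area ...
    conn.execute("CREATE TABLE staging.sales AS SELECT * FROM src.sales")
    # ... then transform them there and load the result into the warehouse.
    conn.execute("INSERT INTO main.dw_sales SELECT id, amount_cents / 100.0 FROM staging.sales")

def etl_in_memory(conn):
    rows = conn.execute("SELECT id, amount_cents FROM src.sales").fetchall()
    # Transform in application memory, then load directly into the warehouse.
    conn.executemany("INSERT INTO main.dw_sales VALUES (?, ?)",
                     [(i, cents / 100.0) for i, cents in rows])

def elt(conn):
    # Load the raw rows into the warehouse first ...
    conn.execute("CREATE TABLE main.raw_sales AS SELECT * FROM src.sales")
    # ... then transform inside the warehouse using its own SQL engine.
    conn.execute("INSERT INTO main.dw_sales SELECT id, amount_cents / 100.0 FROM main.raw_sales")

results = {}
for approach in (etl_with_staging, etl_in_memory, elt):
    conn = make_systems()
    approach(conn)
    results[approach.__name__] = conn.execute(
        "SELECT id, amount FROM main.dw_sales ORDER BY id").fetchall()
    conn.close()
```

All three produce the same warehouse rows. The choice is about where the processing load and intermediate data live: the staging area, the ETL process's memory, or the warehouse itself.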
The following categories of ETL methods are based on what moves the data out of the source system:
- The ETL process pulls the data out by querying the source system regularly.
- Triggers in the source system's database push data changes out.
- A scheduled process in the source system exports the data regularly.
- A log reader reads the database log files to identify data changes.
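The trigger-based method can be sketched with SQLite. This is one common variant, in which triggers write every insert, update, and delete into a change table that the ETL process later reads. The table names and the `read_changes` helper are hypothetical and chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Change table filled by the triggers below; the ETL process reads from
    -- here instead of scanning the whole source table.
    CREATE TABLE customers_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op        TEXT,
        id        INTEGER,
        name      TEXT
    );

    CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
        INSERT INTO customers_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
    END;
    CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
        INSERT INTO customers_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
    END;
    CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
        INSERT INTO customers_changes (op, id, name) VALUES ('D', OLD.id, OLD.name);
    END;
""")

# Normal activity in the source system; the triggers record each change.
conn.execute("INSERT INTO customers VALUES (1, 'Budi')")
conn.execute("INSERT INTO customers VALUES (2, 'Siti')")
conn.execute("UPDATE customers SET name = 'Budi S.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
conn.commit()

def read_changes(conn, after_change_id):
    """Return changes recorded after the last change_id the ETL has processed."""
    return conn.execute(
        "SELECT change_id, op, id, name FROM customers_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()

changes = read_changes(conn, 0)
```

The trade-off is that the triggers add work to every write in the source system, which is one reason log readers are often preferred when the database exposes its logs.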
The following categories of methods are based on where the ETL process runs:
- On a separate ETL server between the source system and the data warehouse server.
- On the data warehouse server.
- On the server hosting the source system.
General considerations
Several types of data source should be considered before extraction, including:
- Databases (ODBC, OLEDB, ADO.NET, JDBC, etc.).
- Files (structured, unstructured, or semi-structured).
- Queues.
- Services.
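Each source type needs its own reader, and it helps to normalize their output to a common shape, such as a list of dicts, before transformation. The sketch below uses only the Python standard library as stand-ins. `sqlite3` plays the role of a database reached through a driver (analogous to ODBC/JDBC). `csv` and `json` represent structured and semi-structured files. `queue.Queue` is an in-process substitute for a real message queue, whose client library would differ. Services such as HTTP APIs are not shown. All function names are hypothetical.

```python
import csv
import io
import json
import queue
import sqlite3

def from_database(conn, sql):
    """Database source: run a query and return rows as dicts."""
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

def from_csv(text):
    """Structured file source: CSV with a header row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def from_json_lines(text):
    """Semi-structured file source: one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def from_queue(q):
    """Queue source: drain the messages currently waiting, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('A1', 9.5)")
db_rows = from_database(conn, "SELECT sku, price FROM products")

csv_rows = from_csv("sku,price\nB2,4.0\n")
json_rows = from_json_lines('{"sku": "C3", "tags": ["new"]}\n')

q = queue.Queue()
q.put({"sku": "D4", "event": "price_change"})
queue_rows = from_queue(q)
```

Note that the CSV reader returns every value as a string, while the database and JSON readers preserve types. Differences like this are part of what the transformation step has to standardize.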