Slowly changing Dimensions are dimensions that change slowly over time rather than on a regular basis, for example salary of an individual, job location etc.
There are mainly 3 types of SCD:
·
SCD Type 1:
The new record replaces the existing record. No trace of the old record
exists.
·
SCD Type 2: Creating a new additional record
(row). History is tracked.
·
SCD Type 3: Adding a new column
Let us see how we will implement SCD Type 1 and Type 2 in
ODI.
Requirement: To apply
SCD Type 2 in Employees OLAP table for capturing the history of salary.
SCD TYPE 1:
In SCD Type 1 we are not keeping the history, we are just
replacing the data with new one.
We can use IKM Oracle
Incremental update from the IKM Selector and there are no additional
modifications required in the mapping.
We can look into SCD Type 2 and see the differences between
SCD Type 1 and Type 2 in the mapping.
SCD TYPE 2:
To implement SCD TYPE 2 we can use Employees table from HR
Schema.
Source Schema
|
HR
|
Source Tables
|
“EMPLOYEES”
|
Target Table
|
“EMPLOYEES_SCD”
|
Scenario
|
To apply SCD Type 2 in “EMPLOYEES” table for capturing the history of
salary.
|
ROW_ID: Surrogate key for the table.
EMPLOYEE_ID: Natural Key.
START_DATE: Starting time stamp (When the record is inserted
into the data base).
END_DATE: Ending time stamp (We can give a future date to
this column).
FLAG: Current record flag.
o Create a database sequence in your target schema;
this will be used as surrogate key in target table.
Now reverse engineer from your source and target
models to populate the table in corresponding models.
o
After the reverse engineering you can see your
target table in your model
o
Now we have to define some properties for the
table and columns
o
Double click on EMPLOYEES_SCD Table and from the
definition tab select OLAP type as slowly
changing dimension.
o
Now expand EMPLOYEES_SCD Table and give SCD behavior for each column.
Now expand EMPLOYEES_SCD Table and give SCD behavior for each column.
o
ROW_ID is Surrogate
key
- EMAIL: Give overwrite on change.
- SALARY: The salary of an employee is an slowly changing attribute, keeping history make sense so we can give Add row on change to this attribute so that when there is a change in the value a new column will be added and previous records is stored.
- START_DATE: This column is Starting Timestamp, when the record is inserted into the data base.
o
END_DATE: This column is Ending Time stamp; we assign a
far future date for time being.
FLAG: This column indicates status of the record
select Current Record Flag for this
column.
§ 1: Present record.
§ 0: Old record.
Now we are ready with the SCD Target table and now we can
move to the interface.
o Creating interface for SCD Type2 implementation.
o
From your project folder right click on
interface and select new interface.
o Give
a name for your interface in definition tab.
o In the Mapping tab drag your source and target
to your interface and it will ask for auto mapping click ok for that.
o
For ROW_ID we have to create a sequence that will execute your data base sequence you created in your database.
For ROW_ID we have to create a sequence that will execute your data base sequence you created in your database.
o
From Sequence select New Sequence.
o
Give a name for your sequence.
o
Then select Native sequence and select the
sequence that you created in your data base.
o
In this example we created SEQ_ROW_ID in our
EMPLOYEE_TARGET schema and selected this sequence here.
o
Now select the ROW_ID column in your target and
select edit icon from mapping properties.
o
Now double click on your data base sequence
o
ROW_ID: Values from the data base sequence is added
to this column.
o
START_DATE: Mark SYSDATE for this column
o
END_DATE: We can give a future date to this
column(to_date(’01-01-2400’,’mm-dd-yyyy’))
o
FLAG: Default value will be 1.
o
Change the execution of EMPLOYEE_ID, FIRST_NAME,
LAST_NAME, EMAIL and SALARY to staging area from source.
o
Execution of ROW_ID, START_DATE, END_DATE and
FLAG to Target.
o In the quick edit tab check the unique key column for ROW_ID, this will be our primary key
o
In the IKM Selector you have to select IKM Oracle Slowly Changing Dimension,
for that you need to have this IKM in your project.
Then
give flow control and static control to true.
v If you are working with SCD TYPE 1 you have to select IKM as Oracle Incremental Update and
there is no need to set SCD behaviour for the target table you just need source
and target and load data directly by selecting Incremantal Upate IKM. Don’t
forget to set primary key constraint in target table and set flow control and
static control to true.
o
From the control tab select CKM SQL
o
Now save your interface and execute it.
o
In the screenshot you can see that END_DATE is a
future date and flag is 1 for current records.
o
Now we can update some columns in source and
check how it will be reflected in our target.
o
In this example I am updating SALARY of a user
and EMAIL of another user
o
Commit this after updating the records.
o Now
execute your interface from ODI.
o Now
check the updated records from your target DB
- In the target table you can see a new record is inserted for record having EMPLOYEE_ID=102 with the updated salary. This is since the source record was updated with the new salary and as the salary column has SCD behavior of add row on change.
For the latest
record current flag is set to 1 and the old record the current flag is changed
to 0 and END_DATE of the old record is changed to the date at which the new
record is inserted.
- For EMPLOYEE_ID = 103 we updated EMAIL ID of that user. EMAIL has SCD behavior of Overwrite on Change so previous EMAIL ID is replaced with new EMAIL.
No comments:
Post a Comment