Tuesday 16 August 2016

Data Warehouse and Concepts.

Why we need data warehouse ?


  • Consider the case of a multinational company. In that company the source of data can be in different format and different type for example
    • Customer data in Siebel (CRM Tool)
    • HR System data in people soft 
    • CSV files
    • Excel files
    • XML Sources
  • Querying data from different systems will be a difficult task.
So what data warehouse doing is consolidating all the data from different systems into a single location. And that location is called data Warehouse and all the data from different sources are loaded into the data warehouse through ETL tools like informatica,ODI. And put together in a single location that will make our job simple and we can use data from different sources from the data warehouse.

OLTP and Data Warehouse(OLAP).

OLTP: 
  • Contain large number of users and optimized for transnational processing
  • For example amazon multiple users are connected and they are doing simple transactions.
  • Real time validation is necessary because users are active in all time.
  • Normalized data model.
  • More and more joins so not preferred for fast data retrieval.

Data Ware House(OLAP): 
  • Contain limited number of business users and used for analytical purpose,
  • For example number for orders placed in last quarter. Large query include aggregations and calculations.
  • Load data using ETL. No real time validation needed.
  • De-normalized Model or Dimensional modelling: Star schema and Snow-flake schema.Star Schema: 
    • Star Schema: 
      • Fact and Dimensional table.
      • Denormalized data Eg: Product dimension contain most of the details of product and Store dimension contain most of the details of stores.
      • Dimensions are denormailzed.
      • Dimension Table: Describe the details of dimension.
      • Fact Table : Contain measures and foreign keys of dimension table.
      • We can answer more questions using this model.
    • Snowflake Schema:
      • Dimensions are normalized: In snowflake schema we are simplifying our dimension table by normalizing the data. For example if the Product dimension table contain details of product,category,brand etc. then we are dividing this into different table like one table for Category and another for Brand. By this method we can make our Dimension table into multiple Dimension table.
      • If we have large number of different Brands of product then it is better to keep this as a separate Brand table for better querying purpose.
      • We choose this method when we have large volume of data in our dimension table.

1 comment: