Home Up PDF Prof. Dr. Ingo Claßen
Extract, Transform, Load (ETL) - DSML

Data Provisioning

Staging Area

  • Data bucket for data preparation
  • Location depends on architecture

Extract, Transform, Load (ETL)

Extract

  • Get data from source systems

Transform

  • Clean data
  • Normalize data
  • Transform data

Load

  • Store data in Data Warehouse

Extract, Load, Transform (ELT) - L and T switched

Extract

Load

Transform

ETL Operations

Input / Output

  • File Access
  • Database Access
  • API / WebServices Access

Data Manipulation

  • Operations on Data - to Clean, Normalize, Transform
  • Filter
  • Split / Union
  • Join
  • Aggregation
  • Calculations

Visual Description of Dataflow

Code-based Description of Dataflow

with
DimGeography as (
SELECT DISTINCT
    ROW_NUMBER() OVER (
        ORDER BY a.City, sp.StateProvinceCode, cr.CountryRegionCode, a.PostalCode
    ) AS GeographyKey,
    a.City,
    sp.StateProvinceCode,
    sp.Name AS StateProvinceName,
    cr.CountryRegionCode,
    cr.Name AS EnglishCountryRegionName,
    a.PostalCode
FROM Address a
LEFT JOIN StateProvince sp ON a.StateProvinceID = sp.StateProvinceID
LEFT JOIN CountryRegion cr ON sp.CountryRegionCode = cr.CountryRegionCode
),
DimCustomerRaw as (
SELECT
    c.CustomerID AS CustomerKey,
    (
        SELECT g.GeographyKey
        FROM DimGeography g
        INNER JOIN Address a ON
            g.City = a.City
            AND g.PostalCode = a.PostalCode
        INNER JOIN BusinessEntityAddress bea ON a.AddressID = bea.AddressID
        WHERE bea.BusinessEntityID = p.BusinessEntityID
        LIMIT 1
    ) AS GeographyKey,
    p.FirstName,
    p.LastName,
FROM Customer c
LEFT JOIN Person p ON c.PersonID = p.BusinessEntityID)
INSERT INTO DimCustomer 
SELECT 
    CustomerKey,
    FirstName,
    LastName,
    City,
    StateProvinceCode,
    StateProvinceName,
    CountryRegionCode,
    EnglishCountryRegionName,
    PostalCode
FROM DimCustomerRaw dcr
LEFT JOIN DimGeography dg on dcr.GeographyKey = dg.GeographyKey;