
[Architecture diagram]

Global Cycling Company Data Warehouse 🚴‍♂️

Welcome to the Global Cycling Company Data Warehouse, built on the AdventureWorks 2016 sample data! From data collection to insightful reporting, the process gathers data from various sources, integrates it into a centralized platform, transforms it for analysis, stores it efficiently, applies techniques such as Slowly Changing Dimensions (SCD), generates insightful reports and visualizations, and continuously improves the warehouse to meet evolving business needs and technological advancements.

Overview 🌟

The Global Cycling Company Data Warehouse project involves the following key components:

  • Data Consolidation: Gathering data from transactional databases, external feeds, spreadsheets, and online sales platforms to centralize information for enhanced reporting and analytics.
  • Optimized Utilization: Structuring and organizing data to ensure accessibility and facilitate insightful analysis.
  • Purposeful Insights: Leveraging the data warehouse to gain deeper insights into sales trends, customer behavior, inventory management, and market dynamics.
  • Decision Support: Empowering decision-makers with data-driven strategies for better forecasting, inventory optimization, targeted marketing, and predictive analytics.

💾 Dataset

AdventureWorks is a sample database that Microsoft provides for free online. It was originally published by Microsoft as a product sample to demonstrate how to design a SQL Server database; this project uses the SQL Server 2019 edition. Here are some key points to know about AdventureWorks:

  • The AdventureWorks database supports a fictitious manufacturing MNC named Adventure Works Cycles.
  • It is a sample Online Transaction Processing (OLTP) database, a type of data processing in which multiple transactions occur concurrently. Microsoft ships it with all of its SQL Server products.

This project uses the Lightweight (AdventureWorksLT) edition: a pared-down version of the OLTP sample. Download here

Architecture 🏗️

The warehouse uses a star schema design: a central fact table surrounded by dimension tables, optimized for query performance and simplicity.

[Star schema diagram]
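
For illustration, a star schema of this kind could be declared roughly as follows. The table names echo the dimension and fact tables described later (DimDate, DimCurrency, DimProduct, FactOnlineSales), but the column lists are assumptions for the sketch, not the project's actual DDL.

```sql
-- Illustrative star schema DDL; column lists are assumptions, not the project's actual design.
CREATE TABLE dbo.DimDate (
    DateSK     INT      NOT NULL PRIMARY KEY,  -- smart surrogate key, e.g. 20240301
    FullDate   DATE     NOT NULL,
    [Year]     SMALLINT NOT NULL,
    [Month]    TINYINT  NOT NULL,
    [Day]      TINYINT  NOT NULL
);

CREATE TABLE dbo.DimCurrency (
    CurrencySK   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CurrencyCode CHAR(3)      NOT NULL,
    CurrencyName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.DimProduct (
    ProductSK   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductBK   INT           NOT NULL,        -- business key from the source system
    ProductName NVARCHAR(100) NOT NULL,
    ListPrice   MONEY         NOT NULL,
    StartDate   DATE          NOT NULL,        -- SCD 2 validity window
    EndDate     DATE          NULL,
    IsCurrent   BIT           NOT NULL DEFAULT 1
);

-- Central fact table referencing each dimension by its surrogate key.
CREATE TABLE dbo.FactOnlineSales (
    DateSK      INT   NOT NULL REFERENCES dbo.DimDate (DateSK),
    ProductSK   INT   NOT NULL REFERENCES dbo.DimProduct (ProductSK),
    CurrencySK  INT   NOT NULL REFERENCES dbo.DimCurrency (CurrencySK),
    OrderQty    INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);
```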

Tools and Technologies 🛠️

  • SQL Server Management Studio 2019 (SSMS) and MySQL: for database management, deployment, and job scheduling.
  • ODBC and OLE DB drivers: for connecting to the source databases.
  • Visual Studio 2019 (ETL): to build the packages that automate the extraction, transformation, and loading processes.

Purpose and Impact 📈

The purpose of our data warehouse is to:

  • Gain deeper insights into sales trends and customer behavior
  • Improve inventory management through better forecasting and optimization
  • Enhance decision-making processes with data-driven strategies
  • Facilitate targeted marketing campaigns based on customer segmentation
  • Enable predictive analytics to anticipate market changes and customer preferences

Benefits 🎉

By implementing the Global Cycling Company Data Warehouse, we expect to:

  • Make informed, strategic decisions to drive business growth
  • Improve operational efficiency through data-driven insights
  • Enhance customer satisfaction by understanding their preferences and needs better

Challenges and Limitations ⚠️

While implementing the data warehouse, we may encounter challenges such as:

  • Data quality issues leading to inaccurate analysis
  • Complex ETL processes requiring careful management
  • Integration of data from diverse sources may pose compatibility challenges

Building Phases

1. Define Requirements 📝

  • Understand Business Goals: Research our company to grasp our business model, focusing on sales trends, customer demographics, and inventory optimization.
  • Identify Data Sources: Enumerate potential sources like transactional databases, external feeds, and spreadsheets.
  • Define Data Granularity and Architecture: Determine the detail level for data (e.g., daily sales data) and outline the data warehouse architecture, including ETL workflow and schema design.

2. Documentation 📄

  • Provide guidelines covering ODBC driver installation, source data extraction, and the SQL scripts for DW creation, including key definitions and data-mapping specifications.

3. ETL Workflow 🔄

  • Extract: Identify and prepare data from MySQL and SQL Server 2019.
  • Transform: Clean data, ensuring consistency and correct format, especially for keys.
  • Load: Populate the Global Cycling Company Data Warehouse (DW), focusing on incremental data loading and critical transformations (a minimal incremental-load sketch follows this list).
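
As a minimal sketch of the incremental-loading idea, only source rows modified since the last successful run are extracted. The control table etl.LoadLog is a hypothetical helper introduced for this example; in the project the equivalent logic lives inside the ETL packages.

```sql
-- Minimal incremental-extract sketch; etl.LoadLog is a hypothetical control table.
DECLARE @LastLoad DATETIME =
    (SELECT MAX(LoadEndTime) FROM etl.LoadLog WHERE TableName = 'SalesOrderHeader');

-- Pull only the sales orders changed since the previous load.
SELECT soh.SalesOrderID, soh.OrderDate, soh.CustomerID, soh.TotalDue
FROM   SalesLT.SalesOrderHeader AS soh
WHERE  soh.ModifiedDate > @LastLoad;
```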

Dimension Tables

The dimension tables combine standard dimension handling with slowly changing dimensions (SCD).

They are populated using both incremental loads and full loads.

Currency

[DimCurrency data flow]

A lookup against the DWH ensures that only new currency rows are inserted, avoiding data duplication (see the sketch below).
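
A minimal T-SQL sketch of that lookup, assuming a hypothetical staging table stg.Currency and the DimCurrency layout sketched earlier; in the ETL packages the same check would typically be done with a lookup component.

```sql
-- Insert only currencies that are not already in the warehouse (stg.Currency is hypothetical).
INSERT INTO dbo.DimCurrency (CurrencyCode, CurrencyName)
SELECT src.CurrencyCode, src.CurrencyName
FROM   stg.Currency AS src
WHERE  NOT EXISTS (
           SELECT 1
           FROM   dbo.DimCurrency AS dim
           WHERE  dim.CurrencyCode = src.CurrencyCode
       );
```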

Date

[DimDate data flow]

A lookup against the DWH ensures that only missing dates are inserted, avoiding data duplication (see the sketch below).
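
For DimDate the same pattern applies; one common approach, sketched below against the assumed DimDate layout, is to generate a calendar range and insert only the dates that are missing. The date range here is an arbitrary assumption.

```sql
-- Generate a calendar and insert only the dates missing from DimDate (range is an assumption).
;WITH Dates AS (
    SELECT CAST('2016-01-01' AS DATE) AS FullDate
    UNION ALL
    SELECT DATEADD(DAY, 1, FullDate) FROM Dates WHERE FullDate < '2026-12-31'
)
INSERT INTO dbo.DimDate (DateSK, FullDate, [Year], [Month], [Day])
SELECT CONVERT(INT, FORMAT(d.FullDate, 'yyyyMMdd')),   -- smart key, e.g. 20240301
       d.FullDate, YEAR(d.FullDate), MONTH(d.FullDate), DAY(d.FullDate)
FROM   Dates AS d
WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimDate AS dd WHERE dd.FullDate = d.FullDate)
OPTION (MAXRECURSION 0);   -- let the recursive CTE cover the whole range
```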

Slowly Changing Dimension (SCD)

[Migration to DimProduct data flow]

Incoming DimProduct rows are split into three conditions, SCD 0, SCD 1, and SCD 2, based on which attributes of the incoming data have changed.

[SCD 1 update: SQL query]

If a row meets the SCD 1 criteria, the existing record is updated in place.

[SCD 2 insert: SQL query]

If a row meets the SCD 2 criteria, a new record is inserted and the previous version is kept as history (see the sketch below).
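
A rough T-SQL sketch of both behaviours, using the assumed DimProduct layout above (ProductBK as the business key, StartDate/EndDate/IsCurrent for history) and a hypothetical stg.Product staging table; which attribute falls under SCD 1 versus SCD 2 is also an assumption made for the example. SCD 0 attributes are simply never touched.

```sql
-- SCD 1: overwrite the attribute in place (here ProductName, chosen for illustration).
UPDATE dim
SET    dim.ProductName = src.Name
FROM   dbo.DimProduct AS dim
JOIN   stg.Product    AS src ON src.ProductID = dim.ProductBK
WHERE  dim.IsCurrent = 1
  AND  dim.ProductName <> src.Name;

-- SCD 2, step 1: expire the current row when a history-tracked attribute changes (here ListPrice).
UPDATE dim
SET    dim.EndDate = GETDATE(), dim.IsCurrent = 0
FROM   dbo.DimProduct AS dim
JOIN   stg.Product    AS src ON src.ProductID = dim.ProductBK
WHERE  dim.IsCurrent = 1
  AND  dim.ListPrice <> src.ListPrice;

-- SCD 2, step 2: insert a fresh current row for new products and for the rows expired above.
INSERT INTO dbo.DimProduct (ProductBK, ProductName, ListPrice, StartDate, EndDate, IsCurrent)
SELECT src.ProductID, src.Name, src.ListPrice, GETDATE(), NULL, 1
FROM   stg.Product AS src
LEFT JOIN dbo.DimProduct AS dim
       ON dim.ProductBK = src.ProductID AND dim.IsCurrent = 1
WHERE  dim.ProductSK IS NULL;   -- no current version exists
```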

Fact Tables

[Fact table data flow]

Fact tables use a full load: each table is truncated and completely reloaded on every run.

Online Sale

[FactOnlineSales data flow]

Surrogate keys (SK) are looked up so that each fact row references the matching surrogate keys of the dimension tables (see the sketch below).
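
The same idea in plain T-SQL, assuming a hypothetical staging table stg.OnlineSales and the dimension layouts sketched earlier; the project performs these lookups inside the ETL data flow, so this is only a logical equivalent.

```sql
-- Full load: truncate, then reload facts with surrogate keys resolved from the dimensions.
TRUNCATE TABLE dbo.FactOnlineSales;

INSERT INTO dbo.FactOnlineSales (DateSK, ProductSK, CurrencySK, OrderQty, SalesAmount)
SELECT dd.DateSK,
       dp.ProductSK,
       dc.CurrencySK,
       s.OrderQty,
       s.LineTotal
FROM   stg.OnlineSales AS s                                 -- hypothetical staging table
JOIN   dbo.DimDate     AS dd ON dd.FullDate     = CAST(s.OrderDate AS DATE)
JOIN   dbo.DimProduct  AS dp ON dp.ProductBK    = s.ProductID
                            AND dp.IsCurrent    = 1         -- match the current SCD 2 version
JOIN   dbo.DimCurrency AS dc ON dc.CurrencyCode = s.CurrencyCode;
```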

Store Sale

[FactStoreSale data flow]

Surrogate keys (SK) are looked up so that each fact row references the matching surrogate keys of the dimension tables, just as for online sales.

4. Testing and Deployment 🛠️

  • Unit Testing: Test ETL processes, data validation, and reporting (a sample validation query follows this list).
  • Integration Testing: Validate the entire data pipeline for correctness and logic.
  • Deployment: Deploy the data warehouse in a production environment, monitoring performance and functionality.
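
For example, a simple reconciliation query can compare row counts and sales totals between the source and the warehouse after a load; the specific tables compared here are an assumption for illustration.

```sql
-- Basic post-load reconciliation: source vs. warehouse row counts and sales totals.
SELECT 'source'    AS side, COUNT(*) AS row_count, SUM(LineTotal)   AS total_sales
FROM   SalesLT.SalesOrderDetail
UNION ALL
SELECT 'warehouse' AS side, COUNT(*) AS row_count, SUM(SalesAmount) AS total_sales
FROM   dbo.FactOnlineSales;
```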

ETL deployment

[Package deployment screenshot]

Deploy the packages to SQL Server (DWH).

Execute Task Dimension

[Executing the Dimension load]

Test the deployment of the Dimension tables in SQL Server (DWH).

Execute Task Fact

[Executing the Fact load]

Test the deployment of the Fact tables in SQL Server (DWH).

Job Schedule

Create Task for Dimension Table

[Job step for Dimension tables]

Create a job step that loads all Dimension table data.

Create Task for Fact Table

[Job step for Fact tables]

Create a job step that loads all Fact table data.

Create the Job Schedule

[Job schedule configuration]

Create a schedule that executes both steps, loading all Dimension and Fact table data at your preferred time (a scripted equivalent is sketched below).
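
The same job can also be scripted with SQL Server Agent's msdb procedures, as a rough equivalent of the steps above. The job name, the 02:00 daily schedule, and the etl.LoadDimensions / etl.LoadFacts wrapper procedures are assumptions; in practice the steps would run the deployed ETL packages.

```sql
-- Sketch of scripting the Agent job; names, commands, and times are illustrative assumptions.
USE msdb;

EXEC dbo.sp_add_job         @job_name = N'Load Global Cycling DWH';

EXEC dbo.sp_add_jobstep     @job_name  = N'Load Global Cycling DWH',
                            @step_name = N'Load Dimension tables',
                            @subsystem = N'TSQL',
                            @command   = N'EXEC etl.LoadDimensions;',  -- hypothetical wrapper proc
                            @on_success_action = 3;                    -- 3 = go to the next step

EXEC dbo.sp_add_jobstep     @job_name  = N'Load Global Cycling DWH',
                            @step_name = N'Load Fact tables',
                            @subsystem = N'TSQL',
                            @command   = N'EXEC etl.LoadFacts;';       -- hypothetical wrapper proc

EXEC dbo.sp_add_schedule    @schedule_name = N'Nightly 02:00',
                            @freq_type = 4, @freq_interval = 1,        -- every day
                            @active_start_time = 020000;               -- HHMMSS

EXEC dbo.sp_attach_schedule @job_name = N'Load Global Cycling DWH',
                            @schedule_name = N'Nightly 02:00';

EXEC dbo.sp_add_jobserver   @job_name = N'Load Global Cycling DWH';    -- target the local server
```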

Future Prospects 🔍🌟

With the data warehouse in place, we can expect to:

  • Gain deeper insights into sales trends and customer behavior.
  • Improve inventory management through better forecasting and optimization.
  • Enhance decision-making processes with data-driven strategies.
  • Facilitate targeted marketing campaigns based on customer segmentation.
  • Enable predictive analytics to anticipate market changes and customer preferences.

Contact

For further inquiries or to discuss potential collaborations, please connect with us: