This project demonstrates an end-to-end enterprise ELT (Extract, Load, Transform) pipeline: data is extracted from an API source, loaded into a cloud database, transformed with dbt, and finally visualized in Power BI through a SQL Server Analysis Services (SSAS) cube. It combines modern data engineering practices with cloud solutions and business intelligence tools.
- ELT: Airbyte 🌀
- API: FastAPI 🚀
- Cloud: Google Cloud Platform (GCP) ☁️
- Database: Google Cloud SQL (MS SQL Server) 🗄️
- Transformation: dbt (Data Build Tool) 🔄
- Containerization: Docker 🐳
- Orchestration: Prefect 🕹️
- Scripting: Expect Script for automating keystrokes, PowerShell #️⃣
- Analytics: SSAS (SQL Server Analysis Services) 📊
- Visualization: Power BI 📈
Custom API Connector with Airbyte:
- Created a custom connector in Airbyte to fetch product data from the DummyJSON API using the following configuration (see the sketch after this list):
  - API URL Path: `https://dummyjson.com/products?limit=101`
  - HTTP Method: `GET`
  - Primary Key: `id`
  - Field Path: `products`
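As a quick sanity check outside Airbyte, the same endpoint can be queried directly. The sketch below (plain Python with the `requests` library, which is not part of this project's stack) mirrors the connector settings above:

```python
import requests

# Same endpoint the Airbyte connector is configured with.
URL = "https://dummyjson.com/products?limit=101"

response = requests.get(URL, timeout=30)  # HTTP Method: GET
response.raise_for_status()

# Field Path "products": the records live under the top-level "products" key.
records = response.json()["products"]

# Primary Key "id": each record is uniquely identified by its "id" field.
print(f"Fetched {len(records)} products; first id: {records[0]['id']}")
```

Presumably `limit=101` pulls the full product set in a single page, which would explain why no pagination settings are needed on the connector.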
Destination Connector (Google Cloud SQL):
- The MS SQL connector in Airbyte was used to connect the pipeline to Google Cloud SQL with the following details (a connectivity check is sketched after this list):
  - Destination Name: `GCP SQL`
  - Database Host: Google Cloud SQL instance
  - Port: `1433` (default for MS SQL)
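Airbyte manages this connection itself, but the same details can be verified from Python. A minimal sketch using `pyodbc`; the driver name is a common default, and the host, database, and credentials are placeholders rather than values from this project:

```python
import pyodbc

# Connection details mirroring the Airbyte destination configuration.
# Host, database, and credentials are placeholders -- substitute your
# Google Cloud SQL instance's public IP and login.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<cloud-sql-instance-ip>,1433;"  # port 1433, the MS SQL default
    "DATABASE=<database-name>;"
    "UID=<user>;PWD=<password>;"
)

cursor = conn.cursor()
# Landing.Products is created later by the dbt step.
cursor.execute("SELECT TOP 5 * FROM Landing.Products")
for row in cursor.fetchall():
    print(row)
conn.close()
```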
Transformation with dbt:
- dbt was used to transform the data after it was loaded into the cloud database.
- The dbt transformations create a new table named `Landing.Products` (the invocation is sketched below).
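Outside of the orchestrated run, the transformation can be exercised by invoking the dbt CLI directly. A minimal sketch, where the `--project-dir`/`--profiles-dir` paths are assumptions about the repo layout:

```python
import subprocess

# Run all dbt models; Prefect wraps an equivalent call in a task.
# The directory paths here are assumptions about where the dbt project lives.
result = subprocess.run(
    ["dbt", "run", "--project-dir", "./dbt", "--profiles-dir", "./dbt"],
    capture_output=True,
    text=True,
)
print(result.stdout)
result.check_returncode()  # raise if any model failed
```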
Orchestration with Prefect:
- Prefect is an open-source orchestration tool used to manage the ELT pipeline. It enables automation, scheduling, and monitoring of data workflows.
- Prefect’s flow orchestrates the following steps (see the sketch after this list):
  - Trigger the Airbyte sync to fetch the data.
  - Run the dbt transformations to clean and prepare the data.
  - Log the success or failure of the data pipeline.
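A minimal sketch of what such a flow can look like with Prefect 2.x. The Airbyte server URL, connection ID, and the bare `dbt run` invocation are placeholders; the actual flow in this repo may differ:

```python
import subprocess

import requests
from prefect import flow, task, get_run_logger

AIRBYTE_API = "http://localhost:8000/api/v1"  # placeholder Airbyte server URL
CONNECTION_ID = "<airbyte-connection-id>"     # placeholder connection ID


@task
def trigger_airbyte_sync() -> None:
    """Kick off the Airbyte sync for the DummyJSON -> Cloud SQL connection."""
    resp = requests.post(
        f"{AIRBYTE_API}/connections/sync",
        json={"connectionId": CONNECTION_ID},
        timeout=30,
    )
    resp.raise_for_status()


@task
def run_dbt_transformations() -> None:
    """Run the dbt models that build Landing.Products."""
    subprocess.run(["dbt", "run"], check=True)


@flow
def elt_pipeline() -> None:
    logger = get_run_logger()
    trigger_airbyte_sync()
    run_dbt_transformations()
    logger.info("Pipeline completed successfully.")


if __name__ == "__main__":
    elt_pipeline()
```

In practice the sync task would also poll Airbyte's jobs endpoint until the sync finishes before dbt starts; that polling is omitted here for brevity.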
SSAS Tabular Model:
- A tabular model was created using SQL Server Analysis Services (SSAS).
- Connected the SQL database and loaded the `Landing.Products` table.
- DAX (Data Analysis Expressions) measures were written to analyze the data.
Power BI Visualization:
- Power BI connected to the SSAS cube to retrieve both the table data and the DAX measures.
- Built interactive reports and dashboards for data visualization.
In order to execute the solution, ensure the following prerequisites are met on your local machine:
- Airbyte Server must be running.
- Docker Desktop must be installed and running.
To run the pipeline, follow these steps:
- Open a terminal in the project's root directory or navigate (`cd`) to the project's root folder.
- Activate the virtual environment. This is necessary to manage dependencies. If you're using Windows, run:

  ```
  .\venv\Scripts\activate
  ```

  This assumes your virtual environment is named `venv`.
- Build the Docker image:

  ```
  docker build -t airbyte-dbt-prefect-container .
  ```

  This will build a Docker image using the project files.
- Run the Docker container:

  ```
  docker run -it --rm --name airbyte-dbt-prefect -p 8081:4200 airbyte-dbt-prefect-container
  ```

  This command will run the entire pipeline in steps. Here’s what happens (an entrypoint sketch follows this list):
  - First, the Prefect server will be started.
  - After a 90-second pause, the extraction and loading using Airbyte will be triggered.
  - Then, dbt will run the data transformation as defined in the models, and the final data will be stored in the Google Cloud SQL database.
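For illustration, that startup sequence could be driven by an entrypoint along these lines. This is a hedged sketch, assuming a recent Prefect 2.x CLI and a hypothetical `pipeline.py` containing the flow; the actual image may drive these steps with a shell or Expect script instead:

```python
import subprocess
import time

# Start the Prefect server in the background (UI on port 4200 inside the
# container, published to 8081 on the host by `docker run -p 8081:4200`).
server = subprocess.Popen(["prefect", "server", "start"])

# Give the server time to come up before triggering the pipeline.
time.sleep(90)

# Run the flow: Airbyte sync, then dbt transformations.
# "pipeline.py" is a hypothetical module name for the flow sketched above.
subprocess.run(["python", "pipeline.py"], check=True)
```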
This is the general process to execute the solution locally. For a detailed explanation of the port bindings, the roles of each file, and further configuration details, refer to the detailed documentation in `ProjectSetupDetails.md`.
Prefect is a modern workflow orchestration platform that enables us to automate our data pipelines. In this project, Prefect was used to orchestrate the extraction from the API, loading into the database, and transforming the data using dbt. Prefect’s UI allowed us to monitor the pipeline’s progress, view logs, and manage schedules.
- Flow Definition: A Prefect flow was created to manage the Airbyte sync and dbt transformation steps.
- Task Orchestration: Prefect's task runner ensures that each task, such as triggering an Airbyte sync or running a dbt transformation, runs in the right order.
This project successfully implements a modern ELT pipeline using Airbyte for data extraction, Google Cloud SQL for storage, dbt for data transformation, SSAS for analytics, and Power BI for visualization. By utilizing Prefect, we ensure a fully orchestrated, automated pipeline that is both scalable and flexible for future enhancements.