In this project, I have created an End-To-End Data Engineering Project on Airlines and its operations data.
Creating ETL pipeline designed to extract data from the source S3, transform it to ensure data quality and consistency using AWS Glue, and load it into the Redshift for further analysis and reporting.
- Retrieve metadata of Airlines Flight details from the source S3 using Glue Crawler.
- Retrieve metadata of Dimension tables related to Airport Details and crew from AWS Redshift tables using Glue Crawler.
- Apply quality checks, cleaning, and standardization to ensure high-quality data.
- Load transformed data into the destination Redshift fact table for analysis. And Loading Bad Data into S3 bucket for further analysis.
- Monitoring and Logging: Monitor the ETL pipeline's performance and log any errors or anomalies for easy troubleshooting and getting alerts on the gmail.
S3 (Simple Storage Service)
Glue Crawler
Glue Catalog
Visual ETL
Redshift
CloudWatch
EventBridge
Step Functions
SNS (Simple Notification Service)
Here's the DataSet link - https://www.kaggle.com/datasets/iamsouravbanerjee/airline-dataset