US Flight Departures - 2022

Newark, New Jersey @nicolasjehly

👨🏻‍💻

Author: @juansevargasc

Dataset Source: 2022 U.S. Domestic Flights Departures, Kaggle

Topics: Data Engineering, ETL, Data Analysis, Data Warehouse

Execution

Environment

conda create --name <env> --file requirements.txt
# or
pip install -r requirements.txt

Main Python file

python src/main.py

Description

This project explores US flight departures in 2022 through the analysis of weather conditions, cancellations, dates, locations, and carriers, among others. First, however, it features an ETL pipeline that preprocesses the different data sources and loads them into an OLAP database for BI consumption.

Extract data from different sources. In this case the data comes from 5 CSV files, but two of them are moved into a relational database and another is converted to a JSON file to simulate different types of sources. See prework.
Design a data schema that allows querying data for BI purposes.
Create an ETL pipeline.
Clean data by choosing which NaN (empty) values should be dropped.
Standardize names and establish naming conventions.
Test and enforce data types and schemas.
Build a star schema architecture.
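
The cleaning steps above (dropping empty values, standardizing names, enforcing types) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the column names, the `SCHEMA` mapping, and the helpers `standardize_name` and `clean_rows` are all hypothetical.

```python
import re

# Hypothetical required columns and target types; the real schema may differ.
SCHEMA = {"flight_date": str, "carrier_code": str, "departure_delay": int}


def standardize_name(name: str) -> str:
    """Convert a raw header such as 'Carrier Code' to snake_case."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean_rows(rows):
    """Standardize column names, drop rows missing required values, enforce types."""
    cleaned = []
    for row in rows:
        row = {standardize_name(k): v for k, v in row.items()}
        # Treat None, empty strings, and the literal "NaN" as missing values.
        if any(row.get(col) in (None, "", "NaN") for col in SCHEMA):
            continue
        try:
            cleaned.append({col: cast(row[col]) for col, cast in SCHEMA.items()})
        except ValueError:
            # Skip rows whose values cannot be cast to the expected type.
            continue
    return cleaned


raw = [
    {"Flight Date": "2022-01-01", "Carrier Code": "AA", "Departure Delay": "12"},
    {"Flight Date": "2022-01-01", "Carrier Code": "", "Departure Delay": "3"},
    {"Flight Date": "2022-01-02", "Carrier Code": "DL", "Departure Delay": "NaN"},
]
print(clean_rows(raw))
```

Only the first sample row survives: the second has an empty carrier code and the third has a missing delay. Whether a missing value should drop the whole row or be kept is a per-column decision, so a real pipeline would likely make the required columns configurable.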

Data Analysis Stage

Objectives

Ask interesting questions such as:
- Is there a correlation between delays and weather?
- How many flights did a certain airline make during the year?
- What's the most common route? Does weather have an impact on a route?
Explore the data and characterize some columns.
Compute some statistics:
- What's the average number of flights per day?
- How many flights are delayed per day?
- Do the weather events follow a normal distribution? Another type of distribution?
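
As a rough illustration of the per-day statistics above, the sketch below counts flights and delayed flights per day on a tiny made-up sample. The sample data and the 15-minute `DELAY_THRESHOLD` are assumptions for the example, not values taken from the dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: (flight_date, departure_delay_minutes) pairs.
flights = [
    ("2022-01-01", 0),
    ("2022-01-01", 25),
    ("2022-01-01", 40),
    ("2022-01-02", 5),
    ("2022-01-02", 0),
]

DELAY_THRESHOLD = 15  # assumption: a flight counts as delayed at 15+ minutes

# Count all flights and delayed flights for each day.
flights_per_day = Counter(date for date, _ in flights)
delayed_per_day = Counter(
    date for date, delay in flights if delay >= DELAY_THRESHOLD
)

# Average over every day that had flights; days without delays count as 0.
avg_flights_per_day = mean(list(flights_per_day.values()))
avg_delayed_per_day = mean([delayed_per_day[day] for day in flights_per_day])

print(avg_flights_per_day, avg_delayed_per_day)
```

Averaging the delayed counts over all days with flights, rather than only over days that had delays, keeps delay-free days from inflating the result.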

1. Data Engineering Stage

Introduction

The project aims to analyze the files that are given in this dataset: 2022 U.S. Domestic Flights Departures

Author: Jacky Luo

Prework

The prework takes some of the original files and exports them to a SQL database and a JSON file to simulate having different data sources in the project. See more in Prework.
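
A minimal sketch of this kind of prework, assuming SQLite as the relational database (the README does not say which engine is used) and a made-up carriers file with hypothetical columns:

```python
import csv
import io
import json
import sqlite3

# Stand-in for one of the original CSV files; the columns are hypothetical.
csv_text = "carrier_code,carrier_name\nAA,American Airlines\nDL,Delta Air Lines\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Source 1: load the rows into a relational table (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE carriers (carrier_code TEXT PRIMARY KEY, carrier_name TEXT)"
)
conn.executemany(
    "INSERT INTO carriers VALUES (:carrier_code, :carrier_name)", rows
)
conn.commit()

# Source 2: serialize rows as JSON text, as if exported to a .json file.
json_text = json.dumps(rows, indent=2)

print(conn.execute("SELECT COUNT(*) FROM carriers").fetchone()[0])
print(json.loads(json_text)[0]["carrier_name"])
```

In the real prework the CSV would be read from disk and the JSON written to a file; in-memory stand-ins are used here so the example runs on its own.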

Documentation of Stages

Raw Tables
Staging Tables
Star Schema Tables

Final Dim - Fact Schema
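
A dimension and fact layout of this kind might look like the sketch below. The table names (`dim_carrier`, `dim_date`, `fact_flight`), their columns, and the use of SQLite are illustrative assumptions and are not taken from the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables (hypothetical names and columns).
    CREATE TABLE dim_carrier (
        carrier_id INTEGER PRIMARY KEY,
        carrier_code TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        full_date TEXT NOT NULL
    );
    -- Fact table: one row per flight, pointing at the dimensions.
    CREATE TABLE fact_flight (
        flight_id INTEGER PRIMARY KEY,
        carrier_id INTEGER NOT NULL REFERENCES dim_carrier (carrier_id),
        date_id INTEGER NOT NULL REFERENCES dim_date (date_id),
        departure_delay INTEGER,
        cancelled INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO dim_carrier VALUES (1, 'AA'), (2, 'DL');
    INSERT INTO dim_date VALUES (1, '2022-01-01');
    INSERT INTO fact_flight (carrier_id, date_id, departure_delay) VALUES
        (1, 1, 12), (1, 1, 0), (2, 1, 45);
    """
)

# A typical BI-style query: flights per carrier, joining fact to dimension.
flights_per_carrier = conn.execute(
    """
    SELECT c.carrier_code, COUNT(*) AS flights
    FROM fact_flight AS f
    JOIN dim_carrier AS c ON c.carrier_id = f.carrier_id
    GROUP BY c.carrier_code
    ORDER BY c.carrier_code
    """
).fetchall()
print(flights_per_carrier)
```

Keeping measures such as delay in the fact table and descriptive attributes in the dimensions is what lets questions like "how many flights did a certain airline make" reduce to a single join and aggregation.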
