Skip to content

Latest commit

 

History

History
177 lines (117 loc) · 9.43 KB

README.md

File metadata and controls

177 lines (117 loc) · 9.43 KB

AVIATION ELT


Project Summary

This project demonstrates an Analytical Engineering approach to aviation data using a modern data stack, covering Extraction, Loading, and Transformation (ELT). It leverages tools like Airbyte for data ingestion and dbt for data transformation to provide actionable insights for business decision-making.

Role of an Analytic Engineer

Analytical Engineers bridge the gap between Data Engineering and Data Analysis, facilitating data interpretation and actionable insights for business needs. They use data modeling techniques and tools like Snowflake and dbt to ensure data is ready for analysis and decision-making.

Project Overview

This project involves analyzing an airline's operational data from the booking database, focusing on ticket sales, bookings, flights, and more. The goal is to derive insights that inform route optimization, customer retention, and operational efficiencies.


oltp_erd

Business Requirements

The project aims to help the airline understand and utilize its data to:

  • Optimize routes and reduce costs.
  • Enhance customer acquisition and retention.
  • Improve operational efficiency.

After interviews and research into the industry, an Enterprise Bus Metrix was developed to structure the project and guide the creation of Dimensional Models, aligning with the business requirements.

Data Modeling:

Conceptual Model:

This is a high-level representation of key data entities and their relationships, used for communicating with stakeholders and defining data requirements.


conceptual_model

Logical Model:

The logical model is a detailed specification of data entities, attributes, and relationships, serving as a blueprint for database design without considering physical implementation.


logical_model

Physical Model:

The physical model translates the logical model into a technical blueprint for actual database creation. It specifies the exact schema, including tables, columns, and indexes


physical_model

Project Tools & Architecture

In this project, I made use of cloud-based data tools in the modern data stack such as;

  • Data Architecture: Designed using Draw.io
image

  • Data Ingestion: Using Airbyte: Airbyte simplifies data extraction from various sources and loading into Snowflake, ensuring data integrity and availability for transformation and analysis.


airbyte_jobs

  • Data Transformation: Using the data build tool (dbt): dbt automates data cleaning, transformation, and testing. It ensures data quality through modular SQL practices, documentation, and testing capabilities, making data ready for analysis and reporting.


dbt_run dbt_test

  • Data Warehousing - Snowflake: Snowflake is a cloud-based data warehousing platform that offers scalable storage and compute power for efficiently managing and analyzing large volumes of data.

Data Pipeline Layers:

  • src_aviation:

    This source layer is a replica of the source database (OLTP system). It's the initial layer where the Airbyte run ingests the data. It's untouched and unprocessed.
    src_aviation

  • Data cleaning and transformation were primarily conducted in the staging layer. Tasks included:

    • Text cleaning to select English values from JSON columns containing both English and Russian content.
    • Creation of surrogate keys for tables due to lengthy and complex natural keys that were not integers.
    • Separation of the customer table from the fact table (tickets) to accommodate customer dimension metrics.

stg_aviation
  • The data warehouse layer implemented dimensional modeling using the Star schema. It focused on aggregating key metrics:
    • Financial metrics like cumulative_revenue.
    • Operational metrics such as age_in_fleet, flight_to_next_maintenance, and number_of_maintenance.
    • Additionally, dbt tests were developed in this layer to validate data integrity.

dwh_aviation
  • The analytics layer consolidated business scenarios into a single table, enabling the creation of models to address executive queries and provide key insights. This layer minimized the need for complex joins, streamlining dashboard creation in business intelligence tools.

obt_aviation

Business Questions

  • The Operating and Finance teams sought insights into the most frequently booked routes, focusing on ticket volumes and cumulative revenue generated by each flight class. To achieve this, I extracted and integrated data from the Fact_Tickets, Fact_Flights, and Dim_Airport tables, providing a comprehensive analysis of route performance and financial contributions by flight class.

  • The Marketing team seeks to design campaigns based on passenger travel patterns, including preferred routes, ticket classes, and trip frequency, to encourage loyalty through offers like complimentary flights for frequent flyers. The Finance team uses these metrics to assess cumulative revenue and calculate passenger lifetime value.

    Data from Fact_Tickets, Fact_Flights, and Dim_Customer tables were used to support these analyses.

  • The Engineering and Operations teams focus on closely monitoring aircraft to ensure optimal fleet performance. Following FAA recommendations that each aircraft undergo an A-check every 200-300 flights, I developed a tracking system that counts down from 300 with each flight. This system enables the Engineering team to anticipate maintenance needs and plan accordingly, while the Operations team can identify which aircraft are available for flights or undergoing maintenance.

    Additionally, the Finance team is interested in tracking the revenue generated by each aircraft across different flight classes, allowing for an assessment of financial performance and resource allocation within the fleet. To gather these insights, I utilized data from the Fact_Tickets, Fact_Flights, and Dim_Aircraft tables, ensuring a comprehensive view of aircraft maintenance schedules and financial contributions.

    Conclusion

    This aviation analytics project has successfully utilized modern data engineering techniques to extract, transform, and load (ETL) data from operational databases into a structured data warehouse. By implementing Airbyte for seamless data ingestion, dbt for efficient data transformation, and Snowflake for scalable data storage and querying, we've enabled the airline to derive actionable insights. These insights include optimizing route selection, enhancing customer retention strategies, and improving operational efficiencies. This project highlights the critical role of analytical engineering in translating raw data into meaningful business outcomes, driving informed decision-making, and strategic planning within the aviation industry.

    Resources consulted: