AirBnB-Data-Engineering-Project

Overview

This project offers an end-to-end data pipeline. It begins with importing CSV data into PostgreSQL as a staging area. Next, Python is used to extract data from the staging area, followed by transformation using the Pandas library. The transformed data is loaded back into a designed data warehouse on PostgreSQL. Finally, actionable business insights are derived from the loaded data.
The data engineering pipeline of the project:

Source system

Airbnb is an American company operating an online marketplace for short- and long-term homestays and experiences. The company acts as a broker and charges a commission from each booking.
The data of Airbnb typically includes information related to listings, bookings, reviews, host profiles, guest profiles, pricing, availability calendars, and various other transactional and operational data associated with the Airbnb platform.This data helps Airbnb manage its marketplace, facilitate transactions between hosts and guests, improve user experience, and make data-driven decisions to enhance its services.
For detailed data profiling : Click here

Staging Area

The staging area involves extracting data from CSV files and loading it into a PostgreSQL database. During this process, careful consideration is given to data types for memory optimization.
Attached is a photo showing the relations between tables:

Designing The Data Warehouse

appling Dimensional modeling to design a galaxy-schema model depending on our business processes
Dimensional Modeling steps :
- (1) The business process revolves around leveraging the data warehouse to gain insights and make informed decisions related to the asked business questions ,Most fact tables focus on the results of a single business process
- (2) Identifying Granularity:
  - Fact Tables: Data stored at the daily level, with each row representing a specific listing on a specific date.
  - Dimension Tables: Aggregate or summarize data at varying levels of granularity, based on specific dimensions and metrics included in each table. For example, summarizing key metrics such as price, number of reviews, and availability at the listing level.
- (3) Identifying Dimensions:
  - Listing Dimension: Includes all attributes related to listings such as ID, name, description, host details, location, property type, room type, and amenities.
  - Date Dimension: Represents date-related attributes such as date, month, year, and day of the week for analyzing trends and patterns over time.
- (4) Identifying Facts:
  - Reservations Fact: Contains data related to listing availability, price, and minimum/maximum nights for each listing on specific dates.
  - Reviews Fact: Stores data related to reviews for each listing, including review ID, date, reviewer details, and comments.
See the Data Warehouse Design here

ETL process:

This process includes extracting data from the staging layer, converting each table into a pandas DataFrame, performing transformations such as removing leading and trailing whitespace from columns, changing some column datatypes to match the data warehouse, adding surrogate keys, etc. Then, the data is loaded back to the data warehouse on PostgreSQL.
You can find the code for the ETL process in this jupyter notebook file

Actionable Business Insights

Now, you can turn business questions into actionable business insights.
You can find the questions, along with their queries used to answer them, and the CSV files containing these insights in this folder

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Actionable Business Insights		Actionable Business Insights
Data Warehouse Design		Data Warehouse Design
Data_Profiling		Data_Profiling
ETL using python script.ipynb		ETL using python script.ipynb
README.md		README.md
Staging Layer(Database).sql		Staging Layer(Database).sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AirBnB-Data-Engineering-Project

Overview

Source system

Staging Area

Designing The Data Warehouse

ETL process:

Actionable Business Insights

About

Releases

Packages

Languages

ArwaEiad/AirBnB-Data-Engineering-Project

Folders and files

Latest commit

History

Repository files navigation

AirBnB-Data-Engineering-Project

Overview

Source system

Staging Area

Designing The Data Warehouse

ETL process:

Actionable Business Insights

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages