
Travel Agency Project

Extracting the data from the API | Writing the extracted data to the Data Lake | Loading the final required attributes into the Data Warehouse.

This article explores a travel agency project designed to recommend tourist locations based on various data points. The project involves building a platform that processes data from the Country REST API into a cloud-based Data Warehouse, using a range of modern technologies and methodologies.

Project Overview

The primary objective of this project is to create a platform that can efficiently recommend tourist destinations by analyzing data from multiple sources. The platform ingests data from the Country REST API and stores it in a cloud-based Data Warehouse, a data-driven approach that allows travel agencies to provide personalized recommendations to their customers.

Data Processing Workflow

The data processing workflow is orchestrated using Apache Airflow, a powerful tool for managing complex data pipelines. The workflow consists of the following key steps:

  1. Data Extraction: The first step extracts raw data from the Country REST API. This data is comprehensive and includes the attributes needed to make informed recommendations. (A Python sketch of steps 1-3 follows this list.)

  2. Data Storage: The extracted raw data is stored in Amazon S3, serving as the Raw layer. The data is stored in Parquet format, which is efficient for both storage and processing.

  3. Data Transformation: Specific fields and attributes are extracted from the raw API data to facilitate downstream usage. This transformation process prepares the data for analysis and storage in the Data Warehouse.

  4. Data Loading: The final step loads the transformed data into the Data Warehouse, where it can be queried for analysis and for generating recommendations. (A loading sketch appears after the Cloud Infrastructure list below.)
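The sketch below illustrates steps 1-3 in plain Python. It assumes the Country REST API is the public REST Countries endpoint (https://restcountries.com) and uses a hypothetical bucket name and column selection; the project's own implementations live in downloadapi.py, upload_data_tos3.py, and transform_data.py, described under Code Structure below.

    import io

    import boto3
    import pandas as pd
    import requests

    # Assumed endpoint: the public REST Countries API. The fields parameter
    # limits the payload to the attributes used downstream.
    API_URL = "https://restcountries.com/v3.1/all"
    FIELDS = "name,capital,region,population"
    RAW_BUCKET = "travel-agency-raw"  # hypothetical bucket name


    def extract_countries() -> pd.DataFrame:
        """Step 1: pull the raw country dataset from the REST API."""
        response = requests.get(API_URL, params={"fields": FIELDS}, timeout=30)
        response.raise_for_status()
        return pd.json_normalize(response.json())


    def write_parquet_to_s3(df: pd.DataFrame, bucket: str, key: str) -> None:
        """Step 2: serialize a DataFrame to Parquet in memory and upload it to S3."""
        buffer = io.BytesIO()
        df.to_parquet(buffer, index=False)
        boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())


    def select_attributes(df: pd.DataFrame) -> pd.DataFrame:
        """Step 3: keep only the fields needed downstream (illustrative selection)."""
        wanted = ["name.common", "capital", "region", "population"]
        return df[[col for col in wanted if col in df.columns]]


    if __name__ == "__main__":
        raw = extract_countries()
        write_parquet_to_s3(raw, RAW_BUCKET, "raw/countries.parquet")
        write_parquet_to_s3(select_attributes(raw), RAW_BUCKET, "transformed/countries.parquet")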

Cloud Infrastructure

The project relies on several cloud services to ensure scalability, security, and efficiency:

  • AWS IAM: Used for managing access keys and permissions, ensuring secure access to AWS resources.
  • Amazon S3: Serves as the storage solution for raw and transformed data.
  • Snowflake: Acts as the Data Warehouse, providing a robust platform for storing and analyzing large datasets.
  • Terraform: Used for provisioning cloud infrastructure, with the Terraform State File backend managed in S3.
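To make the Data Warehouse side concrete, here is a minimal sketch of the loading step (step 4 above) using the snowflake-connector-python package. The stage, table, and connection values are hypothetical placeholders, not names taken from this repository.

    import os

    import snowflake.connector

    # Hypothetical connection parameters; in practice these would come from
    # environment secrets or Airflow connections rather than hard-coded values.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="TRAVEL_WH",
        database="TRAVEL_DB",
        schema="PUBLIC",
    )

    try:
        with conn.cursor() as cur:
            # Copy transformed Parquet files from an external S3 stage into a
            # table; "@travel_stage" and "countries" are hypothetical names.
            cur.execute(
                """
                COPY INTO countries
                FROM @travel_stage/transformed/
                FILE_FORMAT = (TYPE = PARQUET)
                MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
                """
            )
    finally:
        conn.close()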

Code Structure

The project's codebase is organized into several Python scripts, each serving a specific function:

  • boto3session.py: Configures a boto3 session with AWS IAM access keys for secure access to AWS resources.
  • downloadapi.py: Responsible for extracting data from the API.
  • upload_data_tos3.py and read_data_tos3.py: Upload raw data to S3 and read it back for transformation.
  • transform_data.py: Performs data transformation and moves transformed data back to S3.
  • travel_agency.py: Configures the DAG (Directed Acyclic Graph) that orchestrates the workflow in Apache Airflow (a minimal sketch follows below).

All these scripts are housed within the "includes" folder, ensuring a clean and organized codebase.
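As an illustration of how travel_agency.py might wire these scripts into a pipeline, here is a minimal Airflow 2.x DAG sketch. The task callables imported from the includes folder are assumptions based on the script list above, not the repository's actual function names.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical callables mirroring the scripts described above.
    from includes.downloadapi import extract_countries
    from includes.upload_data_tos3 import upload_raw_to_s3
    from includes.transform_data import transform_data

    with DAG(
        dag_id="travel_agency",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_api_data", python_callable=extract_countries)
        upload = PythonOperator(task_id="upload_raw_to_s3", python_callable=upload_raw_to_s3)
        transform = PythonOperator(task_id="transform_data", python_callable=transform_data)

        # Run extraction, raw upload, and transformation in sequence.
        extract >> upload >> transform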

Infrastructure Management

The infrastructure required for this project is managed using Terraform, an Infrastructure as Code (IaC) tool. The Terraform configuration files are stored in the "terraform" folder, which includes:

  • Files for building cloud infrastructure components such as S3 buckets and Snowflake resources.
  • Backend configuration for managing the Terraform State File in S3.
  • Provider configuration for AWS resources.

Code Quality

To maintain high code quality, Continuous Integration (CI) runs the Ruff linter on every change, ensuring that all code adheres to best practices and established coding standards.

Conclusion

This travel agency project showcases the power of modern data processing technologies in transforming raw data into actionable insights. By leveraging cloud infrastructure, orchestration tools like Apache Airflow, and robust data storage solutions like Snowflake, the platform provides travel agencies with the ability to offer personalized recommendations to their clients. Through careful planning and execution, this project demonstrates how technology can enhance customer experiences in the travel industry.

This documentation serves as a guide for understanding the project's architecture, workflow, and implementation details. By following these guidelines, developers and stakeholders can ensure the successful deployment and operation of this data-driven travel agency platform.
