This project builds a data engineering pipeline for NYC taxi trip data from 2022 and 2023. The pipeline extracts, transforms and loads (ETL) the data into a Snowflake database, where large volumes of trip records are consolidated, cleaned and stored. A dashboard is then built on top of the database to visualise insights from the data.
If you find this project useful, kindly consider giving it a star ⭐ on GitHub.
- Clone the Repository:
  git clone https://github.com/nafisalawalidris/NYC_Taxi_Data_Pipeline.git
  cd NYC_Taxi_Data_Pipeline
- Create and Activate a Virtual Environment:
  python -m venv nyc_taxi_env
  .\nyc_taxi_env\Scripts\Activate       # On Windows
  source nyc_taxi_env/bin/activate      # On macOS/Linux
- Install Dependencies:
  pip install -r requirements.txt
- Run the Scripts:
  - Follow the instructions in the scripts to extract, transform and load the data.
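
For orientation, the extract, transform and load stages can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names (`extract`, `transform`, `load`, `run_pipeline`), the cleaning rules and the sample rows are all hypothetical, the column names follow the public TLC yellow-taxi schema and may differ from what the scripts use, and an in-memory SQLite database stands in for Snowflake so the sketch runs without credentials.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample rows. Column names follow the public TLC yellow-taxi
# schema; the repository's scripts may use different ones.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance,fare_amount
2022-01-01 00:10:00,2022-01-01 00:25:00,3.2,14.5
2022-01-01 01:00:00,2022-01-01 01:05:00,0.0,5.0
2023-06-15 08:30:00,2023-06-15 09:00:00,7.5,-3.0
2023-06-15 18:00:00,2023-06-15 18:20:00,4.0,18.0
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def extract(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def transform(rows):
    """Drop rows with non-positive distance or fare; derive duration in minutes."""
    cleaned = []
    for row in rows:
        distance = float(row["trip_distance"])
        fare = float(row["fare_amount"])
        if distance <= 0 or fare <= 0:
            continue  # assumed cleaning rule, for illustration only
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
        duration_min = (dropoff - pickup).total_seconds() / 60
        cleaned.append((row["tpep_pickup_datetime"], distance, fare, duration_min))
    return cleaned


def load(conn, records):
    """Create the target table if needed and insert the cleaned records."""
    # SQLite is a stand-in here; the project itself loads into Snowflake.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips ("
        "pickup TEXT, distance REAL, fare REAL, duration_min REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(source, conn):
    """Run extract, transform and load in order; return the number of rows loaded."""
    records = transform(extract(source))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded = run_pipeline(io.StringIO(SAMPLE_CSV), connection)
    print(f"Loaded {loaded} trips")
```

Keeping each stage as a separate function means the load step can be swapped for a Snowflake connection without touching the extraction or cleaning logic.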
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.
For any inquiries or suggestions, please contact Nafisa Lawal Idris.