This project builds an ETL (Extract, Transform, Load) pipeline that fetches data about the programming languages used by major corporations, including Amazon, Spotify, Netflix, and Apple. The pipeline extracts, transforms, and loads this data so that the relevant information is gathered, shaped, and stored for later analysis.
The ETL is implemented with Python's Requests library and the GitHub API. The GitHub API provides information on the programming languages used in these companies' projects.
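As a rough illustration of the extraction step, the sketch below requests one page of an account's repositories from the GitHub REST API. The helper names (`build_headers`, `fetch_repos_page`), the `owner` and `token` parameters, and the choice of the `/users/{owner}/repos` endpoint are assumptions made for this example, not necessarily what the project's notebook does.

```python
API_BASE = "https://api.github.com"


def build_headers(token=None):
    """Build request headers; the token is optional (hypothetical helper)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        # Authenticated requests get a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


def fetch_repos_page(owner, page=1, token=None, session=None):
    """Return one page of repositories for `owner` as a list of dicts."""
    if session is None:
        # Imported lazily so a stub session can be injected for offline tests.
        import requests

        session = requests.Session()
    response = session.get(
        f"{API_BASE}/users/{owner}/repos",
        headers=build_headers(token),
        params={"page": page, "per_page": 100},
        timeout=10,
    )
    # Raise an exception for 4xx/5xx responses instead of parsing an error body.
    response.raise_for_status()
    return response.json()
```

Each repository object returned by this endpoint includes a `name` and a `language` field, so a call such as `fetch_repos_page("amzn")` (the account name here is only an example) would give the raw material for the transform step.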
The notebook containing all the code is in the "notebook" folder. The "classes" folder contains the same code, structured into Python classes so that it can be reused.
Key features and functionalities of the project include:
- Data extraction with the GitHub API
- Status code handling
- Authentication
- Pagination
- Object-Oriented Programming (OOP)
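The features listed above could be combined in a single class along the following lines. This is a minimal sketch, not the code from the "classes" folder: the class name `GitHubLanguageExtractor`, its methods, and the `owner`, `token`, and `session` parameters are all hypothetical. Pagination follows the `next` link that Requests parses from the response's `Link` header, and any non-200 status code is treated as an error.

```python
from collections import Counter


class GitHubLanguageExtractor:
    """Hypothetical extractor: lists an account's repos and counts languages."""

    API_BASE = "https://api.github.com"

    def __init__(self, owner, token=None, session=None):
        self.owner = owner
        # Authentication: send the token only when one is provided.
        self.headers = {"Accept": "application/vnd.github+json"}
        if token:
            self.headers["Authorization"] = f"Bearer {token}"
        if session is None:
            # Imported lazily so a stub session can be injected for tests.
            import requests

            session = requests.Session()
        self.session = session

    def _get(self, url, params=None):
        """Send a GET request and fail loudly on any non-200 status code."""
        response = self.session.get(
            url, headers=self.headers, params=params, timeout=10
        )
        if response.status_code != 200:
            raise RuntimeError(
                f"GitHub API returned {response.status_code} for {url}"
            )
        return response

    def iter_repos(self):
        """Yield every repository, following pagination links."""
        url = f"{self.API_BASE}/users/{self.owner}/repos"
        params = {"per_page": 100}
        while url:
            response = self._get(url, params)
            yield from response.json()
            # `links` is parsed from the Link header; no "next" means last page.
            url = response.links.get("next", {}).get("url")
            params = None  # the "next" URL already carries its query string

    def language_counts(self):
        """Count repositories per primary language, skipping null values."""
        return Counter(
            repo["language"]
            for repo in self.iter_repos()
            if repo.get("language")
        )
```

With a real token, `GitHubLanguageExtractor("spotify", token=...).language_counts()` would return a `Counter` mapping each language to the number of repositories that use it as their primary language; the account name is only an example.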
This project was developed for a course I taught at Alura. You can access it by clicking on the link: Course's link
Technologies used:
- Python;
- Requests library;
- GitHub API;
- Pandas library.
Email: millenagena@gmail.com