Data Consumer API Project: ETL Process for Currency Quotes Data

Code Coverage KPI Graph

Project Stack

Project description

The "ETL Process for Currency Quotes Data" project is a complete solution for extracting, transforming and loading (ETL) currency quote data. It combines several techniques and architectural patterns, described below, to keep the ETL process efficient and robust.

Contributing

See the following docs:

Project Highlights:

MVC Architecture: Implementation of the Model-View-Controller (MVC) architecture, separating business logic, user interface and data manipulation for better code organization and maintainability.
Comprehensive Testing: Tests that ensure the quality and robustness of the code at each stage of the ETL process.
Parallelism in Models: Use of parallelism in the data transformation and loading stages, increasing efficiency and reducing processing time.
Fire-and-Forget Messaging: Use of messaging (Python's queue.Queue) in a fire-and-forget model to pass each currency's data from the transformation stage to the loading stage, ensuring a continuous and efficient data flow.
Parameter Validation: Only parameters that are valid according to the data source itself are sent in the request, ensuring the integrity and accuracy of the information processed.
Configuration Management: A configuration module manages endpoints, retry delays and the number of attempts, making these settings flexible and easy to adjust.
Common Module: A common module enables code reuse across the project, promoting consistency and reducing redundancy.
Dynamic Views: Generation of views (index.html) with nbconvert from a Jupyter Notebook that consolidates the generated files into a single dataset for exploration and analysis.
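To illustrate the Configuration Management and Parameter Validation highlights, here is a minimal sketch under stated assumptions. `ApiConfig`, `validate_params` and `with_retry` are hypothetical names, not the project's actual API, and the endpoint is a placeholder. The idea is that the endpoint, retry delay and number of attempts live in one settings object, and that requested currency pairs are filtered against the list of valid pairs reported by the data source.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ApiConfig:
    """Hypothetical settings object; field names and values are illustrative."""
    endpoint: str = "https://example.com/quotes/"  # placeholder, not the real endpoint
    max_attempts: int = 3
    retry_delay_seconds: float = 2.0


def validate_params(requested: Iterable[str], valid: Iterable[str]) -> list[str]:
    """Keep only the currency pairs that the data source itself reports as valid."""
    valid_set = set(valid)
    return [pair for pair in requested if pair in valid_set]


def with_retry(call: Callable[[], T], config: ApiConfig) -> T:
    """Call `call` up to config.max_attempts times, sleeping between failed attempts."""
    for attempt in range(1, config.max_attempts + 1):
        try:
            return call()
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            if attempt == config.max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(config.retry_delay_seconds)
    raise RuntimeError("max_attempts must be at least 1")
```

For example, `validate_params(["USD-BRL", "XYZ-ABC"], ["USD-BRL", "EUR-BRL"])` keeps only `"USD-BRL"`. A fixed delay keeps the sketch simple; exponential backoff is a common alternative.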

ETL Process:

Extraction: A single request is made to a specific endpoint to obtain quotes for multiple currencies.
Transformation: The response is split into one quote per currency, and each quote is stored in an individual Parquet file, which makes the data easy to organize and retrieve.
Load: The individual Parquet files are consolidated into a single dataset in a Jupyter Notebook, enabling a comprehensive analysis of the currency quotes.
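The Transformation step can be sketched as follows. This assumes, hypothetically, that the single response is a dictionary with one entry per currency pair and that each entry carries `code` and `codein` fields; the real response shape may differ. `split_by_currency` and `output_file_name` are illustrative names, and the file name follows the convention given under Repository structure (symbol, then the extraction Unix timestamp).

```python
from __future__ import annotations

import time
from typing import Any, Optional


def split_by_currency(response: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn one multi-currency response into one record per currency pair.

    Assumes (hypothetically) that each value has "code" and "codein" fields.
    """
    records = []
    for quote in response.values():
        record = dict(quote)  # copy so the original response is not mutated
        record["symbol"] = f"{quote['code']}-{quote['codein']}"
        records.append(record)
    return records


def output_file_name(symbol: str, extracted_at: Optional[int] = None) -> str:
    """Build '<symbol>-<unix timestamp>.parquet', e.g. 'ETH-EUR-1713658884.parquet'."""
    if extracted_at is None:
        extracted_at = int(time.time())  # default: current Unix timestamp
    return f"{symbol}-{extracted_at}.parquet"
```

Passing the timestamp explicitly keeps the naming deterministic in tests; omitting it uses the current time.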

In summary, this project offers a robust and efficient solution for collecting, processing and analyzing currency quote data, using advanced architecture and parallelism techniques to optimize each step of the ETL process.

Repository structure

data/: Stores raw data in Parquet format.
- ETH-EUR-1713658884.parquet: Example raw data file for ETH-EUR quotes. File name = symbol + "-" + Unix timestamp of the extraction.
notebooks/: Contains the data_explorer.ipynb notebook for data exploration.
etl/: Contains the project's source code.
- run.py: Entry point of the application.
- common/: Library for code reuse and standardization.
  - utils/
    - logs.py: Module for log management.
  - common.py: Module for common tasks such as retrieving the output directory or generating the default timestamp.
  - logs/: Stores debug logs.
- controller/
  - pipeline.py: Receives data extraction requests and orchestrates the ETL models.
- models/
  - extract/
    - api_data_extractor.py: Receives the parameters from the controller, sends the request and returns the response as JSON.
  - transform/
    - publisher.py: Receives the JSON from the extractor, splits the dictionary by currency and publishes each currency's data to a queue so it can be processed individually.
  - load/
    - parquet_loader.py: Runs in a separate thread, receives each dictionary the transformer publishes to the queue and writes it as a .parquet file in the default output directory.
- views/: Stores data analysis and visualization outputs.
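The hand-off between publisher.py and parquet_loader.py can be sketched with the standard library's `queue.Queue` and `threading.Thread`. This is a minimal illustration, not the project's code: `publish`, `start_loader` and the sentinel object are hypothetical, and `write` stands in for the Parquet writer so the sketch has no third-party dependencies. The publisher puts each record on the queue without waiting for it to be written (fire-and-forget), while the loader thread drains the queue and writes each record.

```python
from __future__ import annotations

import queue
import threading
from typing import Any, Callable

_SENTINEL = object()  # tells the loader thread that publishing is finished


def publish(records: list[dict[str, Any]], q: queue.Queue) -> None:
    """Transformer side: enqueue each record and move on (fire-and-forget)."""
    for record in records:
        q.put(record)
    q.put(_SENTINEL)


def start_loader(q: queue.Queue, write: Callable[[dict[str, Any]], None]) -> threading.Thread:
    """Loader side: consume records in a separate thread and write each one out.

    `write` is a stand-in for the Parquet writer (for example, a function that
    builds a pandas DataFrame and calls DataFrame.to_parquet).
    """
    def worker() -> None:
        while True:
            item = q.get()  # blocks until the publisher puts something
            try:
                if item is _SENTINEL:
                    return  # no more records
                write(item)
            finally:
                q.task_done()  # lets q.join() callers know this item was handled

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Calling `start_loader(q, written.append)`, then `publish(records, q)`, and finally joining the returned thread leaves every record in `written`, in publish order. With a single loader thread, writes happen in FIFO order; running several workers (one sentinel per worker) would parallelize the writes.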

How to run the application locally

Step by Step

Make sure Python 3.10 or higher and Poetry are installed on your machine.

Clone the repository:

$ git clone https://github.com/ivdatahub/data-consumer-api.git

Go to the project directory:

$ cd data-consumer-api

Install the dependencies and run the project:

$ poetry install && poetry run python etl/run.py

Learn more about Poetry

ETL and Data Analysis Results:

The complete data analysis is available online: the Jupyter Notebook is deployed to GitHub Pages.

