Data Producer API: FastAPI application for sending data to Pub/Sub, used for load testing and triggering pipelines.
- Cloud Build: Continuous Integration and Continuous Deployment (CI/CD) service provided by GCP, integrated with GitHub Actions.
- Artifact Registry: Private Docker container registry provided by GCP for storing the FastAPI image.
- Cloud Run: Fully managed serverless container runtime provided by GCP for running the FastAPI application.
- Pub/Sub: Messaging service provided by GCP for sending and receiving messages between the FastAPI application and the Dataflow pipeline.
See the following docs:
- Hexagonal Architecture: Decouples the core logic from external dependencies, so that any current data source can be replaced if it becomes unavailable. Adapters act as intermediaries between the core application and the external services.
- Comprehensive Testing: Tests that verify the quality and robustness of the code at each stage of the ETL process.
- Fire-and-Forget Messaging: Use of messaging (Cloud Pub/Sub) in the fire-and-forget model to manage files generated between the transformation and loading stages, keeping data flowing without waiting for acknowledgement from the consumer.
- Configuration Management: Use of a configuration module to manage project_id and other environment variables, so that settings can be adjusted without code changes.
- Continuous Integration and Continuous Deployment: Use of CI/CD pipelines to automate the build, test, and deployment processes, so that the deployed application stays up to date.
- Code Quality: Use of code quality tools such as linters and formatters to keep the codebase clean, consistent, and easy to read.
- Documentation: Detailed documentation to support understanding and use of the application, including installation instructions, usage examples, and troubleshooting guides.
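
To make the hexagonal architecture, fire-and-forget messaging, and configuration points above more concrete, here is a minimal sketch in Python. It is not taken from this repository: the names `Settings`, `MessagePublisher`, `InMemoryPublisher`, `PubSubPublisher`, and `send_record`, as well as the environment variable names `PROJECT_ID` and `TOPIC_ID`, are hypothetical and chosen only for illustration. The Pub/Sub adapter assumes the `google-cloud-pubsub` package is installed and imports it lazily, so the rest of the sketch runs without it.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Settings:
    """Configuration module: values come from environment variables.

    The variable names and defaults are assumptions for this sketch.
    """
    project_id: str
    topic_id: str

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            project_id=os.environ.get("PROJECT_ID", "local-project"),
            topic_id=os.environ.get("TOPIC_ID", "local-topic"),
        )


class MessagePublisher(Protocol):
    """Port: the only interface the core logic depends on."""

    def publish(self, payload: bytes) -> None: ...


class InMemoryPublisher:
    """Adapter for tests: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def publish(self, payload: bytes) -> None:
        self.sent.append(payload)


class PubSubPublisher:
    """Adapter for Cloud Pub/Sub (needs the google-cloud-pubsub package)."""

    def __init__(self, settings: Settings) -> None:
        # Imported here so the module loads even without the dependency.
        from google.cloud import pubsub_v1

        self._client = pubsub_v1.PublisherClient()
        self._topic = self._client.topic_path(
            settings.project_id, settings.topic_id
        )

    def publish(self, payload: bytes) -> None:
        # Fire-and-forget: publish() returns a future, which is
        # deliberately not awaited here.
        self._client.publish(self._topic, payload)


def send_record(publisher: MessagePublisher, record: dict) -> None:
    """Core logic: serialize a record and pass it to the injected adapter."""
    publisher.publish(json.dumps(record, sort_keys=True).encode("utf-8"))
```

In a test, `send_record(InMemoryPublisher(), {"id": 1})` exercises the core logic without touching GCP, while a deployed service could pass `PubSubPublisher(Settings.from_env())` instead. Swapping the adapter is the only change needed, which is the property the hexagonal design aims for.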