- This application lets you collect information about, and download images from, a provided list of websites. The project uses asynchronous functions, libraries, and tools (`aiohttp`, `aiokafka`, `asyncio`, `aiobotocore`, `asyncpg`) wherever possible to speed up parsing.
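To illustrate why the asynchronous stack speeds up parsing many sites at once, here is a minimal, self-contained sketch of concurrent fetching with `aiohttp` and `asyncio.gather`; the URLs and the length-only "parsing" are placeholders rather than the project's actual parser code.

```python
import asyncio
import aiohttp

async def fetch_html_length(session: aiohttp.ClientSession, url: str) -> int:
    # Fetch one page and return the length of its HTML body.
    async with session.get(url) as response:
        html = await response.text()
        return len(html)

async def main() -> None:
    urls = ["https://example.com", "https://example.org"]  # placeholder URLs
    async with aiohttp.ClientSession() as session:
        # All requests run concurrently instead of one after another.
        lengths = await asyncio.gather(*(fetch_html_length(session, u) for u in urls))
        print(dict(zip(urls, lengths)))

if __name__ == "__main__":
    asyncio.run(main())
```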
- There are two main microservices: `web`, which interacts with the client, and `parser`, which parses data for webpages (as of now, the HTML length) and uploads webpage images to MinIO storage. The microservices communicate with each other via Kafka topics (each microservice has both a producer and a consumer). You can monitor the Kafka cluster using the `provectuslabs/kafka-ui` dashboard.
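Communication over a Kafka topic with `aiokafka` generally follows the pattern sketched below. The topic name, broker address, and message payload here are hypothetical and only illustrate the producer/consumer pair each microservice runs; they are not the project's actual configuration.

```python
import asyncio
import json
from aiokafka import AIOKafkaProducer, AIOKafkaConsumer

TOPIC = "parse_tasks"          # hypothetical topic name
BOOTSTRAP = "localhost:9092"   # hypothetical broker address

async def produce(website_id: int, url: str) -> None:
    # `web` side: publish a parse task for the parser service.
    producer = AIOKafkaProducer(bootstrap_servers=BOOTSTRAP)
    await producer.start()
    try:
        payload = json.dumps({"website_id": website_id, "url": url}).encode()
        await producer.send_and_wait(TOPIC, payload)
    finally:
        await producer.stop()

async def consume() -> None:
    # `parser` side: read parse tasks from the same topic.
    consumer = AIOKafkaConsumer(TOPIC, bootstrap_servers=BOOTSTRAP)
    await consumer.start()
    try:
        async for message in consumer:
            task = json.loads(message.value)
            print("received task:", task)
    finally:
        await consumer.stop()
```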
- For each website a row is created in the PostgreSQL database using the `asyncpg` driver. You will get the website data as well as the parsing status, which is updated through `pending` -> `in_progress` -> `finished`/`failed` while parsing progresses.
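As a rough illustration of the row-per-website pattern, here is a minimal `asyncpg` sketch that inserts a row with a `pending` status and later advances it; the DSN, table, and column names are made up for illustration and may not match the project's real schema.

```python
import asyncpg

async def create_and_update_website(url: str) -> None:
    # Hypothetical DSN and schema, used only to illustrate the asyncpg calls.
    conn = await asyncpg.connect("postgresql://user:password@localhost:5432/app")
    try:
        website_id = await conn.fetchval(
            "INSERT INTO websites (url, status) VALUES ($1, $2) RETURNING id",
            url,
            "pending",
        )
        # Later, as parsing progresses, the status column is advanced.
        await conn.execute(
            "UPDATE websites SET status = $1 WHERE id = $2",
            "in_progress",
            website_id,
        )
    finally:
        await conn.close()
```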
- All database connections are made by the `web` microservice only, while both `web` and `parser` interact with MinIO. You will be able to `POST` a website entity (which also triggers data parsing), `GET` info for each website (with up-to-date data), and `DELETE` website entities (all MinIO data for the website is deleted as well).
- After images are uploaded to MinIO, you can request S3 presigned URLs to download them.
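Generating such links with `aiobotocore` can look roughly like the sketch below, assuming its async `generate_presigned_url` call; the MinIO endpoint, credentials, bucket, and key are placeholders, and the 300-second expiry mirrors the 5-minute lifetime mentioned in the usage section further down.

```python
from aiobotocore.session import get_session

async def presign_image(bucket: str, key: str) -> str:
    # Placeholder MinIO endpoint and credentials, for illustration only.
    session = get_session()
    async with session.create_client(
        "s3",
        endpoint_url="http://localhost:9000",
        aws_access_key_id="minio_user",
        aws_secret_access_key="minio_password",
    ) as client:
        # Returns a temporary download link for the stored image.
        return await client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=300,  # 5 minutes
        )
```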
- Make sure that you have the latest versions of `python` and `pip` installed on your computer. You also have to install Docker and Docker Compose. Note: each microservice - `parser` and `web` - has its own `Dockerfile` and `.dockerignore` with the appropriate build stages.
- By default this project uses poetry for dependency and virtual environment management, so make sure to install it too. Note: each microservice - `parser` and `web` - has its own poetry files with its dependencies specified.
- Make sure to provide all required environment variables (via a `.env` file, the `export` command, secrets, etc.) before running the application. Note: each microservice - `parser` and `web` - should have its own `.env`/secrets variables specified.
- For managing pre-commit hooks this project uses pre-commit.
- For import sorting this project uses isort.
- For code format checking this project uses black.
- For type checking this project uses mypy.
- For creating commits and linting commit messages this project uses commitizen. Run `make commit` to use commitizen during commits.
- There is a special `build_dev` stage in the Dockerfiles to build the dev version of the application image.
- Because there are two separate microservices, all `pre-commit` and Docker `test` build stage checks run for both the `parser` and `web` microservices from the repo root.
- The new application version should be specified in the `web/version.txt` file to update the `web` microservice OpenAPI documentation.
- This project uses GitHub Actions to run all checks and unit tests on `push` to the remote repository.
- Two jobs run in one workflow - one for the `parser` and one for the `web` microservice (built from each directory separately via `strategy.matrix`).
There are lots of useful commands in the Makefile included in this project's repo. Use the `make <some_command>` syntax to run each of them. If your system doesn't support make commands, you may copy the commands from the Makefile directly into the terminal.
Note: many commands perform actions for both the `parser` and `web` microservices. Even so, all Makefile commands should be run from the repo root directory only.
- To install all the required dependencies and set up a virtual environment, run `poetry install` in the cloned repository directory. You can also install the project dependencies using `pip install -r requirements.txt` from the repo root directory. Note: this installs ALL dependencies for the project - for both the `parser` and `web` microservices. Separate dependencies are installed automatically during the Docker image build (or GitHub Actions run).
- To configure pre-commit hooks for code linting, code format checking, and commit message linting, run `poetry run pre-commit install` in the cloned directory.
- Build app images (for `parser` and `web`) using `make build`. To build a reloadable application locally, use `make build_dev` to build the images for a development environment.
- Run all necessary Docker containers together using `make up`. Containers will start depending on each other and taking health checks into account. Note: this will also create and attach a persistent named volume `logs` for the Docker containers. Containers use this volume to store the application `app.log` file.
- Stop and remove the Docker containers using `make down`. If you also want to remove the log volume, use `make down_volume`.
- For managing migrations this project uses alembic.
- The Dockerfile for the `web` microservice already includes the `alembic upgrade head` command to run all revision migrations required by the current version of the application.
- Run `make upgrade` to manually upgrade the database tables' state. You can also manually upgrade to a specific revision's `.py` script (from `web/alembic/versions/`) by running `alembic upgrade <revision id number>`.
- You can also downgrade one revision down with the `make downgrade` command, downgrade to a specific revision by running `alembic downgrade <revision id number>`, or fully downgrade to the initial database state with `make downgrade_full`.
- By default, the web application will be accessible at http://localhost:8080, the MinIO storage console at http://localhost:9001, the database at http://localhost:5432, and the Kafka cluster UI at http://localhost:9093. You can try all endpoints with the Swagger documentation at http://localhost:8080/docs. Note: the `parser` microservice will run at http://localhost:8081, but users don't need to interact with it directly.
- Make sure to create the MinIO bucket (specified in your `.env`/secrets) before interacting with the web application resources.
- Use the `/websites` resource with the `POST` method to create a database entity for each URL and start parsing. The created entities with their ids are returned in the response body.
- Use the `/websites/{website_id}` resource with the `GET` or `DELETE` method to get or delete the row in the database, respectively. Use `GET` to monitor the parsing status and data updates. `DELETE` also clears the MinIO storage objects associated with the URL (the website URL is used as the prefix for this webpage's picture keys).
- Use `/websites/{website_id}/picture_links` to get the website database entity with an array of generated S3 presigned URLs in the response body. You can use some tool (e.g. Postman, wget, curl, etc.) to download these images via the generated URLs. The URLs expire after 5 minutes. A sketch of the full request flow is shown below.
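As an illustration of the endpoint flow described above, here is a minimal `aiohttp` client sketch: create a website entity, poll its status, request the presigned links, and download an image. The request body shape and the response field names (`id`, `status`, `picture_links`) are assumptions; check `documentation/openapi.yaml` for the actual schema.

```python
import asyncio
import aiohttp

BASE_URL = "http://localhost:8080"

async def parse_and_download(url: str) -> None:
    async with aiohttp.ClientSession() as session:
        # 1. Create the website entity; this also triggers parsing.
        async with session.post(f"{BASE_URL}/websites", json=[url]) as resp:
            website = (await resp.json())[0]
        website_id = website["id"]  # assumed field name

        # 2. Poll until parsing is finished (or failed).
        while True:
            async with session.get(f"{BASE_URL}/websites/{website_id}") as resp:
                data = await resp.json()
            if data.get("status") in ("finished", "failed"):  # assumed field name
                break
            await asyncio.sleep(2)

        # 3. Request presigned URLs and download the first image.
        async with session.get(f"{BASE_URL}/websites/{website_id}/picture_links") as resp:
            links = (await resp.json()).get("picture_links", [])  # assumed field name
        if links:
            async with session.get(links[0]) as resp:
                with open("picture_0", "wb") as f:
                    f.write(await resp.read())

if __name__ == "__main__":
    asyncio.run(parse_and_download("https://example.com"))
```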
- A description of all the project's endpoints and API may be viewed, without running any services, in the `documentation/openapi.yaml` file.
- You can update the `web/documentation/openapi.yaml` API documentation at any time by using the `make openapi` command.
- All warnings and info messages will be shown in the containers' stdout and saved in the `web.log` and `parser.log` files.
- To minimize the chance of being blocked, `parser` uses custom user agents for request headers, taken from the `parser/documentation/user_agents.txt` file.
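Such a rotation can look roughly like the sketch below, which picks a random user agent from the file and sends it in the request headers with `aiohttp`; this illustrates the idea rather than the project's actual parser code.

```python
import random
from pathlib import Path
import aiohttp

# One user-agent string per line, as in parser/documentation/user_agents.txt.
USER_AGENTS = Path("parser/documentation/user_agents.txt").read_text().splitlines()

async def fetch_with_random_agent(url: str) -> str:
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return await response.text()
```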
- Use `make test` to build test images for the `parser` and `web` microservices and run all linter checks and unit tests for each of them.
- After all tests, a coverage report is also shown.
- Staged changes will be checked during commits via the pre-commit hook.
- All checks and tests (for both the `parser` and `web` microservices) will run on code push to the remote repository as part of GitHub Actions.