👷 This project is a work in progress and a personal playground project.
The vision: a service to search the content of your documents (EPUB, PDF, or any text file).
There are two main business logic flows:
- extracting the content from the user's documents and saving it in searchable forms
- letting the user quickly search their documents for a given query (using full-text search and/or semantic search, for example)
That said, the project is not usable yet.
Current infra:
- RabbitMQ for the message queue
- PostgreSQL for the relational database
- MinIO for the S3-compatible object storage
- Meilisearch for the full-text search
- Qdrant for the vector database
What has been done:
- rest_gateway: REST gateway service to handle requests from the users
- content_ingestion_worker (name needs to change): services to extract content
- fulltext_search_service: service to handle full-text search
- embedding_worker (name needs to change): service to handle semantic search
- communication between services using a message broker (RabbitMQ), with messages representing either queued jobs or RPC requests
- authentication based on JWT tokens
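The two kinds of broker messages mentioned above (queued jobs and RPC requests) can be told apart at the type level. The following is only a sketch: `BrokerMessage`, `Outcome`, `handle`, and every field name are hypothetical and not taken from the project. The `reply_to` and `correlation_id` fields mirror the message properties conventionally used for RPC over RabbitMQ.

```rust
// Hypothetical message types: none of these names come from the project.
#[derive(Debug, PartialEq)]
enum BrokerMessage {
    /// Queued job: the publisher does not wait for a result.
    Job { document_id: u64 },
    /// RPC request: the caller waits for a reply on the `reply_to` queue
    /// and matches it to its request through `correlation_id`.
    Rpc {
        correlation_id: String,
        reply_to: String,
        query: String,
    },
}

/// What a consumer produces after handling a message.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Job processed, nothing is sent back.
    Done,
    /// Reply to publish on the caller's reply queue.
    Reply {
        to: String,
        correlation_id: String,
        body: String,
    },
}

fn handle(message: BrokerMessage) -> Outcome {
    match message {
        BrokerMessage::Job { document_id } => {
            // A real worker would extract and index the document here.
            let _ = document_id;
            Outcome::Done
        }
        BrokerMessage::Rpc {
            correlation_id,
            reply_to,
            query,
        } => Outcome::Reply {
            to: reply_to,
            // Echo the correlation id so the caller can pair the reply
            // with its pending request.
            correlation_id,
            body: format!("results for '{query}'"),
        },
    }
}
```

Keeping both kinds in one enum forces every consumer to decide explicitly whether a reply is expected, which is the main difference between the two flows.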
The current work:
- Replace RabbitMQ with Kafka (for the job queue) and gRPC (for the RPC requests)
- Implement a more Hexagonal/Clean architecture in Rust
- Add a diagram explaining the new backend architecture
- Rework the semantic search service
- Improve content extraction: handle text encodings better and support reading PDFs with OCR (not just the text encoded in the PDF)
There are several environments, depending on where and how you want to deploy the services and workers:
- develop: not containerized, locally on your machine
- local: containerized, locally on your machine
- production: containerized, in production
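The three environments above (develop, local, production) could be modelled as a small enum parsed from a string such as an environment variable. This is a sketch under assumptions: the `Environment` type and its methods are hypothetical, and how the project actually selects its environment is not described here.

```rust
use std::str::FromStr;

// Hypothetical type: the project may represent environments differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Environment {
    Develop,
    Local,
    Production,
}

impl Environment {
    fn as_str(self) -> &'static str {
        match self {
            Environment::Develop => "develop",
            Environment::Local => "local",
            Environment::Production => "production",
        }
    }

    /// Only `develop` runs directly on the machine, without containers.
    fn is_containerized(self) -> bool {
        !matches!(self, Environment::Develop)
    }
}

impl FromStr for Environment {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_lowercase().as_str() {
            "develop" => Ok(Environment::Develop),
            "local" => Ok(Environment::Local),
            "production" => Ok(Environment::Production),
            other => Err(format!(
                "unknown environment '{other}', expected develop, local or production"
            )),
        }
    }
}
```

With this in place, `"local".parse::<Environment>()` yields `Ok(Environment::Local)`, and an unknown name is rejected at startup rather than silently falling back to a default.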
To run tests with different log levels (sqlx logs are a bit spammy, so they are limited to errors to reduce noise):
RUST_LOG="sqlx=error,info" TEST_LOG=enabled cargo test <a_test> | bunyan
For each test, a new database is created (to enforce isolation).
The name of each database will be: test_<%Y-%m-%d_%H-%M-%S>_<randomly generated UUID>
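The naming scheme above can be sketched as a small formatting function. This is an illustration only: `Timestamp` and `test_database_name` are hypothetical names, and the timestamp and UUID are passed in as plain values, whereas the project presumably obtains them from a clock and a UUID generator.

```rust
/// Date and time fields; hypothetical stand-in for a real clock value.
struct Timestamp {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
}

/// Builds `test_<%Y-%m-%d_%H-%M-%S>_<uuid>` with zero-padded fields.
fn test_database_name(ts: &Timestamp, uuid: &str) -> String {
    format!(
        "test_{:04}-{:02}-{:02}_{:02}-{:02}-{:02}_{}",
        ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, uuid
    )
}
```

Because the name contains hyphens, it has to be written as a double-quoted identifier in PostgreSQL statements such as `CREATE DATABASE`.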
- I have learnt a lot about REST backend systems in Rust thanks to Luca Palmieri's book, Zero To Production In Rust.