- The user sends a request to the application hosted on AWS Amplify.
- Amplify integrates with the backend API Gateway.
- Users can upload files to the application, which are stored in an S3 bucket using a pre-signed upload URL.
- Adding a new file to the S3 bucket triggers the data ingestion workflow, with a lambda function extracting the metadata (the number of pages in the document and its size).
- The lambda function updates the Documents table in DynamoDB with the document's data and pushes a queue in Amazon SQS.
- Another lambda function picks the message from the queue and processes the document, splitting it into multiple documents
- The embeddings lambda generates the vector embeddings using Amazon Bedrock embedding models
- The lambda function stores the vectors in the PostgreSQL database.
- After the document is successfully processed, users can start chatting with it by sending an API request that invokes the lambda function to generate a response.
- This lambda function uses RAG architecture to retrieve the response from LLMs hosted on Amazon Bedrock augmented with the documents' information stored in the vector database.
- The application also supports reading the list of documents and their metadata and deleting them.
Column Name | Description |
---|---|
uuid | The uuid of the collection |
name | The name of the collection |
cmetadata | The metadata of the collection |
Column Name | Description |
---|---|
id | The ID of the embeddings |
collection_id | The uuid of the collection |
embedding | The vector embeddings of the document |
cmetadata | The metadata of the collection |
document | The content of the document |
Column Name | Description |
---|---|
userid | The ID of the user uploading the document |
documentid | The id of the document |
filename | The name of the document |
pages | Number of pages in the document |
filesize | The size of the document |
document_split_ids | The embedding IDs of the document |
conversations | The list of conversations with the document |
docstatus | The processing status of the document |
created | The time the document was processed |
Column Name | Description |
---|---|
session_id | The uuid of the chat session |
history | The list of the messages in the chat session |
.
├── user1/
│ └── document1.pdf
└── user2/
├── document1.pdf
└── document2.pdf