Clone Vector Search

Overview

This service provides endpoints that process and vectorize S3 objects, and that read from and populate an OpenSearch index. Its design is inspired by hexagonal architecture principles.

Project structure

service: Contains the logic for accessing third-party services.
usecase: Contains the business logic layer.
controller: Contains the Flask API endpoint handlers.
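
The layering described above can be sketched as follows. This is a minimal illustration only, not the project's actual code: the names ObjectStore, InMemoryObjectStore, VectorizeUseCase and vectorize_controller are hypothetical, the embedding step is replaced by a placeholder, and the controller is written as a plain function so the sketch runs without Flask.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ObjectStore(Protocol):
    """Port the use case depends on; a service-layer adapter implements it."""

    def read_text(self, bucket: str, key: str) -> str: ...


class InMemoryObjectStore:
    """Stand-in adapter for illustration; a real one would wrap an S3 client."""

    def __init__(self, objects: dict[tuple[str, str], str]) -> None:
        self._objects = objects

    def read_text(self, bucket: str, key: str) -> str:
        return self._objects[(bucket, key)]


@dataclass
class VectorizeUseCase:
    """Business logic layer: knows nothing about HTTP or AWS."""

    store: ObjectStore

    def execute(self, bucket: str, key: str) -> dict:
        text = self.store.read_text(bucket, key)
        # Placeholder for the real vectorization step (hypothetical).
        return {"key": key, "characters": len(text)}


def vectorize_controller(usecase: VectorizeUseCase, payload: dict) -> tuple[dict, int]:
    """Controller layer: validates the request body, then delegates."""
    if "bucket" not in payload or "key" not in payload:
        return {"error": "bucket and key are required"}, 400
    return usecase.execute(payload["bucket"], payload["key"]), 200
```

Because the use case depends only on the ObjectStore port, the third-party adapter can be swapped (for example, for a test double) without touching the business logic or the endpoint handlers.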

Tech Stack

Python
Flask
boto3
Llama-Index

Installation

Clone the repository

git clone git@github.com:wizeline/clone-vector-search.git

Create a Python virtual environment (recommended):

python3 -m venv env 
source env/bin/activate

Install Dependencies:

pip install -r requirements.txt

Running the Service

Set environment variables (if applicable) in the .env and .flaskenv files.
Create the OpenSearch index. The application will create the needed mapping.
To run this service locally, you need LocalStack to mock some AWS services.
- Once LocalStack is installed and running, create a clone-ingestion-messages bucket: aws --endpoint-url=http://localhost:4566 s3 mb s3://clone-ingestion-messages
- Add the required test files by running: aws --endpoint-url=http://localhost:4566 s3 cp /path/to/your/file/filename.json s3://clone-ingestion-messages/key/to/file.json
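
The two AWS CLI steps above can also be scripted with boto3. The following is a sketch under stated assumptions: the helper names make_localstack_s3_client and seed_bucket are hypothetical, and the region and dummy credentials are assumptions about a default LocalStack setup, not values taken from this repository.

```python
def make_localstack_s3_client(endpoint_url="http://localhost:4566"):
    """Build an S3 client that talks to LocalStack instead of AWS."""
    # Imported lazily so seed_bucket can be used with any S3-like client.
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        region_name="us-east-1",          # assumption: default region
        aws_access_key_id="test",         # assumption: dummy credentials
        aws_secret_access_key="test",
    )


def seed_bucket(s3, bucket, local_path, key):
    """Create the bucket, then upload one local file under the given key."""
    s3.create_bucket(Bucket=bucket)
    s3.upload_file(local_path, bucket, key)


# Example usage (requires a running LocalStack instance):
#   s3 = make_localstack_s3_client()
#   seed_bucket(
#       s3,
#       "clone-ingestion-messages",
#       "/path/to/your/file/filename.json",
#       "key/to/file.json",
#   )
```

Passing the client as an argument keeps the helper easy to exercise with a stub in unit tests.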
Start the Flask Server:

flask run

OpenSearch index

An OpenSearch index is required to run this service. You can create the index with the following mapping:

// PUT /clone-vector-index 
{
    "aliases": {},
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "embedding": {
                "type": "knn_vector",
                "dimension": 384
            },
            "metadata": {
                "properties": {
                    "_node_content": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "_node_type": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "doc_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "document_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "file_uuid": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "processed_user": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "raw_text": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "ref_doc_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "source_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "twin_id": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "user_name": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "index": {
            "replication": {
                "type": "DOCUMENT"
            },
            "number_of_shards": "1",
            "number_of_replicas": "1"
        }
    }
}
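
Since every metadata field in the mapping above uses the same text-plus-keyword definition, the request body can also be generated rather than written by hand. The sketch below is one way to do that; the helper names text_with_keyword and build_index_body are hypothetical, and the field names and settings are copied from the mapping above.

```python
import json

# Metadata field names, copied from the mapping above.
METADATA_FIELDS = [
    "_node_content", "_node_type", "doc_id", "document_id", "file_uuid",
    "processed_user", "raw_text", "ref_doc_id", "source_name", "twin_id",
    "user_name",
]


def text_with_keyword():
    """Text field with a keyword sub-field, as used throughout the mapping."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def build_index_body(dimension=384):
    """Return the index creation body as a plain dict."""
    return {
        "aliases": {},
        "mappings": {
            "properties": {
                "content": text_with_keyword(),
                "embedding": {"type": "knn_vector", "dimension": dimension},
                "metadata": {
                    "properties": {
                        name: text_with_keyword() for name in METADATA_FIELDS
                    }
                },
            }
        },
        "settings": {
            "index": {
                "replication": {"type": "DOCUMENT"},
                "number_of_shards": "1",
                "number_of_replicas": "1",
            }
        },
    }


# Serialize the body, e.g. to send it with PUT /clone-vector-index.
index_body_json = json.dumps(build_index_body(), indent=4)
```

The resulting dict could be passed to a client call such as opensearch-py's indices.create(index="clone-vector-index", body=...), assuming that library is used; whether this project uses it is not stated here.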

Building the Docker Image

docker compose up --build

Code Contribution

Ensure you adhere to the following conventions when working with code in the Clone Vector Search project:

Relate every commit to a ticket: If a commit does not reference a ticket itself, the branch name must contain the related ticket.
Work on one feature for each PR: Do not crowd unrelated features in one PR.
Every line of code in your commits must be production-ready: Do not create incomplete, work-in-progress commits.
Keep the branching strategy simple:
- Create a feature branch and then merge it into the main branch.
- Do not create extra branches besides the feature or fix branches that merge into main.
- Remove any feature or fix branches after you merge the changes.
