This set of benchmarks is meant to test USearch capabilities for Billion-scale vector search.
It provides an alternative to the ann-benchmarks
and the big-ann-benchmarks
which generally operate on much smaller collections.
The main objective is to understand the scaling laws of the USearch compared to FAISS.
Supplementary adapters for other popular systems is also available under index/
directory:
- Alternative HNSW implementations, like HNSWlib,
- Alternative CPU-based libraries, like SCANN,
- Vector Databases, like Qdrant, and Wevaite.
The primary dataset used for benchmarks is the Deep1B dataset of 1 Billion 96-dimensional vectors, totalling at 384 GB. Ground-truth nearest neighbors are provided to calculate the recall metrics.
First of all, we recommend creating a conda
environment to isolate the dependencies:
conda create -n usearch-benchmarks python=3.10
conda activate usearch-benchmarks
Then install dependencies, getting an MKL-accelerated version of FAISS library.
pip install usearch hnswlib scann lancedb qdrant-client weaviate-client psutil plotly kaleido
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
To benchmark Qdrant, you need to run their Docker container:
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
Finally, download the Deep1B dataset:
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For smaller subset
To run the ANN benchmarks pass a configuration file:
python run.py configs/usearch_1B.json 1B # Outputs stats/*.npz file
python utils/draw_plots.py # Exports tp plots/*.png