
Learned Metric Index (LMI) is a machine learning based data structure for fast look-up of approximate nearest neighbors in complex data.







Learned Metric Index (LMI) is an index for approximate nearest neighbor search on complex data using machine learning and probability-based navigation.
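The idea of a learned index with probability-based navigation can be sketched in a few lines of NumPy. This is a minimal illustration and not the LMI implementation: it assumes the data is partitioned into buckets with plain k-means, and it swaps in a linear softmax classifier for the neural network. All function and parameter names (`kmeans`, `train_softmax`, `search`, `n_visit`) are hypothetical.

```python
import numpy as np


def kmeans(data, n_buckets, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns one bucket label per object."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_buckets, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for b in range(n_buckets):
            members = data[labels == b]
            if len(members):
                centroids[b] = members.mean(axis=0)
    return labels


def predict_proba(x, weights, bias):
    """Softmax over buckets: the probability that each bucket holds the answer."""
    logits = x @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def train_softmax(data, labels, n_buckets, epochs=200, lr=0.5):
    """Fit a linear softmax classifier (assumed stand-in for the ML model)."""
    n, dim = data.shape
    weights = np.zeros((dim, n_buckets))
    bias = np.zeros(n_buckets)
    one_hot = np.eye(n_buckets)[labels]
    for _ in range(epochs):
        grad = (predict_proba(data, weights, bias) - one_hot) / n
        weights -= lr * (data.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def search(query, data, labels, weights, bias, k=10, n_visit=2):
    """Visit the n_visit most probable buckets and rank their objects exactly."""
    probs = predict_proba(query[None, :], weights, bias)[0]
    visited = np.argsort(-probs)[:n_visit]          # stop condition: bucket budget
    candidates = np.flatnonzero(np.isin(labels, visited))
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]


# Toy usage on random vectors (hypothetical data, not a real dataset).
rng = np.random.default_rng(42)
vectors = rng.normal(size=(1000, 16))
bucket_of = kmeans(vectors, n_buckets=8)
w, b = train_softmax(vectors, bucket_of, n_buckets=8)
neighbors = search(rng.normal(size=16), vectors, bucket_of, w, b, k=10, n_visit=2)
```

Raising `n_visit` trades search time for recall; visiting every bucket degenerates into an exact sequential scan.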

Getting started

See the 01_Introduction.ipynb notebook for examples of how to index and search a dataset.


Using virtualenv

# 1) Clone the repo with submodules 
git clone --recursive
# 2) Create and activate a new virtual environment
python -m venv lmi-env
source lmi-env/bin/activate
# 3) Install the dependencies
pip install -r requirements-cpu.txt # alternatively requirements-gpu.txt
pip install --editable .

Using docker


Requirements:

  • Docker
  • At least 1.5 GB of disk space for the CPU version and up to 5.5 GB for the GPU version

# 1) Clone the repo with submodules 
git clone --recursive
# 2) Build the docker image (CPU version)
docker build -t lmi -f Dockerfile --build-arg version=cpu .
# alternatively: docker build -t lmi -f Dockerfile --build-arg version=gpu .
# 3) Run the docker image
docker run -p 8888:8888 -it lmi bash


# Run JupyterLab, copy the printed URL into the browser and open 01_Introduction.ipynb
jupyter-lab --ip --no-browser

# Run the search on 100k data subset, evaluate the results and plot them.
# Expected time to run = ~5-10 mins
python3 search/ && python eval/ && python eval/ res.csv


LMI consisting of 1 ML model

  • Recall: 91.421%
  • Search runtime (for 10k queries): ~220s
  • Build time: 20828s
  • Dataset: LAION1B, 10M subset
  • Hardware used:
    • CPU Intel Xeon Gold 6130
    • 42 GB RAM
    • 1 CPU core
  • Hyperparameters:
    • 120 leaf nodes
    • 200 epochs
    • 1 hidden layer with 512 neurons
    • 0.01 learning rate
    • 4 leaf nodes stop condition
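The figures above can be put into perspective with a little arithmetic. The sketch below assumes that recall is the usual fraction of true nearest neighbors present in the approximate answer, averaged over all queries; the source does not define the metric or the value of k. The helper names `recall_at_k` and `mean_recall` are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors found by the approximate search."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


def mean_recall(approx_results, exact_results):
    """Average recall over a batch of queries."""
    scores = [recall_at_k(a, e) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Figures quoted in the list above.
search_runtime_s = 220      # for 10k queries
n_queries = 10_000
build_time_s = 20828

ms_per_query = search_runtime_s * 1000 / n_queries  # average latency per query in ms
build_time_h = build_time_s / 3600                  # build time in hours

print(f"{ms_per_query:.1f} ms per query, build time {build_time_h:.2f} h")
```

On the reported numbers this works out to roughly 22 ms per query on a single CPU core and a build time just under 6 hours.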

Hardware requirements


  • 42 GB RAM
  • 1 CPU core
  • ~6h of runtime (varies depending on the hardware)

LMI in action


"LMI Proposition" (2021):

M. Antol, J. Ol'ha, T. Slanináková, V. Dohnal: Learned Metric Index—Proposition of learned indexing for unstructured data. Information Systems, Elsevier (2021)

"Data-driven LMI" (2021):

T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal: Data-driven Learned Metric Index: an Unsupervised Approach. SISAP 2021 - Similarity Search and Applications, pp 81-94 (2021)

"LMI in Proteins" (2022):

J. Ol'ha, T. Slanináková, M. Gendiar, M. Antol, V. Dohnal: Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques; and Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques. SISAP 2022 - Similarity Search and Applications, pp 274-282 (2022)

"Reproducible LMI" (2023):

T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal, S. Ladra, M. A. Martinez-Prieto: Reproducible experiments with Learned Metric Index Framework. Information Systems, Volume 118, September 2023, 102255 (2023)

"LMI in a large (214M) protein database" (2024):

D. Procházka, T. Slanináková, J. Oľha, A. Rošinec, K. Grešová, M. Jánošová, J. Čillík, J. Porubská, R. Svobodová, V. Dohnal, M. Antol: AlphaFind: discover structure similarity across the proteome in AlphaFold DB. Nucleic Acids Research (2024)


🔎Complex data analysis research group