
Evaluate your multimodal retrieval system in 3 lines of code.

🌟 Key Features

  • ✅ Load datasets and models with one line of code.
  • ✅ Built-in support for Sentence Transformers, TIMM, BM25, and Transformers models.
  • ✅ Run benchmarks and get retrieval metrics like MRR, NormalizedDCG, Precision, Recall, HitRate, and MAP.
  • ✅ Visualize retrieval results to understand how your model is performing.
  • ✅ Combine retrieval results from multiple models using Reciprocal Rank Fusion (RRF).

🚀 Quickstart

import xretrieval

metrics, results_df = xretrieval.run_benchmark(
    dataset="coco-val-2017",
    model_id="transformers/Salesforce/blip2-itm-vit-g",
    mode="text-to-text",
)
 Retrieval Metrics @ k=10 
┏━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Metric        ┃ Score  ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ MRR           │ 0.2358 │
│ NormalizedDCG │ 0.2854 │
│ Precision     │ 0.1660 │
│ Recall        │ 0.4248 │
│ HitRate       │ 0.4248 │
│ MAP           │ 0.2095 │
└───────────────┴────────┘
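
To help read the table above, here is a minimal sketch of how four of these metrics are commonly defined for a single query at cutoff k. These are the usual textbook definitions, not code taken from xretrieval, so the library's own implementation may differ in details. The function names and the toy data are hypothetical.

```python
def reciprocal_rank(retrieved, relevant, k=10):
    """1 / rank of the first relevant item in the top k, or 0.0 if none."""
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def hit_rate(retrieved, relevant, k=10):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in retrieved[:k]) else 0.0


def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k slots filled by relevant items."""
    return sum(item in relevant for item in retrieved[:k]) / k


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in retrieved[:k]) / len(relevant)


# Hypothetical toy query: a ranked list of ids and the set of relevant ids.
retrieved = ["x", "a", "y", "b"]
relevant = {"a", "b", "c"}

rr = reciprocal_rank(retrieved, relevant, k=3)
hit = hit_rate(retrieved, relevant, k=3)
precision = precision_at_k(retrieved, relevant, k=3)
recall = recall_at_k(retrieved, relevant, k=3)
```

Averaging `reciprocal_rank` over all queries gives MRR, and averaging the other per-query values gives the corresponding dataset-level scores.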

📦 Installation

From PyPI:

pip install xretrieval

From source:

pip install git+https://github.com/dnth/x.retrieval

🛠️ Usage

List datasets:

xretrieval.list_datasets()
                                     Available Datasets                                      
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Dataset Name                 ┃ Description                                                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ coco-val-2017                │ The COCO Validation Set with 5k images.                    │
│ coco-val-2017-blip2-captions │ The COCO Validation Set with 5k images and BLIP2 captions. │
│ coco-val-2017-vlrm-captions  │ The COCO Validation Set with 5k images and VLRM captions.  │
└──────────────────────────────┴────────────────────────────────────────────────────────────┘

List models:

xretrieval.list_models()
                         Available Models                         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Model ID                                         ┃ Model Input ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ transformers/Salesforce/blip2-itm-vit-g          │ text-image  │
│ transformers/Salesforce/blip2-itm-vit-g-text     │ text        │
│ transformers/Salesforce/blip2-itm-vit-g-image    │ image       │
│ xhluca/bm25s                                     │ text        │
│ sentence-transformers/paraphrase-MiniLM-L3-v2    │ text        │
│ sentence-transformers/paraphrase-albert-small-v2 │ text        │
│ sentence-transformers/multi-qa-distilbert-cos-v1 │ text        │
│ sentence-transformers/all-MiniLM-L12-v2          │ text        │
│ sentence-transformers/all-distilroberta-v1       │ text        │
│ sentence-transformers/multi-qa-mpnet-base-dot-v1 │ text        │
│ sentence-transformers/all-mpnet-base-v2          │ text        │
│ sentence-transformers/multi-qa-MiniLM-L6-cos-v1  │ text        │
│ sentence-transformers/all-MiniLM-L6-v2           │ text        │
│ timm/resnet18.a1_in1k                            │ image       │
└──────────────────────────────────────────────────┴─────────────┘

Run benchmarks:

results, results_df = xretrieval.run_benchmark_bm25("coco-val-2017-blip2-captions")

Visualize retrieval results:

xretrieval.visualize_retrieval(results_df)

Run hybrid search with Reciprocal Rank Fusion (RRF):

results_df = xretrieval.run_rrf([results_df, results_df], "coco-val-2017")

See the RRF notebook for more details.
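
For intuition, here is a minimal sketch of the general Reciprocal Rank Fusion idea: each item receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed. This is not the internals of `xretrieval.run_rrf`. The function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the toy rankings are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from two different retrievers for the same query.
text_ranking = ["a", "b", "c"]
image_ranking = ["b", "c", "a"]

fused = reciprocal_rank_fusion([text_ranking, image_ranking])
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is why the approach suits combining text and image models.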