
Models available

Nandan Thakur edited this page Jun 29, 2022 · 2 revisions

🍻 Available Models

Name Implementation
BM25 (Robertson and Zaragoza, 2009) https://www.elastic.co/
Anserini (Yang et al., 2017) https://github.com/castorini/anserini
SBERT (Reimers and Gurevych, 2019) https://www.sbert.net/
ANCE (Xiong et al., 2020) https://github.com/microsoft/ANCE
DPR (Karpukhin et al., 2020) https://github.com/facebookresearch/DPR
USE-QA (Yang et al., 2020) https://tfhub.dev/google/universal-sentence-encoder-qa/3
SPARTA (Zhao et al., 2020) https://huggingface.co/BeIR
ColBERT (Khattab and Zaharia, 2020) https://github.com/stanford-futuredata/ColBERT

How to load different models available in BEIR?

BEIR includes several retrieval architectures (lexical, sparse, dense, and reranking) and evaluates all of them in a zero-shot setup.

Lexical Retrieval Evaluation using BM25 (Elasticsearch)

from beir.retrieval.search.lexical import BM25Search as BM25

hostname = "your-hostname"      # e.g. "localhost"
index_name = "your-index-name"  # e.g. "scifact"
initialize = True               # if True, deletes any existing index with the same name and reindexes all documents
model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
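
To make the lexical scoring step concrete, here is a minimal pure-Python sketch of one common BM25 variant. It is not the Elasticsearch implementation that `BM25Search` calls. The function name `bm25_scores`, the whitespace tokenizer, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` (doc_id -> text) against `query`.

    Illustrative only: whitespace tokenization, no stemming or stopwords.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is length-normalized with b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

# Hypothetical toy corpus
docs = {
    "d1": "bm25 is a lexical ranking function",
    "d2": "dense retrievers embed queries and documents",
    "d3": "lexical matching counts shared terms",
}
scores = bm25_scores("lexical ranking", docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Documents that share no term with the query score zero, which is the main limitation that the sparse and dense models below try to address.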

Sparse Retrieval using SPARTA

from beir.retrieval.search.sparse import SparseSearch
from beir.retrieval import models

model_path = "BeIR/sparta-msmarco-distilbert-base-v1"
sparse_model = SparseSearch(models.SPARTA(model_path), batch_size=128)
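
For intuition about what a SPARTA-style model computes, the sketch below follows the scoring rule as described in Zhao et al. (2020): each query token is matched against its best-scoring document token, and the result is passed through a ReLU and a log before summing. This is an assumption-laden illustration, not the code behind `models.SPARTA`; the function names, the toy 2-dimensional embeddings, and the `bias` default are hypothetical.

```python
import math

def sparta_term_score(query_token_emb, doc_token_embs, bias=0.0):
    """Score one query token against all token embeddings of a document."""
    # Best match: maximum dot product over the document's tokens.
    best = max(
        sum(q * d for q, d in zip(query_token_emb, emb))
        for emb in doc_token_embs
    )
    # ReLU keeps only positive evidence; log(1 + x) dampens large values.
    return math.log(max(best + bias, 0.0) + 1.0)

def sparta_score(query_token_embs, doc_token_embs, bias=0.0):
    """Sum the per-token scores over all query tokens."""
    return sum(
        sparta_term_score(q, doc_token_embs, bias) for q in query_token_embs
    )

# Toy embeddings (hypothetical values)
query_token_embs = [[1.0, 0.0], [0.0, 1.0]]
doc_token_embs = [[2.0, 0.0], [0.0, -1.0]]
score = sparta_score(query_token_embs, doc_token_embs)
```

Because each per-token score depends only on the vocabulary term and the document, such scores can be precomputed at indexing time and stored sparsely, which is what makes this a sparse retrieval method.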

Dense Retrieval using SBERT, ANCE, USE-QA or DPR

from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim") # or "dot" for dot-product
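
The choice between `"cos_sim"` and `"dot"` can change the ranking, because the dot product rewards embeddings with a large norm while cosine similarity ignores vector length. The sketch below shows this on hypothetical 2-dimensional embeddings; the helper functions are plain-Python stand-ins, not the library's own scoring code.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cos_sim(u, v):
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical embeddings: long_doc has a larger norm but points
# in a different direction from the query.
query = [1.0, 0.0]
doc_embeddings = {
    "short_doc": [1.0, 0.0],
    "long_doc": [3.0, 3.0],
}

by_dot = sorted(
    doc_embeddings, key=lambda d: dot(query, doc_embeddings[d]), reverse=True
)
by_cos = sorted(
    doc_embeddings, key=lambda d: cos_sim(query, doc_embeddings[d]), reverse=True
)
```

Here `by_dot` ranks `long_doc` first and `by_cos` ranks `short_doc` first. In practice, the safer choice is usually the similarity function the model was trained with.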

Reranking using Cross-Encoder model

from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-electra-base')
reranker = Rerank(cross_encoder_model, batch_size=128)

# Rerank the top-100 results retrieved by BM25
# (corpus, queries and bm25_results come from the first-stage retrieval)
rerank_results = reranker.rerank(corpus, queries, bm25_results, top_k=100)
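
The two-stage flow can be sketched without any model: keep the `top_k` first-stage hits per query, then rescore only those with a more expensive scorer. The sketch assumes the corpus maps document IDs to a dict with a `"text"` field and that results map query IDs to `{doc_id: score}`. `rerank_top_k` and `overlap_score` are hypothetical names, and the term-overlap scorer is only a stand-in for a real cross-encoder.

```python
def overlap_score(query, doc_text):
    """Hypothetical stand-in for a cross-encoder: share of query terms in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc_text.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def rerank_top_k(corpus, queries, results, top_k, score_fn):
    """Rescore the top_k first-stage hits of every query with score_fn."""
    reranked = {}
    for qid, doc_scores in results.items():
        # Keep only the best top_k documents from the first stage.
        top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
        reranked[qid] = {
            doc_id: score_fn(queries[qid], corpus[doc_id]["text"])
            for doc_id in top_docs
        }
    return reranked

# Toy data (hypothetical)
corpus = {
    "d1": {"title": "", "text": "cats sleep most of the day"},
    "d2": {"title": "", "text": "how long do cats sleep each day"},
    "d3": {"title": "", "text": "dogs bark at night"},
}
queries = {"q1": "how long do cats sleep"}
first_stage = {"q1": {"d1": 9.1, "d2": 7.4, "d3": 0.2}}

reranked = rerank_top_k(corpus, queries, first_stage, top_k=2, score_fn=overlap_score)
best = max(reranked["q1"], key=reranked["q1"].get)
```

Documents outside the first-stage top-k are never rescored, so the reranker can reorder candidates but cannot recover a relevant document that the first stage missed.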

Disclaimer

If you use any of these implementations, please make sure to cite the corresponding work correctly.

If you are the author of a model listed here and wish to update any part of it, or do not want the model to be included, feel free to post an issue here or make a pull request!

If you have implemented a model and would like it to be included in this library, feel free to post an issue here or make a pull request. Otherwise, if you want to evaluate the model on your own, see the following section.