After a long, quiet year without changes, I've merged many pull requests and made updates to the BEIR code. You can find the latest changes summarized below:
1. Heap Queue for keeping track of top-k documents when evaluating with dense retrieval.
Thanks to @kwang2049, starting from v2.0.0 we use a heap queue to keep track of the top-k documents in the `DenseRetrievalExactSearch` class. This considerably reduces RAM consumption, especially when evaluating large corpora such as MS MARCO or BioASQ.
The logic for tracking elements while chunking the collection remains the same:
- If the heap holds fewer than k items, push the item (i.e., the document) onto the heap.
- If the heap is at its maximum size k and the item is larger than the smallest item in the heap, push the item onto the heap and then pop the smallest element.
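The two cases above can be sketched with Python's `heapq` module (a minimal standalone sketch, not the BEIR implementation itself; the function and variable names are illustrative):

```python
import heapq

def top_k_update(heap, k, item):
    """Keep `heap` holding the k largest (score, doc_id) pairs seen so far."""
    if len(heap) < k:
        heapq.heappush(heap, item)     # heap not yet full: just push
    elif item > heap[0]:
        heapq.heappushpop(heap, item)  # heap full: push, then pop the smallest

heap = []
for score, doc_id in [(0.2, "d1"), (0.9, "d2"), (0.5, "d3"), (0.7, "d4")]:
    top_k_update(heap, k=2, item=(score, doc_id))

print(sorted(heap, reverse=True))  # → [(0.9, 'd2'), (0.7, 'd4')]
```

Because only k items are ever kept per chunk, memory stays bounded by top-k instead of growing with the corpus size.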
2. Removed all major typing errors from the BEIR code.
We removed all typing errors from the BEIR code by implementing an abstract base class for search. The base class's `search` function takes in the corpus, queries, and a top-k value, and returns the results as a nested dictionary mapping each `query_id` to its corresponding `doc_id` and `score`.
```python
from abc import ABC, abstractmethod
from typing import Dict

class BaseSearch(ABC):
    @abstractmethod
    def search(self,
               corpus: Dict[str, Dict[str, str]],
               queries: Dict[str, str],
               top_k: int,
               **kwargs) -> Dict[str, Dict[str, float]]:
        pass
```
Example: evaluate_sbert_multi_gpu.py
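As an illustration of the interface, here is a toy lexical-overlap retriever implementing `BaseSearch` (a hypothetical subclass for demonstration only, not part of BEIR; `WordOverlapSearch` and the sample data are made up):

```python
from abc import ABC, abstractmethod
from typing import Dict

class BaseSearch(ABC):
    @abstractmethod
    def search(self, corpus: Dict[str, Dict[str, str]],
               queries: Dict[str, str],
               top_k: int, **kwargs) -> Dict[str, Dict[str, float]]:
        pass

class WordOverlapSearch(BaseSearch):
    """Toy retriever: score = number of words shared by query and document."""
    def search(self, corpus, queries, top_k, **kwargs):
        results = {}
        for query_id, query in queries.items():
            q_words = set(query.lower().split())
            scores = {
                doc_id: float(len(q_words & set(doc["text"].lower().split())))
                for doc_id, doc in corpus.items()
            }
            # keep only the top-k highest-scoring documents per query
            results[query_id] = dict(sorted(scores.items(),
                                            key=lambda x: x[1],
                                            reverse=True)[:top_k])
        return results

corpus = {"d1": {"title": "", "text": "BEIR benchmarks retrieval"},
          "d2": {"title": "", "text": "dense retrieval with transformers"}}
queries = {"q1": "dense retrieval"}
print(WordOverlapSearch().search(corpus, queries, top_k=1))
# → {'q1': {'d2': 2.0}}
```

Any retriever that returns results in this `{query_id: {doc_id: score}}` shape plugs directly into BEIR's evaluation.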
3. Updated Faiss Code to include GPU options.
I added a GPU option to the `FaissSearch` base class. Using a GPU can reduce search latency immensely, although transferring the faiss index from CPU to GPU can itself take time. Pass `use_gpu=True` to the `DenseRetrievalFaissSearch` class to use the GPU for faiss inference with PQ, PCA, or FlatIP search.
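A minimal usage sketch, assuming `beir` and a GPU build of faiss are installed; apart from `use_gpu`, which comes from these notes, the model name, data, and other parameter values are illustrative assumptions:

```python
# Sketch only: requires `beir` and `faiss-gpu`; model name and data are placeholders.
from beir.retrieval import models
from beir.retrieval.search.dense import FlatIPFaissSearch  # subclass of DenseRetrievalFaissSearch

model = models.SentenceBERT("msmarco-distilbert-base-tas-b")
# use_gpu=True moves the faiss index from CPU to GPU before searching
faiss_search = FlatIPFaissSearch(model, batch_size=128, use_gpu=True)

corpus = {"d1": {"title": "", "text": "a document"}}  # BEIR-format corpus
queries = {"q1": "a query"}                           # BEIR-format queries
results = faiss_search.search(corpus, queries, top_k=100, score_function="dot")
```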
4. New publication -- Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.
We have a new publication, where we describe our official leaderboard hosted on eval.ai and provide reproducible reference models on BEIR using the Pyserini Repository (https://github.com/castorini/pyserini).
Link to the arXiv version: https://arxiv.org/abs/2306.07471
If you use numbers from our leaderboard, please cite the following paper:
```bibtex
@misc{kamalloo2023resources,
  title={Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard},
  author={Ehsan Kamalloo and Nandan Thakur and Carlos Lassance and Xueguang Ma and Jheng-Hong Yang and Jimmy Lin},
  year={2023},
  eprint={2306.07471},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
```