We have a few queries and a list of documents. For each query, we submit a list of {query, document} pairs, and the model scores each pair so that the documents most relevant to the query can be ranked first.
Bumblebee does not yet implement the :for_sequence_classification architecture for this use case.
Conclusion
The results are the same.
The conclusion is that Elixir performs at least as well as Python. Note that we had to supply a tokenizer explicitly for this cross-encoder (we load bert-base-uncased in the Elixir code below).
Queries and documents
I asked ChatGPT to generate queries and documents to test the result of the reranking. The input:
queries=["What is the capital of France?",# Factual query"What is the capital of Germany?",# Similar but different query"What is the Eiffel Tower?",# Entity-related query"What is not the capital of France?",# Negation query"Where is London located?",# Geographical location query]documents=[# Highly relevant"Paris is the capital of France.",# Directly answers the question"The capital city of France is Paris.",# Rephrased but still relevant"Paris, the largest city in France, is its capital.",# Adds context but remains relevant# Somewhat relevant"The Eiffel Tower is one of Paris' most famous landmarks.",# Paris-related but not about the capital"Paris is known for its museums and cultural landmarks.",# Related to Paris but not directly answering# Slightly misleading but still connected"London is a global city, and the capital of the UK.",# Related to capitals, but not relevant to the query"France is a country in Europe with Paris as a key tourist destination.",# Mention of Paris but indirect relevance# Irrelevant"Berlin is the capital of Germany.",# Different country and unrelated"Spain's capital, Madrid, is famous for its architecture.",# Completely irrelevant country"New York City is one of the most populous cities in the USA.",# Entirely unrelated to the query# Misleading/Confusing information"The capital of France was once London during World War II.",# Incorrect or misleading historical claim"Paris is the capital of Italy.",# Deliberately false statement# Edge cases (minimal or abstract information)"France is a country.",# Vague, doesn't address the capital at all"There are many capital cities in Europe.",# General statement about capitals without specifics"Paris is a word with many meanings.",# Abstract, contextless statement"The sky in Paris is often overcast.",# Completely off-topic but mentions Paris]
Python running the model
I consider Python as the "source of truth". In a Jupyter notebook:
from sentence_transformers import CrossEncoder

def rerank(query, documents, first=3, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
    # Load the cross-encoder model
    model = CrossEncoder(model_name)
    # Prepare input pairs: (query, document)
    input_pairs = [[query, doc] for doc in documents]
    # Compute the scores
    scores = model.predict(input_pairs)
    # Sort the documents by score and keep only the top 'first' results
    reranked_results = [x for _, x in sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)[:first]]
    reranked_scores = sorted(scores, reverse=True)[:first]
    return reranked_results, reranked_scores
import time

start = time.time()
first = 3  # Number of top results to keep
for query in queries:
    reranked, scores = rerank(query, documents, first)
    print(f"Query: {query}")
    for doc, score in zip(reranked, scores):
        print(f"Score: {score:.4f} - {doc}")
print(f"time: {time.time()-start:.4f}")
Sequential Python Results
Query: What is the capital of France?
Score: 8.9898 - The capital city of France is Paris.
Score: 8.5007 - Paris is the capital of France.
Score: 7.7378 - Paris, the largest city in France, is its capital.
Query: What is the capital of Germany?
Score: 8.6557 - Berlin is the capital of Germany.
Score: -3.3820 - There are many capital cities in Europe.
Score: -3.4038 - Paris is the capital of Italy.
Query: What is the Eiffel Tower?
Score: 9.3518 - The Eiffel Tower is one of Paris' most famous landmarks.
Score: -8.0965 - Paris is known for its museums and cultural landmarks.
Score: -8.2520 - Paris is the capital of Italy.
Query: What is not the capital of France?
Score: 6.3721 - The capital city of France is Paris.
Score: 5.8387 - Paris is the capital of France.
Score: 5.2583 - Paris, the largest city in France, is its capital.
Query: Where is London located?
Score: 5.5550 - London is a global city, and the capital of the UK.
Score: -1.8151 - The capital of France was once London during World War II.
Score: -7.4113 - Paris, the largest city in France, is its capital.
time: 3.3038
Concurrent run
start = time.time()
first = 3  # Number of top results to keep
rerank_concurrent(queries, documents, first)
print(f"time: {time.time()-start:.4f}")
Concurrent Python results
... same results; only the order of arrival may differ
time: 1.2592
Elixir running the code
The Elixir code is based on this issue: elixir-nx/bumblebee#251
Sequential run
defmodule CrossEncoder do
  @first 3

  def load_model do
    repo = {:hf, "cross-encoder/ms-marco-MiniLM-L-6-v2"}
    tokenizer = {:hf, "bert-base-uncased"}
    {:ok, model_info} = Bumblebee.load_model(repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(tokenizer)
    {model_info, tokenizer}
  end

  def rerank(documents, query) do
    # Prepare input pairs for the cross-encoder
    {model_info, tokenizer} = load_model()

    input_pairs =
      Bumblebee.apply_tokenizer(tokenizer, Enum.map(documents, fn doc -> {query, doc} end))

    # Run the cross-encoder in batches
    outputs = Axon.predict(model_info.model, model_info.params, input_pairs)

    # Combine scores with the original documents and sort
    Enum.zip(documents, outputs.logits |> Nx.to_flat_list())
    |> Enum.sort_by(fn {_, score} -> score end, :desc)
    |> Enum.map(fn {doc, score} -> {Float.round(score, 4), doc} end)
    |> Enum.take(@first)
    |> IO.inspect()
  end
end
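The driver that produced the output below is not preserved in the post. A minimal sketch, assuming :timer.tc wraps a plain Enum.map over the queries and that the leading 2857 is the elapsed time converted to milliseconds:

{time, results} =
  :timer.tc(fn ->
    Enum.map(queries, fn query ->
      # rerank/2 prints and returns the top results for one query
      {query, CrossEncoder.rerank(documents, query)}
    end)
  end)

# :timer.tc/1 returns microseconds; the 2857 in the output below
# suggests a conversion to milliseconds, e.g. div(time, 1000)
{div(time, 1000), results}

Note that rerank/2 reloads the model and tokenizer on every call; hoisting load_model/0 out of the loop is an obvious optimization, especially for the concurrent version.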
{2857,[{"What is the capital of France?",[{8.9898,"The capital city of France is Paris."},{8.5007,"Paris is the capital of France."},{7.7378,"Paris, the largest city in France, is its capital."}]},{"What is the capital of Germany?",[{8.6557,"Berlin is the capital of Germany."},{-3.382,"There are many capital cities in Europe."},{-3.4038,"Paris is the capital of Italy."}]},{"What is the Eiffel Tower?",[{9.3518,"The Eiffel Tower is one of Paris' most famous landmarks."},{-8.0965,"Paris is known for its museums and cultural landmarks."},{-8.252,"Paris is the capital of Italy."}]},{"What is not the capital of France?",[{6.3721,"The capital city of France is Paris."},{5.8387,"Paris is the capital of France."},{5.2583,"Paris, the largest city in France, is its capital."}]},{"Where is London located?",[{5.555,"London is a global city, and the capital of the UK."},{-1.8151,"The capital of France was once London during World War II."},{-7.4113,"Paris, the largest city in France, is its capital."}]}]}