Given a set of documents and a minimum similarity threshold, find the number of document pairs whose similarity exceeds the threshold.
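As a rough illustration of the task, here is a minimal brute-force sketch. It assumes cosine similarity over raw word counts as the similarity measure, which is an assumption for illustration only; the function names `cosine_similarity` and `count_similar_pairs` are hypothetical and not taken from this project's code.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def count_similar_pairs(documents, threshold):
    """Count unordered document pairs whose similarity exceeds the threshold."""
    # Assumption: whitespace tokenization and lowercasing are enough here.
    vectors = [Counter(doc.lower().split()) for doc in documents]
    return sum(
        1
        for a, b in combinations(vectors, 2)
        if cosine_similarity(a, b) > threshold
    )


if __name__ == "__main__":
    docs = [
        "spark makes big data simple",
        "spark makes big data fast",
        "cats sleep all day",
    ]
    print(count_similar_pairs(docs, threshold=0.5))
```

This compares every pair, so the cost grows quadratically with the number of documents. It is only meant to make the problem statement concrete, not to scale to a large collection.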
sudo apt install default-jre
pip install beir
pip install pandas
pip install scikit-learn
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm
pip install ipywidgets
wget https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz
sha512sum spark-3.4.0-bin-hadoop3.tgz
tar -xzf spark-3.4.0-bin-hadoop3.tgz
Follow this tutorial
Change into the app folder and run:
python main.py