Project developed for the MIRVC course of the Master of Artificial Intelligence and Data Engineering at the University of Pisa.
This project consists of the design and implementation of a search engine for the MSMARCO dataset. Check out the assignment and the report in the docs/ folder for all the details about the project.
To run this project, you need to download the MSMARCO dataset into the main folder.
On Windows:
- Import the project in Visual Studio/Visual Studio Code
- Build the project using CMake
- Execute app.exe
On Linux (Debian/Ubuntu):
- Install the required software
$ sudo apt-get install git cmake build-essential zlib1g-dev libboost-all-dev
- Download the source code
$ git clone --recursive https://github.com/edoardoruffoli/MSMARCO-Search-Engine
- Generate the build files
$ cd MSMARCO-Search-Engine
$ mkdir build && cd build
$ cmake ..
- Build
$ make
- Run
$ cd bin
$ ./app
*** Started MSMARCO Search Engine ***
Available commands:
help - display a list of commands
query - perform a query
eval - execute a queries dataset, saving the result file for trec_eval
index - create the inverted index
exit - exit the program
Enter a command:
>query
Enter the query execution mode:
0 : CONJUNCTIVE_MODE
1 : DISJUNCTIVE_MODE
2 : DISJUNCTIVE_MODE_MAX_SCORE
>2
Select how many documents return:
>10
Enter the query:
>manhattan project
Results for: "manhattan project"
The elapsed time was 15 milliseconds, 15293700 nanoseconds.
RESULTS:
Doc Id Score
2036644 4.31715
3870080 4.30079
2 4.29498
3615618 4.28213
2395250 4.27013
4404039 4.25136
3607205 4.23599
7243450 4.20026
3689999 4.1146
3870082 4.09159
The repository is organized as follows:
- apps/ contains the entry points (main functions) of the programs
- docs/ contains the project report and the assignment
- evaluation/ contains the dataset used to evaluate the search engine with trec_eval
- include/ contains the header files
- src/ contains the source files
- tests/ contains the unit tests
- thirdparty/ contains the third-party dependencies
- Francesco Hudema @MrFransis
- Edoardo Ruffoli @edoardoruffoli