Information retrieval (IR) is the process by which a computer system responds to a user's query for text-based information on a specific topic. IR was one of the first, and remains one of the most important, problems in natural language processing (NLP) (source: Stanford CS276).
This search engine returns information about songs, ranked by how relevant their lyrics are to the user's query.
Users can find a song by typing in a fragment of its lyrics.
The application is similar in idea to Shazam and MusixMatch.
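To illustrate the idea, here is a minimal sketch of ranking songs by lyric relevance. It assumes BM25 scoring, which seems likely given that rank_bm25 appears in the requirements, but it is written in plain Python and is not the project's actual code. The toy corpus `SONGS`, the helper names `tokenize`, `bm25_scores` and `search`, and the parameters `k1` and `b` are all hypothetical. The IDF formula is a common non-negative variant and may differ from the one rank_bm25 uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (title, lyrics) pairs, for illustration only.
SONGS = [
    ("Song A", "hello from the other side"),
    ("Song B", "we will we will rock you"),
    ("Song C", "hello darkness my old friend"),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption, not rank_bm25's exact formula).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores

def search(query, songs, top_n=3):
    """Return up to top_n (title, score) pairs with a positive score, best first."""
    scores = bm25_scores(query, [lyrics for _, lyrics in songs])
    ranked = sorted(zip(songs, scores), key=lambda pair: pair[1], reverse=True)
    return [(title, score) for (title, _), score in ranked[:top_n] if score > 0]

if __name__ == "__main__":
    for title, score in search("hello darkness", SONGS):
        print(f"{title}: {score:.3f}")
```

With a real dataset, the list of pairs would be built from the lyrics file rather than hard-coded, and a library implementation would likely replace the hand-written scoring.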
The data used for this retrieval model comes from the Song Lyrics Dataset on Kaggle.
The dataset contains song lyrics by various artists. Thanks to the author for creating it and for inspiring this project.
- numpy
- pandas
- re
- pickle
- json
- nltk
- rank_bm25
Install all packages with: pip install -r requirements.txt
After installing NLTK, also install the NLTK data that some functions depend on. Run the following in your terminal:
python
import nltk
nltk.download('popular')
We deployed the application with the Streamlit framework for demo purposes.
To run it, first set up the environment as described in the requirements section above.
Then run: streamlit run music-retrieval.py
Alternatively, without Streamlit, you can run the Jupyter notebook:
jupyter notebook Information_Retrieval.ipynb
Remember to load the data file music_data.csv before running the subsequent steps.
You can use your own custom music database by creating a file with the same structure as our data file music_data.csv.