Repository for Challenge - 3 (Algorithms) project made during Microsoft Engage 2022
Sorting algorithms play an important role in recommendation engines. By the end of the project, the following questions should be answered:
- What role do sorting algorithms play in a recommendation engine?
- Which sorting algorithm is used in this project, and why?
In this project, I have implemented a recommendation engine for movies.
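To make the first question concrete, here is a minimal sketch of where sorting enters a recommendation engine: once every candidate movie has a similarity score, the scores are sorted in descending order and the top few are returned. The titles, scores, and the `top_n` helper are hypothetical and are not taken from the project's code. Python's built-in `sorted` (Timsort) is used here purely for illustration.

```python
# Hypothetical similarity scores between one chosen movie and a few others.
scores = {
    "Movie A": 0.12,
    "Movie B": 0.87,
    "Movie C": 0.45,
    "Movie D": 0.63,
}


def top_n(scores, n):
    """Return the n titles with the highest similarity score, best first."""
    # Sort (title, score) pairs by score, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [title for title, _ in ranked[:n]]


print(top_n(scores, 2))  # ['Movie B', 'Movie D']
```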
Different approaches, choosing an approach and why.
To understand the role of sorting algorithms and make a choice, one should know the different types of filtering algorithms present. They are:
- Content-based filtering - Content is recommended to a user based on that same user's past interactions with content.
- Collaborative filtering - Content is recommended to a user based on how similar that user's interactions are to those of other users. Users with similar activity are recommended similar content.
- Hybrid filtering - This is a combination of Content-based and Collaborative filtering.
My objective was to implement an approach that would:
- be relevant to the user (content similarity)
- avoid the cold-start problem

Therefore, a content-based filtering approach has been used in this project.
Link to the dataset: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata/discussion?select=tmdb_5000_movies.csv
The datasets are also included in this repo, in a folder titled Datasets.
The following were the factors kept in mind while selecting the dataset :
- Relevant and useful data
- Different and diverse attributes (to facilitate content-based filtering approach)
- Manageable computational load
- Dataset Analysis
- Data Pre-processing
- Model Building (using text vectorization and cosine similarity)
- Model Testing
- Establishing web connection (using streamlit)
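The model-building step above can be sketched as follows. This is a minimal standard-library illustration of text vectorization (bag-of-words counts) and cosine similarity, not the notebook's actual code. The movie titles, tag strings, and the `vectorize` and `cosine_similarity` helpers are assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector mapping word -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical movies, each described by a short string of tags.
tags = {
    "Space Quest": "space adventure alien crew",
    "Star Voyage": "space adventure crew mission",
    "Love in Paris": "romance paris love",
}

vectors = {title: vectorize(text) for title, text in tags.items()}

# Similarity of every other movie to the chosen one.
query = "Space Quest"
similarities = {
    title: cosine_similarity(vectors[query], vector)
    for title, vector in vectors.items()
    if title != query
}
print(similarities)
```

Movies that share more tags with the chosen movie receive a higher score. Sorting these scores in descending order then gives the recommendation list.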
To install and run the project on your local system, the requirements are as follows:
Make sure the following libraries are installed in your Python environment. The ast and pickle modules are part of the Python standard library and do not need to be installed with pip. pandas is needed for the pd.read_csv calls in the notebook.
pip install pandas
pip install nltk
pip install streamlit
After downloading source code files from this repo, perform the following steps:
- Open the get_recommendation.ipynb Jupyter notebook and change the dataset paths in the following lines of code to match the location of the Datasets folder on your system:
movies_df = pd.read_csv('C:/Users/Aaheli Paul/Movie-Recommendation-Engine-Engage-2022-Project/Datasets/tmdb_5000_movies.csv')
credits_df = pd.read_csv('C:/Users/Aaheli Paul/Movie-Recommendation-Engine-Engage-2022-Project/Datasets/tmdb_5000_credits.csv')
- Run all cells of the get_recommendation.ipynb notebook in Jupyter, or execute it from the command prompt (a notebook cannot be run directly with the python command):
jupyter nbconvert --to notebook --execute get_recommendation.ipynb
When the notebook finishes executing, two files will have been created in the main folder: movie_list.pkl and similarity.pkl.
These files are used by app.py.
- From inside the source code folder, run the following command on the command prompt to host the web page locally:
streamlit run app.py
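As a rough illustration of how the two pickle files could be consumed, the sketch below writes and reads them with the standard `pickle` module and looks up recommendations for one title. It assumes, purely for the example, that movie_list.pkl holds a plain list of titles and similarity.pkl holds a square matrix stored as nested lists. The actual file contents and the code in app.py may differ, and the `save_artifacts`, `load_artifacts`, and `recommend` helpers are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(folder, titles, similarity):
    """Write the two pickle files (file names taken from the steps above)."""
    with open(folder / "movie_list.pkl", "wb") as f:
        pickle.dump(titles, f)
    with open(folder / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(folder):
    """Read both pickle files back and return (titles, similarity)."""
    with open(folder / "movie_list.pkl", "rb") as f:
        titles = pickle.load(f)
    with open(folder / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return titles, similarity


def recommend(title, titles, similarity, top_n=2):
    """Return the top_n titles most similar to the given title."""
    index = titles.index(title)
    # Sort (position, score) pairs by score, highest first.
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    # Skip the movie itself, then keep the best matches.
    return [titles[i] for i, _ in ranked if i != index][:top_n]


# Hypothetical data: three titles and their pairwise similarity scores.
demo_titles = ["A", "B", "C"]
demo_similarity = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.4],
    [0.9, 0.4, 1.0],
]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    save_artifacts(folder, demo_titles, demo_similarity)
    titles, similarity = load_artifacts(folder)

print(recommend("A", titles, similarity))  # ['C', 'B']
```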