
Repository for Challenge - 3 (Algorithms) project made during Microsoft Engage 2022 by Aaheli Paul


Movie-Recommendation-Engine-Engage-2022-Project

Repository for Challenge - 3 (Algorithms) project made during Microsoft Engage 2022



Challenge - 3 : ALGORITHMS


Problem Description :


Sorting algorithms play an important role in recommendation engines. By the end of the project, the following questions should be answered:

  • What role do sorting algorithms play in a recommendation engine?
  • Which sorting algorithm is used in this project, and why?

In this project, I have implemented a recommendation engine for movies.



Answering the questions :


Different approaches, choosing an approach and why.

To understand the role of sorting algorithms and make a choice, it helps to know the main types of filtering algorithms. They are:

  1. Content-based filtering - Content is recommended to a user based on that same user's past interactions with content.
  2. Collaborative filtering - Content is recommended to a user based on how similar that user's interactions are to those of other users. Users with similar activity are recommended similar content.
  3. Hybrid filtering - A combination of content-based and collaborative filtering.

My objective was to implement an approach that would:

  • be relevant to the user (content similarity)
  • avoid the cold-start problem (no interaction data from other users is needed)

Therefore, a content-based filtering approach has been used in this project.
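
To make the role of sorting concrete: once every candidate movie has a similarity score against the movie the user selected, the recommendations are the highest-scoring entries, so the scores have to be sorted in descending order. The sketch below is illustrative only. The titles, the scores, and the `top_n` helper are hypothetical and are not taken from the project's notebook. It relies on Python's built-in `sorted`, which is stable and runs in O(n log n).

```python
def top_n(scores, selected_index, n=5):
    """Return (index, score) pairs of the n most similar items, best first."""
    # Pair every score with its position so the indices survive the sort.
    indexed = list(enumerate(scores))
    # Drop the selected movie itself; it is always most similar to itself.
    candidates = [pair for pair in indexed if pair[0] != selected_index]
    # Sort by score, highest first.
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical similarity scores of five movies against movie 0.
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
scores = [1.0, 0.12, 0.57, 0.33, 0.80]

for index, score in top_n(scores, selected_index=0, n=3):
    print(titles[index], score)
```

Keeping the index next to each score is the important design choice here: after sorting, the index is the only link back to the movie's title.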



Selecting the dataset :


Link to the dataset: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata/discussion?select=tmdb_5000_movies.csv

The datasets are also available in this repo, in a folder titled Datasets.

The following factors were kept in mind while selecting the dataset:

  • Relevant and useful data
  • Different and diverse attributes (to facilitate content-based filtering approach)
  • Manageable computational load

Project Flow :

  1. Dataset Analysis
  2. Data Pre-processing
  3. Model Building (using text vectorization and cosine similarity)
  4. Model Testing
  5. Establishing web connection (using streamlit)
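
Step 3 of the flow above (text vectorization and cosine similarity) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the `vectorize` and `cosine_similarity` helpers and the sample tag strings are assumptions made for the example, and the project itself may use a library vectorizer instead.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for words that are missing from b.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical tag strings built from each movie's attributes.
tags = {
    "Movie A": "space alien war",
    "Movie B": "space alien romance",
    "Movie C": "wedding comedy romance",
}

vectors = {title: vectorize(text) for title, text in tags.items()}
for title in ("Movie B", "Movie C"):
    print(title, round(cosine_similarity(vectors["Movie A"], vectors[title]), 3))
```

A score near 1 means two movies share most of their tags, and a score of 0 means they share none.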




Getting Started


To install and run the project on your local system, you need the following:

Prerequisites

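Make sure the following libraries are installed in your Python environment. You can install them with these commands:

  pip install pandas
  pip install nltk
  pip install streamlit

pandas is needed because the notebook reads the datasets with pd.read_csv. The ast and pickle modules are part of Python's standard library and do not need to be installed with pip.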
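
As a quick check before running the project, the snippet below reports which modules cannot be found in the current environment. It is a small sketch: the `missing_modules` helper is hypothetical, and the list of names is an assumption that combines the modules mentioned above with pandas, which the notebook uses through `pd.read_csv`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# ast and pickle ship with Python; the others are third-party packages.
required = ["ast", "pickle", "pandas", "nltk", "streamlit"]

missing = missing_modules(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required modules are available.")
```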

After downloading source code files from this repo, perform the following steps:

  1. Open the get_recommendation.ipynb Jupyter notebook and change the dataset locations in the following lines of code to match your system:
  movies_df = pd.read_csv('C:/Users/Aaheli Paul/Movie-Recommendation-Engine-Engage-2022-Project/Datasets/tmdb_5000_movies.csv')
  credits_df = pd.read_csv('C:/Users/Aaheli Paul/Movie-Recommendation-Engine-Engage-2022-Project/Datasets/tmdb_5000_credits.csv')

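  2. Run all cells of the get_recommendation.ipynb Jupyter notebook, or execute it from the command prompt with the following command (a .ipynb file cannot be run directly with python):
  jupyter nbconvert --to notebook --execute get_recommendation.ipynb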

After the notebook finishes executing, two files will have been saved to the main folder: movie_list.pkl and similarity.pkl.

These files are used when app.py runs.


  3. From the source code folder, run the following command on the command prompt to host the web page locally:
  streamlit run app.py
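
The hand-off described in the steps above, where the notebook writes movie_list.pkl and similarity.pkl and app.py reads them back, can be sketched as a pickle round trip. The file names come from this README; the `save_artifacts` and `load_artifacts` helpers and the placeholder data are assumptions, since the actual objects stored by the notebook are not shown here.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(folder, movie_list, similarity):
    """Write the two files the notebook is described as producing."""
    with open(folder / "movie_list.pkl", "wb") as f:
        pickle.dump(movie_list, f)
    with open(folder / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(folder):
    """Read both files back, as an app would at start-up."""
    with open(folder / "movie_list.pkl", "rb") as f:
        movie_list = pickle.load(f)
    with open(folder / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return movie_list, similarity


# Placeholder data; a temporary folder keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    save_artifacts(folder, ["Movie A", "Movie B"], [[1.0, 0.4], [0.4, 1.0]])
    movies, similarity = load_artifacts(folder)
    print(movies, similarity)
```

Only load pickle files that you generated yourself, because unpickling untrusted data can execute arbitrary code.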
