A content-based movie recommendation application that suggests movies based on content similarity and shows additional details such as year, IMDb rating, runtime, and poster, fetched from the OMDb API. The project combines data preprocessing, machine learning, and web development, and is deployed live as a Flask application.
- Project Overview
- Data Preprocessing & Model Training
- Application Architecture
- Flask Integration & Deployment
- API Usage & Key Management
- Usage Instructions
- Setup Guide
- Technologies Used
- Future Enhancements
This project is a Content-Based Movie Recommendation System that recommends movies based on the similarity of genres, keywords, cast, and crew members. The core model is developed in a Jupyter Notebook (Movie.ipynb), where we preprocess movie data from TMDB (The Movie Database) and compute the content similarity between movies using cosine similarity on the preprocessed features.
To make the system interactive, we built a Flask web application, which serves the trained model and provides users with movie recommendations. Additional movie information is fetched from the OMDb API and displayed on the UI.
The datasets used in this project are:
- tmdb_5000_movies.csv: Contains movie metadata such as genres, keywords, overview, etc.
- tmdb_5000_credits.csv: Includes cast and crew details for each movie.
In the Movie.ipynb file, the following steps were performed:
- Data Cleaning: Dropped missing or irrelevant columns and handled duplicates.
- Feature Engineering:
  - Extracted and converted the genres, keywords, cast, and crew columns from stringified lists to actual lists.
  - Selected only the top 3 cast members and fetched the director for each movie.
  - Converted text in the overview column to word tokens.
- Stemming: Applied Porter Stemming to normalize the text.
- Vectorization: Used CountVectorizer to convert textual data into vectors with a max feature limit of 5000.
- Similarity Calculation: Computed pairwise cosine similarity to create the recommendation engine.
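The steps above can be sketched end to end on toy data. This is a dependency-free illustration, not the notebook's code: it uses `ast.literal_eval` to parse the stringified lists, then builds word counts with `collections.Counter` and computes cosine similarity by hand, where the notebook uses scikit-learn's CountVectorizer (capped at 5000 features) and its cosine similarity. Stemming is left out for brevity. The helper names (`names`, `director`, `build_tags`, `cosine`) and the three rows are made up for the example.

```python
import ast
import math
from collections import Counter


def names(stringified, limit=None):
    """Turn a stringified list of dicts into a list of their 'name' values."""
    parsed = [item["name"] for item in ast.literal_eval(stringified)]
    return parsed if limit is None else parsed[:limit]


def director(stringified_crew):
    """Return the director's name as a one-item list, or [] if none is listed."""
    for member in ast.literal_eval(stringified_crew):
        if member.get("job") == "Director":
            return [member["name"]]
    return []


def build_tags(row):
    """Combine overview tokens, genres, keywords, top-3 cast, and director."""
    tokens = (
        row["overview"].split()
        + names(row["genres"])
        + names(row["keywords"])
        + names(row["cast"], limit=3)
        + director(row["crew"])
    )
    return " ".join(tokens).lower()


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Made-up rows shaped like the merged TMDB columns (not real dataset entries).
rows = [
    {
        "title": "Space Marine",
        "overview": "A marine explores an alien moon",
        "genres": '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Science Fiction"}]',
        "keywords": '[{"id": 10, "name": "space"}]',
        "cast": '[{"name": "Ann Lee"}, {"name": "Bo Chen"}, {"name": "Cy Diaz"}, {"name": "Di Evans"}]',
        "crew": '[{"job": "Editor", "name": "Ed Fox"}, {"job": "Director", "name": "Gil Hart"}]',
    },
    {
        "title": "Moon Colony",
        "overview": "Settlers defend a colony on an alien moon",
        "genres": '[{"id": 2, "name": "Science Fiction"}]',
        "keywords": '[{"id": 10, "name": "space"}]',
        "cast": '[{"name": "Ann Lee"}]',
        "crew": '[{"job": "Director", "name": "Gil Hart"}]',
    },
    {
        "title": "Bakery Days",
        "overview": "Two friends open a small bakery",
        "genres": '[{"id": 3, "name": "Comedy"}]',
        "keywords": '[{"id": 11, "name": "baking"}]',
        "cast": '[{"name": "Hal Ives"}]',
        "crew": '[{"job": "Director", "name": "Jo King"}]',
    },
]

# One word-count vector per movie, then the full pairwise similarity matrix.
vectors = [Counter(build_tags(row).split()) for row in rows]
similarity = [[cosine(a, b) for b in vectors] for a in vectors]
```

In this toy set the two science-fiction rows share far more tokens with each other than either shares with the comedy, so `similarity[0][1]` is much larger than `similarity[0][2]`.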
Because the preprocessed data and the similarity matrix are large, they were saved as pickle files. Due to GitHub file-size limitations, the pickle files were uploaded to Google Drive for storage and retrieval:
- movie_list.pkl: Contains the preprocessed movie data.
- similarity.pkl: Contains the cosine similarity matrix.
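As a rough sketch of how these two artifacts might be used together, the example below pickles a toy title list and similarity matrix, loads them back, and ranks the most similar titles. It assumes movie_list.pkl holds a plain list of titles, which may not match the real file, and the `recommend` function and its `top_n` parameter are hypothetical names chosen for this example.

```python
import os
import pickle
import tempfile


def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding the title itself."""
    index = titles.index(title)
    ranked = sorted(
        enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True
    )
    return [titles[i] for i, _ in ranked if i != index][:top_n]


# Toy stand-ins for the real artifacts (assumed shapes, not the real data).
titles = ["Space Marine", "Moon Colony", "Bakery Days"]
similarity = [
    [1.0, 0.67, 0.07],
    [0.67, 1.0, 0.05],
    [0.07, 0.05, 1.0],
]

with tempfile.TemporaryDirectory() as artifacts:
    movie_path = os.path.join(artifacts, "movie_list.pkl")
    similarity_path = os.path.join(artifacts, "similarity.pkl")

    # Save both objects, as a notebook would at the end of training.
    with open(movie_path, "wb") as handle:
        pickle.dump(titles, handle)
    with open(similarity_path, "wb") as handle:
        pickle.dump(similarity, handle)

    # Load them back, as the web app would at startup.
    with open(movie_path, "rb") as handle:
        loaded_titles = pickle.load(handle)
    with open(similarity_path, "rb") as handle:
        loaded_similarity = pickle.load(handle)

print(recommend("Space Marine", loaded_titles, loaded_similarity, top_n=2))
# prints ['Moon Colony', 'Bakery Days']
```

Pickle files can execute arbitrary code when loaded, so only load files obtained from a source you trust.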
The application consists of two main components:
- Backend: The backend is built using Flask, which handles the model predictions and serves movie recommendations. The pre-trained recommendation model, stored as a pickle file, is loaded into the application when it starts. The backend also communicates with the OMDb API to fetch additional movie details.
- Frontend: The frontend is rendered using HTML and CSS templates in Flask. It displays the recommended movies along with details such as IMDb rating, runtime, year, and posters fetched from OMDb.
The Flask web app serves the trained model and handles API requests to OMDb. It has two main routes:
- /: Displays four movies on the homepage with details like the title, year, runtime, IMDb rating, and poster.
- /movie: Takes the movie name as input, recommends similar movies, and fetches their details from OMDb.
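A minimal sketch of the two routes is shown below. It is not the project's app.py: it returns JSON instead of rendering HTML templates, reads the movie name from an assumed `name` query parameter, and replaces the pickled model and OMDb lookups with placeholder data (`HOMEPAGE_TITLES`, `RECOMMENDATIONS`, and `movie_details` are hypothetical).

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder data standing in for the pickled model (hypothetical values).
HOMEPAGE_TITLES = ["Movie A", "Movie B", "Movie C", "Movie D"]
RECOMMENDATIONS = {"Movie A": ["Movie B", "Movie C"]}


def movie_details(title):
    """Stand-in for the OMDb lookup; the real app would add year, rating, etc."""
    return {"title": title}


@app.route("/")
def home():
    # Homepage: a fixed set of four movies with their details.
    return jsonify({"movies": [movie_details(t) for t in HOMEPAGE_TITLES]})


@app.route("/movie")
def movie():
    # Assumed input: /movie?name=<title>
    name = request.args.get("name", "")
    if name not in RECOMMENDATIONS:
        return jsonify({"error": "No recommendations found for this title"}), 404
    return jsonify({"movies": [movie_details(t) for t in RECOMMENDATIONS[name]]})
```

To try it locally, add `app.run()` under an `if __name__ == "__main__":` guard and start the file with Python, then request `/movie?name=Movie%20A`.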
The application is deployed live on Render. Flask handles routing, fetching recommendations, and integrating OMDb movie details.
We use the OMDb API to retrieve detailed information about movies, such as the IMDb rating, year of release, runtime, and poster. To avoid reaching API limits, we rotate between multiple API keys.
A list of API keys is maintained in the app.py file. The application checks which API key is valid by testing a request to fetch the movie 'Inception'. If the key returns valid data, it is used for all subsequent API requests. If the key fails, the next one in the list is tested.
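The key check described above could look roughly like the sketch below. It uses `urllib` from the standard library in place of the Requests library so that the example has no dependencies, and it treats a key as valid when OMDb's JSON reply has `"Response": "True"`. The function names and the injectable `fetch` argument are assumptions made for illustration, not the project's actual code.

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "https://www.omdbapi.com/"


def fetch_omdb(api_key, title):
    """Query OMDb by title; return the decoded JSON, or {} if the request fails."""
    query = urllib.parse.urlencode({"apikey": api_key, "t": title})
    try:
        with urllib.request.urlopen(f"{OMDB_URL}?{query}", timeout=10) as response:
            return json.load(response)
    except (OSError, ValueError):
        # Network errors, HTTP errors (e.g. a rejected key), or invalid JSON.
        return {}


def pick_working_key(api_keys, fetch=fetch_omdb, probe_title="Inception"):
    """Return the first key whose probe request succeeds, or None if all fail."""
    for key in api_keys:
        if fetch(key, probe_title).get("Response") == "True":
            return key
    return None
```

Passing a fake `fetch` makes the selection logic easy to check without network access, for example `pick_working_key(["bad", "good"], fetch=lambda key, title: {"Response": "True" if key == "good" else "False"})` returns `"good"`.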
- Homepage: When you visit the homepage, you'll see a few popular movies with their details like the title, IMDb rating, year, runtime, and poster.
- Search Functionality: You can search for any movie title on the website, and the system will suggest similar movies based on content similarity.
- Movie Details: For each recommended movie, the app fetches details such as IMDb rating, year, runtime, and poster using the OMDb API.
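To illustrate the details step, the sketch below maps an OMDb title response to the four fields shown in the UI. The field names (`Title`, `Year`, `Runtime`, `imdbRating`, `Poster`, `Response`) follow OMDb's JSON format, which uses the string "N/A" for missing values. The `extract_details` helper and the hand-written, abridged sample payload are assumptions for this example.

```python
def extract_details(payload):
    """Pick the displayed fields out of an OMDb response, or None on failure."""
    if payload.get("Response") != "True":
        return None

    def clean(value):
        # OMDb reports missing values as the string "N/A".
        return None if value in (None, "N/A") else value

    return {
        "title": clean(payload.get("Title")),
        "year": clean(payload.get("Year")),
        "runtime": clean(payload.get("Runtime")),
        "imdb_rating": clean(payload.get("imdbRating")),
        "poster": clean(payload.get("Poster")),
    }


# Hand-written, abridged sample; the poster is set to "N/A" to show the fallback.
sample = {
    "Title": "Inception",
    "Year": "2010",
    "Runtime": "148 min",
    "imdbRating": "8.8",
    "Poster": "N/A",
    "Response": "True",
}

details = extract_details(sample)
```

Returning `None` for a missing poster lets a template fall back to a placeholder image instead of rendering a broken link.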
- Python 3.x
- Flask
- Pandas, NumPy, Scikit-learn
- CountVectorizer (from sklearn)
- Requests library
- Pickle library
- OMDb API keys (free keys can be generated from the OMDb website)
- Clone the repository:

      git clone https://github.com/yourusername/movie-recommendation-system.git
      cd movie-recommendation-system

- Install the required dependencies:

      pip install -r requirements.txt

- Download the pickle files (movie list and similarity matrix):
  - movie_list.pkl: Download Link
  - similarity.pkl: Download Link

  Save these files to the artifacts/ folder in your project directory.
- Run the Flask app:

      python app.py

- Visit the app in your browser at http://127.0.0.1:5000.
- Python: For data preprocessing, model building, and backend logic.
- Flask: To build and serve the web application.
- Pandas & NumPy: For data manipulation and processing.
- Scikit-learn: To perform vectorization and calculate content similarity.
- OMDb API: For fetching movie details.
- HTML, CSS: For creating the user interface.
- Google Drive & gdown: To store and retrieve model files.
- Collaborative Filtering: Implementing collaborative filtering to complement the content-based approach, providing recommendations based on user ratings.
- Improved Movie Search: Adding autocomplete functionality in the search bar for a better user experience.
- Real-time Movie Database: Updating the dataset to include the latest movie releases, leveraging an external movie database API.
- UI Enhancements: Improving the UI to make it more user-friendly and visually appealing with dynamic features.