- This repository builds a clone of a movie recommendation system.
The dataset used to build this recommendation engine is described below:
- Dataset used : MovieLens dataset
- Download Dataset : Download the dataset from either of the following links
- To download the MovieLens dataset hosted on Kaggle, use the Kaggle link
- To download the MovieLens dataset from its official website, use the GroupLens link
- Dataset File Format : CSV file (comma-separated values). NOTE: Download the dataset and save it inside the input_data folder.
- Types of dataset :
- The full dataset : This dataset consists of 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by 270,000 users. It also includes tag genome data with 12 million relevance scores across 1,100 tags.
- NOTE: We will build a simple movie recommender using the full dataset.
- The small dataset : This dataset comprises 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users.
- NOTE: All personalised recommender systems will use the small dataset (due to the limited computing power of our system).
- Data description : The small dataset contains 100,004 ratings and 1,296 tag applications across 9,125 movies. These data were created by 671 users between January 09, 1995 and October 16, 2016. The dataset was generated on October 17, 2016. Users were selected at random for inclusion, and all selected users had rated at least 20 movies. No demographic information is included: each user is represented only by an id.
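The figures in the data description can be checked against a local copy of the ratings file. The following is a minimal sketch, assuming the file is saved as `input_data/ratings_small.csv` and uses the standard MovieLens columns `userId`, `movieId`, `rating`, and `timestamp` (Unix seconds). The helper name `summarize_ratings` is hypothetical and is not part of this project.

```python
from pathlib import Path

import pandas as pd


def summarize_ratings(ratings):
    """Return headline statistics for a MovieLens-style ratings table.

    Assumes the columns userId, movieId and timestamp (Unix seconds).
    """
    ratings_per_user = ratings.groupby("userId")["movieId"].count()
    rated_at = pd.to_datetime(ratings["timestamp"], unit="s")
    return {
        "n_ratings": len(ratings),
        "n_users": ratings["userId"].nunique(),
        # Counts only movies that received at least one rating.
        "n_rated_movies": ratings["movieId"].nunique(),
        "min_ratings_per_user": int(ratings_per_user.min()),
        "first_rating": rated_at.min(),
        "last_rating": rated_at.max(),
    }


# Assumed location, following the note about the input_data folder.
ratings_path = Path("input_data") / "ratings_small.csv"
if ratings_path.is_file():
    for key, value in summarize_ratings(pd.read_csv(ratings_path)).items():
        print(key, value)
```

Because `n_rated_movies` is derived from the ratings file alone, it may differ from the movie count quoted in the description, which can include movies that have tags but no ratings.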
- Data Files Content :
- credits.csv
- keywords.csv
- links.csv
- links_small.csv
- movies_metadata.csv
- ratings.csv
- ratings_small.csv
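Before running any notebook, it can help to confirm that every file listed above is in place. This is a small sketch using only the standard library. It assumes the files sit directly inside `input_data`, relative to the working directory, and the helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "credits.csv",
    "keywords.csv",
    "links.csv",
    "links_small.csv",
    "movies_metadata.csv",
    "ratings.csv",
    "ratings_small.csv",
]


def missing_data_files(data_dir="input_data"):
    """Return the expected dataset files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


missing = missing_data_files()
if missing:
    print("Missing from input_data:", ", ".join(missing))
else:
    print("All dataset files found.")
```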
- List of other datasets available :
- MovieLens - Movie Recommendation Data Sets
- Netflix Prize Dataset
- Yahoo! - Movie, Music, and Images Ratings Data Sets
- Cornell University - Movie-review data for use in sentiment-analysis experiments
- MovieTweetings
- Python >=3.5
- pandas
- numpy
- scipy
- scikit-learn
- scikit-surprise
- matplotlib
- seaborn
- jupyter notebook
- jupyter lab
- textblob
- Install Python 3 (Python 3.6.4)
- Install Anaconda
- Install dependencies using conda
- nltk: included with Anaconda
- numpy: included with Anaconda
- scipy: included with Anaconda
- scikit-learn: included with Anaconda
- scikit-surprise: $ conda install -c conda-forge scikit-surprise
- pandas: included with Anaconda
- matplotlib: included with Anaconda
- seaborn: included with Anaconda
- jupyter notebook: included with Anaconda
- jupyter lab: included with Anaconda
- textblob: $ conda install -c conda-forge textblob
- If you have trouble installing surprise, the following may help.
- If conda does not work, try installing surprise with pip
- See these installation instructions
- See these links if you have any other issues.
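Once installation finishes, one way to confirm that every dependency is visible to the interpreter is to look up each module without importing it. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the import names are the usual ones for these packages (for example, scikit-learn imports as `sklearn` and scikit-surprise as `surprise`).

```python
import sys
from importlib.util import find_spec

# Maps the package name used in the list above to its import name.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "scikit-surprise": "surprise",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "jupyter notebook": "notebook",
    "jupyter lab": "jupyterlab",
    "textblob": "textblob",
    "nltk": "nltk",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if sys.version_info < (3, 5):
    print("Python >= 3.5 is required, found", sys.version.split()[0])

not_found = missing_packages()
if not_found:
    print("Not installed:", ", ".join(not_found))
else:
    print("All dependencies found.")
```

`find_spec` only locates a module, so this check is quick but does not prove that the package imports cleanly. If a package is reported as found yet still fails in a notebook, importing it directly will show the underlying error.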
- Step 1: Download the PyCharm Community Edition IDE using this link
- Step 2: Run the downloaded .exe installer.
- Reference code credits for this project: