diff --git a/(Study Case V) Film Recommendation System/Film Recommender.ipynb b/(Study Case V) Film Recommendation System/Film Recommender.ipynb
new file mode 100644
index 0000000..4669bde
--- /dev/null
+++ b/(Study Case V) Film Recommendation System/Film Recommender.ipynb
@@ -0,0 +1,4488 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QXLJDtxsguzf"
+ },
+ "source": [
+ "> Name : Muhammad Ammar Nabil\n",
+ "Class : M03\n",
+ "Email : mammarnabil1@gmail.com"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xt9VQpBWz-XA"
+ },
+ "source": [
+ "\n",
+ "# **Movie Recommender**\n",
+ "###### [Zahra Nazari, Hamidreza Koohi, Javad Mousavi](https://jad.shahroodut.ac.ir/article_2390.html)\n",
+ "---\n",
+ "\n",
+ "\n",
+ "\n",
+ "In this notebook, we build a recommender model that suggests similar films based on what a user likes (Content-Based Filtering) or on what the community likes (Collaborative Filtering). The model uses the MovieLens dataset from Kaggle, which includes:\n",
+ "* Film Title \n",
+ "* Genre\n",
+ "* Tag\n",
+ "* Rating\n",
+ "\n",
+ "> Number of films listed: **27,262**\n",
+ "\n",
+ "> Number of ratings listed: **about 20 million** (20,000,263 rows)\n",
+ "\n",
+ "\n",
+ "\n",
+ "## • ***Background***\n",
+ "\n",
+ "I chose this problem because a recommender can improve the experience of a streaming app by suggesting films similar to those a user likes or those the community likes. This can increase customer satisfaction, which in turn can affect the company's revenue.\n",
+ "\n",
+ "My reference is the paper _**\"Increasing Performance of Recommender Systems by Combining Deep Learning and Extreme Learning Machine\"**_ by **Zahra Nazari, Hamidreza Koohi, Javad Mousavi**. In the paper, the authors apply new deep learning-based clustering methods to overcome the data sparsity problem and to improve the efficiency of recommender systems in terms of precision, accuracy, F-measure, and recall. They use the Kaggle [MovieLens 20M Dataset](https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset). For more details about my model, see this [Details Report](https://colab.research.google.com/drive/14zROOHUuS7qmjisQAtCLewyGE_GZ5ML1?usp=sharing) (only available in Bahasa Indonesia).\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "FSKDD17gzGf0"
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import tensorflow as tf\n",
+ "import matplotlib.pyplot as plt\n",
+ "from keras import layers\n",
+ "from tensorflow import keras\n",
+ "from google.colab import files\n",
+ "from sklearn.preprocessing import Normalizer\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.metrics.pairwise import cosine_similarity\n",
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+ "from tensorflow.keras.callbacks import EarlyStopping, CSVLogger"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "h_7HniFhz08I"
+ },
+ "source": [
+ "## **Import and Understanding Dataset**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5_FM0R6A-V88"
+ },
+ "source": [
+ "### *1. Data Loading*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 74
+ },
+ "collapsed": true,
+ "id": "PiqjgZaZqeHu",
+ "outputId": "cab22b3b-6fc8-44da-d598-f059d8c9152e"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Saving kaggle.json to kaggle.json\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Upload the kaggle.json API token\n",
+ "!mkdir -p ~/.kaggle\n",
+ "files.upload()\n",
+ "!mv kaggle.json ~/.kaggle/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "0Wa0VGWxJRj3",
+ "outputId": "857d8b28-1c6c-4e0d-eb0d-3c05477f1795"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "total 16\n",
+ "drwxr-xr-x 2 root root 4096 Sep 25 16:58 .\n",
+ "drwx------ 1 root root 4096 Sep 25 16:58 ..\n",
+ "-rw------- 1 root root 63 Sep 25 16:58 kaggle.json\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Change permission\n",
+ "!chmod 600 ~/.kaggle/kaggle.json\n",
+ "!ls ~/.kaggle/ -la"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "QN_62ntdKq11",
+ "outputId": "ec3bed35-d4eb-4253-ba91-2f406536835e"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Downloading movielens-20m-dataset.zip to /content\n",
+ " 97% 189M/195M [00:08<00:00, 34.6MB/s]\n",
+ "100% 195M/195M [00:08<00:00, 24.9MB/s]\n",
+ "Archive: movielens-20m-dataset.zip\n",
+ " inflating: genome_scores.csv \n",
+ " inflating: genome_tags.csv \n",
+ " inflating: link.csv \n",
+ " inflating: movie.csv \n",
+ " inflating: rating.csv \n",
+ " inflating: tag.csv \n"
+ ]
+ }
+ ],
+ "source": [
+ "# Download and extract kaggle dataset\n",
+ "!kaggle datasets download -d grouplens/movielens-20m-dataset\n",
+ "!unzip movielens-20m-dataset.zip\n",
+ "!rm movielens-20m-dataset.zip"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "id": "PN-F2Sfey7yy"
+ },
+ "outputs": [],
+ "source": [
+ "# load the dataset\n",
+ "movies = pd.read_csv('movie.csv')\n",
+ "ratings = pd.read_csv('rating.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "JAahsD9kj0Y2",
+ "outputId": "647cc67a-ca5a-4c68-f153-148341313a63"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total movies \t\t= 27262 movie\n",
+ "Total rating count \t= 26744 rating\n"
+ ]
+ }
+ ],
+ "source": [
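+ "# Note: the second count is the number of distinct movies that have ratings, not the number of ratings\n",
+ "print(f'Total movies \\t\\t= {(len(movies.title.unique()))} movie')\n",
+ "print(f'Total rating count \\t= {(len(ratings.movieId.unique()))} rating')"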
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "V4bj8rD0-PdR"
+ },
+ "source": [
+ "### *2. Exploratory Data Analysis - Variable Description*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "80_TMAaBl6sm"
+ },
+ "source": [
+ "#### **EDA Movies**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "NZnSsxMjmGz8",
+ "outputId": "67ae125d-b2d5-4617-8dea-103b635553ac"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Shape of movies (27278, 3)\n",
+ "\n",
+ "\n",
+ "RangeIndex: 27278 entries, 0 to 27277\n",
+ "Data columns (total 3 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 movieId 27278 non-null int64 \n",
+ " 1 title 27278 non-null object\n",
+ " 2 genres 27278 non-null object\n",
+ "dtypes: int64(1), object(2)\n",
+ "memory usage: 639.5+ KB\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Check the data type of each attribute\n",
+ "print(f'Shape of movies {movies.shape}\\n')\n",
+ "movies.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "poBQrSCy1C5W",
+ "outputId": "e7f73eb1-590b-4f86-ac8c-b5c6b1677cee"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "RangeIndex: 27278 entries, 0 to 27277\n",
+ "Data columns (total 3 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 movieId 27278 non-null category\n",
+ " 1 title 27278 non-null object \n",
+ " 2 genres 27278 non-null category\n",
+ "dtypes: category(2), object(1)\n",
+ "memory usage: 1.6+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Convert movieId and genres to the category dtype\n",
+ "movies = movies.astype({'movieId': 'category', 'genres': 'category'})\n",
+ "movies.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "VvBpuPtxnSTC",
+ "outputId": "72f00982-8b5f-4b5d-9766-126526f01192"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total movies : 27262 movie\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array(['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)',\n",
+ " ..., 'The Pirates (2014)', 'Rentun Ruusu (2001)',\n",
+ " 'Innocence (2014)'], dtype=object)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ],
+ "source": [
+ "# Check title attribute\n",
+ "print(f'Total movies : {len(movies.title.unique())} movie\\n') \n",
+ "movies.title.unique()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "JDHIRePrnsHB",
+ "outputId": "51f32f0e-0c7b-418e-de92-05dc86348473"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total genres : 1342 genre\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "['Adventure|Animation|Children|Comedy|Fantasy', 'Adventure|Children|Fantasy', 'Comedy|Romance', 'Comedy|Drama|Romance', 'Comedy', ..., 'Adventure|Children|Drama|Sci-Fi', 'Children|Documentary|Drama', 'Action|Adventure|Animation|Fantasy|Horror', 'Animation|Children|Comedy|Fantasy|Sci-Fi', 'Animation|Children|Comedy|Western']\n",
+ "Length: 1342\n",
+ "Categories (1342, object): ['(no genres listed)', 'Action', 'Action|Adventure',\n",
+ " 'Action|Adventure|Animation', ..., 'Thriller|Western', 'War', 'War|Western', 'Western']"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ],
+ "source": [
+ "# Check genre attribute\n",
+ "print(f'Total genres : {len(movies.genres.unique())} genre\\n') \n",
+ "movies.genres.unique()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "znMufTthmIyf"
+ },
+ "source": [
+ "#### **EDA Ratings**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "DdK7qhCDmHXM",
+ "outputId": "7a9f4b97-285b-4414-8a11-aa0893de4d7f"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Shape of Ratings (20000263, 4)\n",
+ "\n",
+ "\n",
+ "RangeIndex: 20000263 entries, 0 to 20000262\n",
+ "Data columns (total 4 columns):\n",
+ " # Column Dtype \n",
+ "--- ------ ----- \n",
+ " 0 userId int64 \n",
+ " 1 movieId int64 \n",
+ " 2 rating float64\n",
+ " 3 timestamp object \n",
+ "dtypes: float64(1), int64(2), object(1)\n",
+ "memory usage: 610.4+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Check the data type of each attribute\n",
+ "print(f'Shape of Ratings {ratings.shape}\\n')\n",
+ "ratings.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "i5koKrww1_mY",
+ "outputId": "863d3d77-7959-4af7-c7fc-bf3acf07af91"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "RangeIndex: 20000263 entries, 0 to 20000262\n",
+ "Data columns (total 4 columns):\n",
+ " # Column Dtype \n",
+ "--- ------ ----- \n",
+ " 0 userId category \n",
+ " 1 movieId category \n",
+ " 2 rating float32 \n",
+ " 3 timestamp datetime64[ns]\n",
+ "dtypes: category(2), datetime64[ns](1), float32(1)\n",
+ "memory usage: 349.6 MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Convert dtypes to reduce memory usage\n",
+ "ratings = ratings.astype({'movieId': 'category', 'userId': 'category', \n",
+ " 'rating': 'float32', 'timestamp': 'datetime64[ns]'})\n",
+ "ratings.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "a2fvOjE4o6nb",
+ "outputId": "14be5312-13e2-43e0-c5ab-3f39a7e9b190"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "count 20000263.0\n",
+ "mean 4.0\n",
+ "std 1.0\n",
+ "min 0.0\n",
+ "25% 3.0\n",
+ "50% 4.0\n",
+ "75% 4.0\n",
+ "max 5.0\n",
+ "Name: rating, dtype: float64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ],
+ "source": [
+ "# Range of ratings (rounded to whole numbers, so the true minimum of 0.5 displays as 0.0)\n",
+ "ratings['rating'].describe().round()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "stGZ4WzqoAZn",
+ "outputId": "a70a97d6-8915-484e-c665-56ce40b8ee5c"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total Ratings count : 26744 rating\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0 3.5\n",
+ "1 3.5\n",
+ "2 3.5\n",
+ "3 3.5\n",
+ "4 3.5\n",
+ "Name: rating, dtype: float32"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ],
+ "source": [
+ "# Check rating attribute (the count printed below is the number of distinct rated movies, not the number of ratings)\n",
+ "print(f'Total Ratings count : {len(ratings.movieId.unique())} rating\\n') \n",
+ "ratings.rating.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xXMgxzzPXqsA"
+ },
+ "source": [
+ "### *3. Exploratory Data Analysis - Checking Missing Value*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "8fHcuJG9XP5e",
+ "outputId": "65a994ed-313c-4229-a16d-6140aa399071"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "movieId 0\n",
+ "title 0\n",
+ "genres 0\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ],
+ "source": [
+ "# Check Null value in movies\n",
+ "movies.isna().sum()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "wz1thDRKXJQc",
+ "outputId": "c1ea24ea-8567-4908-f8c8-81e6d5cab825"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "userId 0\n",
+ "movieId 0\n",
+ "rating 0\n",
+ "timestamp 0\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ],
+ "source": [
+ "# Check null value in ratings\n",
+ "ratings.isna().sum()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "z0RYJwW2R6W2"
+ },
+ "source": [
+ "## **Data Preprocessing**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VEtZKK2YR8gI"
+ },
+ "source": [
+ "### *1. Content Based Filtering*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 480
+ },
+ "id": "Ixvj9ynhQ4Ef",
+ "outputId": "292bca8d-fe03-430e-d4f8-a7614805f533"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:12: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only\n",
+ " if sys.path[0] == '':\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " movieId film genre\n",
+ "0 1 toy story Adventure\n",
+ "1 2 jumanji Adventure\n",
+ "2 3 grumpier old men Comedy\n",
+ "3 4 waiting to exhale Comedy\n",
+ "4 5 father of the bride part ii Comedy\n",
+ "... ... ... ...\n",
+ "27273 131254 kein bund für's leben Comedy\n",
+ "27274 131256 feuer, eis & dosenbier Comedy\n",
+ "27275 131258 the pirates Adventure\n",
+ "27276 131260 rentun ruusu NaN\n",
+ "27277 131262 innocence Adventure\n",
+ "\n",
+ "[27278 rows x 3 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 21
+ }
+ ],
+ "source": [
+ "# Keep only the first listed genre for each film\n",
+ "genre_fix = movies['genres'].map(lambda genre: genre.split('|')[0])\n",
+ "genre_fix = pd.DataFrame(genre_fix.replace('(no genres listed)', np.nan))\n",
+ "\n",
+ "# Normalize genre names; lowercase titles and strip the trailing ' (YYYY)' year suffix\n",
+ "genre_fix = genre_fix.replace({'genres': {'Sci-Fi': 'Scifi', 'Film-Noir': 'Noir'}})\n",
+ "title_fix = pd.DataFrame(movies['title'].map(lambda title: title.lower()[:-7]))\n",
+ "\n",
+ "movies_fix = movies.copy()\n",
+ "movies_fix['film'] = title_fix\n",
+ "movies_fix['genre'] = genre_fix\n",
+ "movies_fix.drop(columns=['title', 'genres'], inplace=True)\n",
+ "movies_fix['genre'] = movies_fix['genre'].astype('category')\n",
+ "movies_fix"
+ ]
+ },
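+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of the cleaning step above on a toy frame; it is not part of the original pipeline. The frame `toy` and the regular expression are illustrative assumptions. Unlike a fixed `[:-7]` slice, the regex leaves titles that have no trailing year untouched."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical toy data mirroring the movie.csv columns\n",
+ "toy = pd.DataFrame({'title': ['Toy Story (1995)', 'Untitled Film'],\n",
+ "                    'genres': ['Adventure|Animation', '(no genres listed)']})\n",
+ "\n",
+ "# Keep the first listed genre; treat the placeholder as missing\n",
+ "toy['genre'] = toy['genres'].str.split('|').str[0].replace('(no genres listed)', np.nan)\n",
+ "\n",
+ "# Lowercase and strip a trailing ' (YYYY)' only when it is present\n",
+ "toy['film'] = toy['title'].str.lower().str.replace(r'\\s*\\(\\d{4}\\)$', '', regex=True)\n",
+ "toy[['film', 'genre']]"
+ ]
+ },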
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 638
+ },
+ "id": "9Tsud3emVO6O",
+ "outputId": "b83a7d11-3f45-4700-d11e-c2e92ffe449d"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Before droping null value\n",
+ "movieId 0\n",
+ "film 0\n",
+ "genre 246\n",
+ "dtype: int64 \n",
+ "\n",
+ "After droping null value\n",
+ "movieId 0\n",
+ "film 0\n",
+ "genre 0\n",
+ "dtype: int64 \n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " movieId film genre\n",
+ "0 1 toy story Adventure\n",
+ "1 2 jumanji Adventure\n",
+ "2 3 grumpier old men Comedy\n",
+ "3 4 waiting to exhale Comedy\n",
+ "4 5 father of the bride part ii Comedy\n",
+ "... ... ... ...\n",
+ "27272 131252 forklift driver klaus: the first day on the job Comedy\n",
+ "27273 131254 kein bund für's leben Comedy\n",
+ "27274 131256 feuer, eis & dosenbier Comedy\n",
+ "27275 131258 the pirates Adventure\n",
+ "27277 131262 innocence Adventure\n",
+ "\n",
+ "[27032 rows x 3 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ }
+ ],
+ "source": [
+ "# Drop rows with null values\n",
+ "print('Before droping null value')\n",
+ "print(movies_fix.isna().sum(), '\\n')\n",
+ "\n",
+ "print('After droping null value')\n",
+ "movies_fix.dropna(inplace=True)\n",
+ "print(movies_fix.isna().sum(), '\\n')\n",
+ "movies_fix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "id": "NbTmyWDuX9X9",
+ "outputId": "40a28f4b-60f9-4ea1-8ba6-c8ec8d759cfc"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " movieId film genre\n",
+ "0 1 toy story Adventure\n",
+ "1 2 jumanji Adventure\n",
+ "2 3 grumpier old men Comedy\n",
+ "3 4 waiting to exhale Comedy\n",
+ "4 5 father of the bride part ii Comedy\n",
+ "... ... ... ...\n",
+ "27271 131250 no more school Comedy\n",
+ "27272 131252 forklift driver klaus: the first day on the job Comedy\n",
+ "27273 131254 kein bund für's leben Comedy\n",
+ "27274 131256 feuer, eis & dosenbier Comedy\n",
+ "27275 131258 the pirates Adventure\n",
+ "\n",
+ "[25966 rows x 3 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ],
+ "source": [
+ "# Drop duplicate film titles (keeps the first occurrence)\n",
+ "movies_fix.drop_duplicates('film', inplace=True)\n",
+ "movies_fix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "id": "63h1PdlteBBh",
+ "outputId": "fc8d11bd-61d0-48b5-c025-8cf9b78c5660"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " id film genre\n",
+ "0 1 toy story Adventure\n",
+ "1 2 jumanji Adventure\n",
+ "2 3 grumpier old men Comedy\n",
+ "3 4 waiting to exhale Comedy\n",
+ "4 5 father of the bride part ii Comedy\n",
+ "... ... ... ...\n",
+ "27271 131250 no more school Comedy\n",
+ "27272 131252 forklift driver klaus: the first day on the job Comedy\n",
+ "27273 131254 kein bund für's leben Comedy\n",
+ "27274 131256 feuer, eis & dosenbier Comedy\n",
+ "27275 131258 the pirates Adventure\n",
+ "\n",
+ "[25966 rows x 3 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ],
+ "source": [
+ "# Create data variable\n",
+ "data = movies_fix.copy().rename(columns={'movieId':'id'})\n",
+ "data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "inLs88GJSKXV"
+ },
+ "source": [
+ "### *2. Collaborative Filtering*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a-qeCNoQqx8l"
+ },
+ "source": [
+ "#### **Generate user and movie index mappings**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 661
+ },
+ "collapsed": true,
+ "id": "ulQFfTL-S21d",
+ "outputId": "c3a027d3-1d6b-492e-8026-98035422e619"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " userId movieId rating timestamp\n",
+ "0 1 2 3.5 2005-04-02 23:53:47\n",
+ "1 1 29 3.5 2005-04-02 23:31:16\n",
+ "2 1 32 3.5 2005-04-02 23:33:39\n",
+ "3 1 47 3.5 2005-04-02 23:32:07\n",
+ "4 1 50 3.5 2005-04-02 23:29:40\n",
+ "... ... ... ... ...\n",
+ "20000258 138493 68954 4.5 2009-11-13 15:42:00\n",
+ "20000259 138493 69526 4.5 2009-12-03 18:31:48\n",
+ "20000260 138493 69644 3.0 2009-12-07 18:10:57\n",
+ "20000261 138493 70286 5.0 2009-11-13 15:42:24\n",
+ "20000262 138493 71619 2.5 2009-10-17 20:25:36\n",
+ "\n",
+ "[20000263 rows x 4 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 41
+ }
+ ],
+ "source": [
+ "# Create rate dataset\n",
+ "rate_raw = ratings.copy()\n",
+ "rate_raw"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "lDD2DmZJhgWv",
+ "outputId": "7529143a-c812-45f2-a03b-eb817ba71449"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total userID: 138493\n",
+ "Total movieID: 26744 \n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Get unique user and movie IDs\n",
+ "user_id = rate_raw['userId'].unique().tolist()\n",
+ "movie_id = rate_raw['movieId'].unique().tolist()\n",
+ "print('Total userID: ', len(user_id))\n",
+ "print('Total movieID: ', len(movie_id), '\\n')\n",
+ " \n",
+ "# Map each user ID to an index\n",
+ "user_to_index = {x: i for i, x in enumerate(user_id)}\n",
+ "# print('Encoded userID : ', user_to_index)\n",
+ " \n",
+ "# Map each index back to its user ID\n",
+ "index_to_user = {i: x for i, x in enumerate(user_id)}\n",
+ "# print('Encoded index:userID: ', index_to_user)\n",
+ "\n",
+ "# Map each movie ID to an index\n",
+ "movie_to_index = {x: i for i, x in enumerate(movie_id)}\n",
+ "# print('Encoded movieID: ', movie_to_index)\n",
+ "\n",
+ "# Map each index back to its movie ID\n",
+ "index_to_movie = {i: x for i, x in enumerate(movie_id)}\n",
+ "# print('Encoded index:movieID: ', index_to_movie)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kh-qi7hblkqB"
+ },
+ "source": [
+ "Sorry, I do not display the output of the cell above because loading that much data (>200,000 entries) crashes the application."
+ ]
+ },
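+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a hedged aside, the same first-appearance encoding can be sketched with `pd.factorize` on a toy frame. The frame `toy_ratings` and its IDs are made up for illustration; this is an alternative to the dictionaries above, not the method this notebook uses."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical toy ratings; the IDs are made up for illustration\n",
+ "toy_ratings = pd.DataFrame({'userId': [10, 10, 42], 'movieId': [2, 29, 2]})\n",
+ "\n",
+ "# factorize assigns 0-based codes in order of first appearance\n",
+ "toy_ratings['user'], user_ids = pd.factorize(toy_ratings['userId'])\n",
+ "toy_ratings['movie'], movie_ids = pd.factorize(toy_ratings['movieId'])\n",
+ "\n",
+ "# user_ids[code] recovers the original ID, playing the role of index_to_user\n",
+ "toy_ratings"
+ ]
+ },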
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "CCy0UYH22U-O",
+ "outputId": "c2ba9b65-02c1-46ff-fc6f-ad434fa786a6"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total of users = 138493\n",
+ "Total of movies = 26744\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Count the number of users and movies\n",
+ "num_user = len(user_to_index)\n",
+ "num_movie = len(index_to_movie)\n",
+ "print('Total of users = ', num_user)\n",
+ "print('Total of movies = ', num_movie)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "eVrgPqW1oaoB"
+ },
+ "outputs": [],
+ "source": [
+ "# Add user and movie columns holding the encoded indices\n",
+ "rate_raw['user'] = rate_raw['userId'].map(user_to_index)\n",
+ "rate_raw['movie'] = rate_raw['movieId'].map(movie_to_index)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 357
+ },
+ "id": "Mt0bn317qi1H",
+ "outputId": "3114af17-a32e-44b4-dad4-0a9893c71491"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " userId movieId rating timestamp user movie\n",
+ "0 1 2 3.5 2005-04-02 23:53:47 0 0\n",
+ "1 1 29 3.5 2005-04-02 23:31:16 0 1\n",
+ "2 1 32 3.5 2005-04-02 23:33:39 0 2\n",
+ "3 1 47 3.5 2005-04-02 23:32:07 0 3\n",
+ "4 1 50 3.5 2005-04-02 23:29:40 0 4"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 45
+ }
+ ],
+ "source": [
+ "rate_raw.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IjXngGc7vJhn"
+ },
+ "source": [
+ "#### **Normalize**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "YDXzpupTyI1t"
+ },
+ "outputs": [],
+ "source": [
+ "# Get the min and max rating, used to normalize the target\n",
+ "min_rate = rate_raw['rating'].min()\n",
+ "max_rate = rate_raw['rating'].max()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 318
+ },
+ "collapsed": true,
+ "id": "9-taeuarvJhw",
+ "outputId": "2ee29525-ab7a-43ea-f9ec-27ff95417528"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Before normalize : \n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " rating\n",
+ "count 20000263.00\n",
+ "mean 3.53\n",
+ "std 1.05\n",
+ "min 0.50\n",
+ "25% 3.00\n",
+ "50% 3.50\n",
+ "75% 4.00\n",
+ "max 5.00"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 77
+ }
+ ],
+ "source": [
+ "# Split features and label, then normalize the label to [0, 1]\n",
+ "print('Before normalize : ')\n",
+ "\n",
+ "x = rate_raw[['user', 'movie']].to_numpy()\n",
+ "# Vectorized min-max scaling (same result as a row-wise apply, but much faster on 20M rows)\n",
+ "y = ((rate_raw['rating'] - min_rate) / (max_rate - min_rate)).to_numpy()\n",
+ "\n",
+ "rate_raw.describe().round(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 300
+ },
+ "collapsed": true,
+ "id": "KxjLG9KZvJh0",
+ "outputId": "70361749-2a13-4436-bf20-1174ba134520"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " 0\n",
+ "count 20000263.0\n",
+ "mean 0.7\n",
+ "std 0.2\n",
+ "min 0.0\n",
+ "25% 0.6\n",
+ "50% 0.7\n",
+ "75% 0.8\n",
+ "max 1.0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 78
+ }
+ ],
+ "source": [
+ "# After Normalize\n",
+ "pd.DataFrame(y).describe().round(1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Downcast x to int32 to reduce memory usage\n",
+ "x = x.astype('int32')\n",
+ "x.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mRcddjD3c5AL",
+ "outputId": "d83b0e25-ea92-4754-87c8-a10b2f3bba1b"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dtype('int32')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 80
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Downcast y to float16 to reduce memory usage\n",
+ "y = y.astype('float16')\n",
+ "y.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "rIBN5H5Rd29D",
+ "outputId": "a830303d-4805-419b-90ce-4356bd1d2815"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "dtype('float16')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 81
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hwE9MmNyXDuo"
+ },
+ "source": [
+ "#### **Split Dataset**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BYQWJqlmXJyE"
+ },
+ "outputs": [],
+ "source": [
+ "# Split train and test set\n",
+ "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.001, \n",
+ " random_state = 123)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "qjZmCZf7XTDc",
+ "outputId": "b3fd6b8c-e6f4-4957-9d36-2e33d53e33d5"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Total # of sample in whole dataset: 20000263\n",
+ "Total # of sample in train dataset: 19980262\n",
+ "Total # of sample in test dataset: 20001\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Check total of test and train set\n",
+ "print(f'Total # of sample in whole dataset: {len(x)}')\n",
+ "print(f'Total # of sample in train dataset: {len(x_train)}')\n",
+ "print(f'Total # of sample in test dataset: {len(x_test)}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9zpiVQdCZpou"
+ },
+ "source": [
+ "## **Model Development**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0SLgSmWLfwzp"
+ },
+ "source": [
+ "### *1. Content Based Filtering*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "U004XzEwf4iw",
+ "outputId": "86633ba6-5768-4d2a-9407-e66ed99e0abd"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "['Adventure', 'Comedy', 'Action', 'Drama', 'Crime', ..., 'Romance', 'War', 'Scifi', 'Musical', 'IMAX']\n",
+ "Length: 19\n",
+ "Categories (19, object): ['Action', 'Adventure', 'Animation', 'Children', ..., 'Scifi', 'Thriller',\n",
+ " 'War', 'Western']"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 26
+ }
+ ],
+ "source": [
+ "data.genre.unique()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "cn5nPatEZu2Q",
+ "outputId": "ef0b6d36-43b2-4ade-d8c0-33c2e0bfa2ba"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Shape of TF-IDF Matrix = (25966, 19) \n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array(['action', 'adventure', 'animation', 'children', 'comedy', 'crime',\n",
+ " 'documentary', 'drama', 'fantasy', 'horror', 'imax', 'musical',\n",
+ " 'mystery', 'noir', 'romance', 'scifi', 'thriller', 'war',\n",
+ " 'western'], dtype=object)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 27
+ }
+ ],
+ "source": [
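+ "# Vectorize the genre column with TF-IDF\n",
+ "# (named `tfidf` rather than `tf` so it does not shadow the TensorFlow alias `tf` used later)\n",
+ "tfidf = TfidfVectorizer()\n",
+ "tfidf_matrix = tfidf.fit_transform(data['genre'])\n",
+ "\n",
+ "print('Shape of TF-IDF Matrix =', tfidf_matrix.shape, '\\n')\n",
+ "tfidf.get_feature_names_out()"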
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "QWmqupDtjiBZ",
+ "outputId": "9038b96d-a7b4-45bb-efb0-777242707e6a"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "matrix([[0., 1., 0., ..., 0., 0., 0.],\n",
+ " [0., 1., 0., ..., 0., 0., 0.],\n",
+ " [0., 0., 0., ..., 0., 0., 0.],\n",
+ " ...,\n",
+ " [0., 0., 0., ..., 0., 0., 0.],\n",
+ " [0., 0., 0., ..., 0., 0., 0.],\n",
+ " [0., 1., 0., ..., 0., 0., 0.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 28
+ }
+ ],
+ "source": [
+ "# Dense TF-IDF Matrix\n",
+ "tfidf_matrix.todense()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "gSquNe3gjpZ5",
+ "outputId": "237ec34f-63ce-472f-cbc8-27c0997729c2"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([[1., 1., 0., ..., 0., 0., 1.],\n",
+ " [1., 1., 0., ..., 0., 0., 1.],\n",
+ " [0., 0., 1., ..., 1., 1., 0.],\n",
+ " ...,\n",
+ " [0., 0., 1., ..., 1., 1., 0.],\n",
+ " [0., 0., 1., ..., 1., 1., 0.],\n",
+ " [1., 1., 0., ..., 0., 0., 1.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 29
+ }
+ ],
+ "source": [
+ "# Calculate cosine similarity of tfidf_matrix\n",
+ "cosine_matrix= cosine_similarity(tfidf_matrix) \n",
+ "cosine_matrix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 540
+ },
+ "collapsed": true,
+ "id": "L0-oM3Hfj2O9",
+ "outputId": "b4b28684-e3ef-4b35-aec3-85d441b43056"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Shape: (25966, 25966)\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "film toy story jumanji grumpier old men \\\n",
+ "film \n",
+ "toy story 1.0 1.0 0.0 \n",
+ "jumanji 1.0 1.0 0.0 \n",
+ "grumpier old men 0.0 0.0 1.0 \n",
+ "waiting to exhale 0.0 0.0 1.0 \n",
+ "father of the bride part ii 0.0 0.0 1.0 \n",
+ "\n",
+ "film waiting to exhale father of the bride part ii \\\n",
+ "film \n",
+ "toy story 0.0 0.0 \n",
+ "jumanji 0.0 0.0 \n",
+ "grumpier old men 1.0 1.0 \n",
+ "waiting to exhale 1.0 1.0 \n",
+ "father of the bride part ii 1.0 1.0 \n",
+ "\n",
+ "film heat sabrina tom and huck sudden death \\\n",
+ "film \n",
+ "toy story 0.0 0.0 1.0 0.0 \n",
+ "jumanji 0.0 0.0 1.0 0.0 \n",
+ "grumpier old men 0.0 1.0 0.0 0.0 \n",
+ "waiting to exhale 0.0 1.0 0.0 0.0 \n",
+ "father of the bride part ii 0.0 1.0 0.0 0.0 \n",
+ "\n",
+ "film goldeneye ... what men talk about \\\n",
+ "film ... \n",
+ "toy story 0.0 ... 0.0 \n",
+ "jumanji 0.0 ... 0.0 \n",
+ "grumpier old men 0.0 ... 1.0 \n",
+ "waiting to exhale 0.0 ... 1.0 \n",
+ "father of the bride part ii 0.0 ... 1.0 \n",
+ "\n",
+ "film three quarter moon ants in the pants \\\n",
+ "film \n",
+ "toy story 0.0 0.0 \n",
+ "jumanji 0.0 0.0 \n",
+ "grumpier old men 1.0 1.0 \n",
+ "waiting to exhale 1.0 1.0 \n",
+ "father of the bride part ii 1.0 1.0 \n",
+ "\n",
+ "film werner - gekotzt wird später brother bear 2 \\\n",
+ "film \n",
+ "toy story 0.0 1.0 \n",
+ "jumanji 0.0 1.0 \n",
+ "grumpier old men 0.0 0.0 \n",
+ "waiting to exhale 0.0 0.0 \n",
+ "father of the bride part ii 0.0 0.0 \n",
+ "\n",
+ "film no more school \\\n",
+ "film \n",
+ "toy story 0.0 \n",
+ "jumanji 0.0 \n",
+ "grumpier old men 1.0 \n",
+ "waiting to exhale 1.0 \n",
+ "father of the bride part ii 1.0 \n",
+ "\n",
+ "film forklift driver klaus: the first day on the job \\\n",
+ "film \n",
+ "toy story 0.0 \n",
+ "jumanji 0.0 \n",
+ "grumpier old men 1.0 \n",
+ "waiting to exhale 1.0 \n",
+ "father of the bride part ii 1.0 \n",
+ "\n",
+ "film kein bund für's leben feuer, eis & dosenbier \\\n",
+ "film \n",
+ "toy story 0.0 0.0 \n",
+ "jumanji 0.0 0.0 \n",
+ "grumpier old men 1.0 1.0 \n",
+ "waiting to exhale 1.0 1.0 \n",
+ "father of the bride part ii 1.0 1.0 \n",
+ "\n",
+ "film the pirates \n",
+ "film \n",
+ "toy story 1.0 \n",
+ "jumanji 1.0 \n",
+ "grumpier old men 0.0 \n",
+ "waiting to exhale 0.0 \n",
+ "father of the bride part ii 0.0 \n",
+ "\n",
+ "[5 rows x 25966 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 30
+ }
+ ],
+ "source": [
+ "# Create cosine similarity dataframe indexed by film\n",
+ "cosine_df = pd.DataFrame(cosine_matrix, index=data['film'], columns=data['film'])\n",
+ "print('Shape:', cosine_df.shape)\n",
+ " \n",
+ "cosine_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QuF4iwhf0RzL"
+ },
+ "source": [
+ "### *2. Collaborative Filtering*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "SwKfyUIe0UGd"
+ },
+ "outputs": [],
+ "source": [
+ "# Modify RecommenderNet Class\n",
+ "class RecommenderNet(tf.keras.Model):\n",
+ "\n",
+ "    # Define Constructor\n",
+ "    def __init__(self, num_users, num_movie, embedding_size, **kwargs):\n",
+ "        super(RecommenderNet, self).__init__(**kwargs)\n",
+ "        self.num_users = num_users\n",
+ "        self.num_movie = num_movie\n",
+ "        self.embedding_size = embedding_size\n",
+ "        # User embedding and per-user bias\n",
+ "        self.user_embedding = layers.Embedding(\n",
+ "            num_users,\n",
+ "            embedding_size,\n",
+ "            embeddings_initializer = 'he_normal',\n",
+ "            embeddings_regularizer = keras.regularizers.l2(1e-6)\n",
+ "        )\n",
+ "        self.user_bias = layers.Embedding(num_users, 1)\n",
+ "        # Movie embedding and per-movie bias\n",
+ "        self.movie_embedding = layers.Embedding(\n",
+ "            num_movie,\n",
+ "            embedding_size,\n",
+ "            embeddings_initializer = 'he_normal',\n",
+ "            embeddings_regularizer = keras.regularizers.l2(1e-6)\n",
+ "        )\n",
+ "        self.movie_bias = layers.Embedding(num_movie, 1)\n",
+ "\n",
+ "    def call(self, inputs):\n",
+ "        user_vector = self.user_embedding(inputs[:, 0])\n",
+ "        user_bias = self.user_bias(inputs[:, 0])\n",
+ "        movie_vector = self.movie_embedding(inputs[:, 1])\n",
+ "        movie_bias = self.movie_bias(inputs[:, 1])\n",
+ "\n",
+ "        # Per-sample dot product, shape (batch, 1).\n",
+ "        # tf.tensordot(user_vector, movie_vector, 2) would also contract over the\n",
+ "        # batch axis and return one scalar shared by the whole batch.\n",
+ "        dot_user_movie = tf.reduce_sum(user_vector * movie_vector, axis=1, keepdims=True)\n",
+ "\n",
+ "        x = dot_user_movie + user_bias + movie_bias\n",
+ "\n",
+ "        return tf.nn.sigmoid(x)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "EcLEC-Xm0dOA"
+ },
+ "outputs": [],
+ "source": [
+ "# Create Model (embedding size 50)\n",
+ "model = RecommenderNet(num_user, num_movie, 50)\n",
+ "model.compile(optimizer='Adam', loss='binary_crossentropy',\n",
+ "              metrics=[tf.keras.metrics.MeanSquaredError()])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "NYPeZ1L1hMHZ"
+ },
+ "outputs": [],
+ "source": [
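+ "# Create Callback: stop once validation MSE has not improved by at least\n",
+ "# min_delta for `patience` consecutive epochs\n",
+ "callback = [\n",
+ "    EarlyStopping(monitor='val_mean_squared_error', min_delta=0.1,\n",
+ "                  patience=8, verbose=1),\n",
+ "]"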
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Train Model\n",
+ "history = model.fit(\n",
+ "    x_train,\n",
+ "    y_train,\n",
+ "    batch_size=200000,\n",
+ "    epochs=50,\n",
+ "    callbacks=callback,\n",
+ "    validation_data=(x_test, y_test)\n",
+ ")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "B4_HCA1NeX5e",
+ "outputId": "7c6dd734-a0d3-4aae-ffa2-e9a543c366e0"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Epoch 1/50\n",
+ "100/100 [==============================] - 29s 265ms/step - loss: 0.6538 - mean_squared_error: 0.0639 - val_loss: 0.6472 - val_mean_squared_error: 0.0611\n",
+ "Epoch 2/50\n",
+ "100/100 [==============================] - 27s 267ms/step - loss: 1.4197 - mean_squared_error: 0.1478 - val_loss: 0.8099 - val_mean_squared_error: 0.1398\n",
+ "Epoch 3/50\n",
+ "100/100 [==============================] - 33s 331ms/step - loss: 6.1476 - mean_squared_error: 0.2614 - val_loss: 1.3699 - val_mean_squared_error: 0.1526\n",
+ "Epoch 4/50\n",
+ "100/100 [==============================] - 26s 262ms/step - loss: 6.5779 - mean_squared_error: 0.1901 - val_loss: 2.0804 - val_mean_squared_error: 0.4447\n",
+ "Epoch 5/50\n",
+ "100/100 [==============================] - 26s 258ms/step - loss: 4.5104 - mean_squared_error: 0.2610 - val_loss: 0.8934 - val_mean_squared_error: 0.1162\n",
+ "Epoch 6/50\n",
+ "100/100 [==============================] - 27s 266ms/step - loss: 4.1191 - mean_squared_error: 0.1977 - val_loss: 2.6865 - val_mean_squared_error: 0.4805\n",
+ "Epoch 7/50\n",
+ "100/100 [==============================] - 33s 324ms/step - loss: 3.3527 - mean_squared_error: 0.2311 - val_loss: 0.6211 - val_mean_squared_error: 0.0457\n",
+ "Epoch 8/50\n",
+ "100/100 [==============================] - 26s 259ms/step - loss: 4.7516 - mean_squared_error: 0.2012 - val_loss: 4.0370 - val_mean_squared_error: 0.5024\n",
+ "Epoch 9/50\n",
+ "100/100 [==============================] - 29s 284ms/step - loss: 6.0048 - mean_squared_error: 0.2688 - val_loss: 0.6586 - val_mean_squared_error: 0.0594\n",
+ "Epoch 9: early stopping\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7rcvfAibTuI6"
+ },
+ "source": [
+ "## **Model Evaluation**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### *1. Content Based Filtering*"
+ ],
+ "metadata": {
+ "id": "HF4rnIr8oi2n"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Before running the code below, first run the Content Based Filtering cells in the Testing section at the end of the notebook (they define `film_recommendations`)."
+ ],
+ "metadata": {
+ "id": "YX2hMvT8pBvx"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Check genre of toy story\n",
+ "genre_target = data[data.film.eq('toy story')]['genre'].iloc[0]\n",
+ "genre_target"
+ ],
+ "metadata": {
+ "id": "jL5h-fFReF5w",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ },
+ "outputId": "063ef040-a773-4522-b051-111171e1eae8"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Adventure'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 33
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Get recommendations\n",
+ "num_recommendation = 15\n",
+ "\n",
+ "result = film_recommendations('toy story', num_recommendation)\n",
+ "num_correct = result[result.genre == genre_target].genre.count()\n",
+ "pred_score = (num_correct / num_recommendation)*100\n",
+ "num_all_genre_target = data[data.genre == genre_target].genre.count()\n",
+ "recall_score = (num_correct / num_all_genre_target)*100\n",
+ "\n",
+ "\n",
+ "print(f'Precision of model is : {int(pred_score)} %')\n",
+ "print(f'Recall of model is : {recall_score:.2f} %')"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "dni1uUiPmTcd",
+ "outputId": "2d7d780c-4eaa-4569-c908-3c590a31a9ac"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Precision of model is : 100 %\n",
+ "Recall of model is : 1.17 %\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "film_recommendations('toy story', num_recommendation)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 520
+ },
+ "id": "zZRyCax7tmUk",
+ "outputId": "0c8b2067-5c3e-408c-93ae-51114b85b64b"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " film genre\n",
+ "0 The Pirates Adventure\n",
+ "1 Toy Story 3 Adventure\n",
+ "2 Descent: Part 2, The Adventure\n",
+ "3 The Black Rose Adventure\n",
+ "4 Young Winston Adventure\n",
+ "5 St Trinian'S 2: The Legend Of Fritton'S Gold Adventure\n",
+ "6 Sky Crawlers, The (Sukai Kurora) Adventure\n",
+ "7 Shrek Forever After (A.K.A. Shrek: The Final C... Adventure\n",
+ "8 Percy Jackson & The Olympians: The Lightning T... Adventure\n",
+ "9 B.N.B. (Bunty Aur Babli) Adventure\n",
+ "10 Agora Adventure\n",
+ "11 When Dinosaurs Ruled The Earth Adventure\n",
+ "12 North Face (Nordwand) Adventure\n",
+ "13 Men Who Tread On The Tiger'S Tail, The (Tora N... Adventure\n",
+ "14 How To Train Your Dragon Adventure"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 68
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### *2. Collaborative Filtering*"
+ ],
+ "metadata": {
+ "id": "tGr1gqR5omsa"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "plt.style.use('dark_background')\n",
+ "\n",
+ "plt.plot(history.history['mean_squared_error'], '#1f77b4')\n",
+ "plt.plot(history.history['val_mean_squared_error'], '#ff7f0e')\n",
+ "plt.title('Mean Squared Error')\n",
+ "plt.ylabel('MSE')\n",
+ "plt.xlabel('epoch')\n",
+ "plt.legend(['train', 'test'], loc='upper left')\n",
+ "plt.show()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 295
+ },
+ "outputId": "d3732c35-9da7-4d07-b595-232cb6aa0912",
+ "id": "OEFtKSGbomsd"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "[Figure: 'Mean Squared Error' line plot of MSE per epoch, with 'train' and 'test' curves]"
+ ]
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vxHfPGCs8RTV"
+ },
+ "source": [
+ "## **Testing**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "l0lc1ZeODqCa"
+ },
+ "source": [
+ "### *1. Content Based Filtering*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "pjPmobAQ8Ulv"
+ },
+ "outputs": [],
+ "source": [
+ "# Create recommendations function\n",
+ "def film_recommendations(film: str,\n",
+ "                         n: int=5,\n",
+ "                         similarity_data: pd.DataFrame=cosine_df,\n",
+ "                         items: pd.DataFrame=data[['film', 'genre']]):\n",
+ "    \"\"\"\n",
+ "    Recommends the top-N films most similar to a given film, based on genre.\n",
+ "\n",
+ "    Parameter:\n",
+ "    ---\n",
+ "    film : string (str)\n",
+ "        The film used as the reference for recommendations.\n",
+ "    n : integer (int)\n",
+ "        Number of recommendations to return. Default 5\n",
+ "    similarity_data : pd.DataFrame (object)\n",
+ "        Symmetric similarity dataframe with film titles as both\n",
+ "        index and columns\n",
+ "    items : pd.DataFrame (object)\n",
+ "        Dataframe of film info (film and genre)\n",
+ "\n",
+ "    Returns\n",
+ "    ---\n",
+ "    A dataframe with the top-N recommendations.\n",
+ "    \"\"\"\n",
+ "\n",
+ "    # Partial sort: the highest similarities end up at the end of the index array\n",
+ "    index = similarity_data.loc[:,film.lower()].to_numpy().argpartition(\n",
+ "        range(-1, -n, -1))\n",
+ "\n",
+ "    # Take the closest titles, most similar first\n",
+ "    closest = similarity_data.columns[index[-1:-(n+2):-1]]\n",
+ "\n",
+ "    # Drop the reference film itself from the candidates\n",
+ "    closest = closest.drop(film.lower(), errors='ignore')\n",
+ "\n",
+ "    result = pd.DataFrame(closest).merge(items).head(n)\n",
+ "    # Title-case the film names for display\n",
+ "    result['film'] = result['film'].map(lambda title: title.title())\n",
+ "\n",
+ "    return result"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 81
+ },
+ "id": "57632FgrBTOh",
+ "outputId": "babff8b0-add9-46b5-824f-d8f9b1dd6423"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " id film genre\n",
+ "0 1 toy story Adventure"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 36
+ }
+ ],
+ "source": [
+ "# Check genre of toy story\n",
+ "data[data.film.eq('toy story')]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 520
+ },
+ "id": "B_hq1_O7BepY",
+ "outputId": "d7c80c38-6ea1-4737-aaec-bbeba37bd22e"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " film genre\n",
+ "0 The Pirates Adventure\n",
+ "1 Toy Story 3 Adventure\n",
+ "2 Descent: Part 2, The Adventure\n",
+ "3 The Black Rose Adventure\n",
+ "4 Young Winston Adventure\n",
+ "5 St Trinian'S 2: The Legend Of Fritton'S Gold Adventure\n",
+ "6 Sky Crawlers, The (Sukai Kurora) Adventure\n",
+ "7 Shrek Forever After (A.K.A. Shrek: The Final C... Adventure\n",
+ "8 Percy Jackson & The Olympians: The Lightning T... Adventure\n",
+ "9 B.N.B. (Bunty Aur Babli) Adventure\n",
+ "10 Agora Adventure\n",
+ "11 When Dinosaurs Ruled The Earth Adventure\n",
+ "12 North Face (Nordwand) Adventure\n",
+ "13 Men Who Tread On The Tiger'S Tail, The (Tora N... Adventure\n",
+ "14 How To Train Your Dragon Adventure"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 37
+ }
+ ],
+ "source": [
+ "# Get recommendations\n",
+ "film_recommendations('toy story', 15)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7pbOlhvDDv-H"
+ },
+ "source": [
+ "### *2. Collaborative Filtering*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "dataset = data.copy()"
+ ],
+ "metadata": {
+ "id": "fLaPor3LZe9l"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "TmdckyZFLpwi"
+ },
+ "outputs": [],
+ "source": [
+ "def highRatedMovie(user: int=user_id,\n",
+ "                   n: int=5,\n",
+ "                   data_raw: pd.DataFrame=dataset,\n",
+ "                   data_rate: pd.DataFrame=ratings):\n",
+ "    '''\n",
+ "    Show the N highest-rated movies of a user\n",
+ "\n",
+ "    Parameter\n",
+ "    ---\n",
+ "    user: int\n",
+ "        Id of the user. Must be a member of the dataset.\n",
+ "    n: int\n",
+ "        Number of top-rated movies to show. Default 5\n",
+ "    data_raw: pd.DataFrame\n",
+ "        Dataset of movies. Must contain id, film, and genre. Default\n",
+ "        dataset\n",
+ "    data_rate: pd.DataFrame\n",
+ "        Dataset of ratings. Must contain rating, movieId, and userId.\n",
+ "        Default ratings\n",
+ "\n",
+ "    Return\n",
+ "    ---\n",
+ "    None. Prints the N highest-rated movies of the user\n",
+ "    '''\n",
+ "    # Print header\n",
+ "    print('===' * 13)\n",
+ "    print(f'Movie with high ratings from user {user}')\n",
+ "    print('===' * 13)\n",
+ "\n",
+ "    # Filter on data_rate itself rather than the global ratings frame\n",
+ "    watched_movie_by_user = data_rate[data_rate.userId == user]\n",
+ "    top_movie_user = (watched_movie_by_user.sort_values(\n",
+ "                          by = 'rating',\n",
+ "                          ascending=False\n",
+ "                      )\n",
+ "                      .head(n)\n",
+ "                      .movieId.values\n",
+ "    )\n",
+ "\n",
+ "    dataset_rows = data_raw[data_raw['id'].isin(top_movie_user)]\n",
+ "    for row in dataset_rows.itertuples():\n",
+ "        print(' •', (row.film).title(), ':', row.genre)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "4n2-0tlMODeI",
+ "outputId": "4baaebcd-8180-4863-9923-6d4e7f95929b"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "=======================================\n",
+ "Movie with high ratings from user 546\n",
+ "=======================================\n",
+ " • Dr. Strangelove Or: How I Learned To Stop Worrying And Love The Bomb : Comedy\n",
+ " • Godfather, The : Crime\n",
+ " • William Shakespeare'S Romeo + Juliet : Drama\n",
+ " • Reservoir Dogs : Crime\n",
+ " • Ice Storm, The : Drama\n",
+ " • American Beauty : Comedy\n",
+ " • Ghost Dog: The Way Of The Samurai : Crime\n",
+ " • City Of God (Cidade De Deus) : Action\n",
+ " • Lost In Translation : Comedy\n",
+ " • Garden State : Comedy\n"
+ ]
+ }
+ ],
+ "source": [
+ "highRatedMovie(546, 10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "g5WKFOkcGMJ2"
+ },
+ "outputs": [],
+ "source": [
+ "def movieRecommendation(user: int=user_id,\n",
+ "                        n: int=5,\n",
+ "                        data_raw: pd.DataFrame=dataset,\n",
+ "                        data_rate: pd.DataFrame=ratings,\n",
+ "                        movieIdx: dict=movie_to_index,\n",
+ "                        userIdx: dict=user_to_index):\n",
+ "    '''\n",
+ "    Print the top n movie recommendations for a user.\n",
+ "\n",
+ "    Relies on the globals model and index_to_movie defined earlier\n",
+ "    in the notebook.\n",
+ "\n",
+ "    Parameters\n",
+ "    ---\n",
+ "    user: int\n",
+ "        User id. Must be a key of userIdx.\n",
+ "    n: int\n",
+ "        Number of recommendations to show. Default 5.\n",
+ "    data_raw: pd.DataFrame\n",
+ "        Movie dataset. Must contain the id, film, and genre columns.\n",
+ "        Default dataset.\n",
+ "    data_rate: pd.DataFrame\n",
+ "        Rating dataset. Must contain rating, movieId, and userId.\n",
+ "        Default ratings.\n",
+ "    movieIdx: dict\n",
+ "        Mapping from movie id to encoded index. Default movie_to_index.\n",
+ "    userIdx: dict\n",
+ "        Mapping from user id to encoded index. Default user_to_index.\n",
+ "\n",
+ "    Returns\n",
+ "    ---\n",
+ "    None. The result is printed.\n",
+ "    '''\n",
+ "    # Fail early instead of passing None to the model\n",
+ "    user_encoder = userIdx.get(user)\n",
+ "    if user_encoder is None:\n",
+ "        raise ValueError(f'Unknown user id: {user}')\n",
+ "\n",
+ "    # Keep only movies the user has not rated and that the model knows\n",
+ "    watched_movie_by_user = data_rate[data_rate.userId == user]\n",
+ "    movie_not_watched = data_raw[\n",
+ "        ~data_raw['id'].isin(watched_movie_by_user.movieId.values)\n",
+ "    ]['id']\n",
+ "    movie_not_watched = list(\n",
+ "        set(movie_not_watched)\n",
+ "        .intersection(set(movieIdx.keys()))\n",
+ "    )\n",
+ "    movie_not_watched = [[movieIdx.get(x)] for x in movie_not_watched]\n",
+ "\n",
+ "    # One (user index, movie index) row per candidate movie\n",
+ "    user_movie_array = np.hstack(\n",
+ "        ([[user_encoder]] * len(movie_not_watched), movie_not_watched)\n",
+ "    )\n",
+ "    recommendation = model.predict(user_movie_array).flatten()\n",
+ "\n",
+ "    # Indices of the n highest predicted ratings, best first\n",
+ "    top_ratings_indices = recommendation.argsort()[-n:][::-1]\n",
+ "    recommended_movie_ids = [\n",
+ "        index_to_movie.get(movie_not_watched[x][0]) for x in top_ratings_indices\n",
+ "    ]\n",
+ "\n",
+ "    # Print the result\n",
+ "    print('====' * 11)\n",
+ "    print(f'Top {n} movie recommendation for user {user}')\n",
+ "    print('====' * 11)\n",
+ "\n",
+ "    # isin() returns rows in data_raw order, not in predicted-rating order\n",
+ "    recommended_movies = data_raw[data_raw['id'].isin(recommended_movie_ids)]\n",
+ "    for row in recommended_movies.itertuples():\n",
+ "        print(' •', (row.film).title(), ':', row.genre)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "-HJBNHpTLsB8",
+ "outputId": "aeb03bea-0550-4fe1-ea53-6cb96fa70014"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "============================================\n",
+ "Top 10 movie recommendation for user 546\n",
+ "============================================\n",
+ " • Vertigo : Drama\n",
+ " • Rear Window : Mystery\n",
+ " • It Happened One Night : Comedy\n",
+ " • Sunset Blvd. (A.K.A. Sunset Boulevard) : Drama\n",
+ " • 12 Angry Men : Drama\n",
+ " • Best Years Of Our Lives, The : Drama\n",
+ " • On The Waterfront : Crime\n",
+ " • 400 Blows, The (Les Quatre Cents Coups) : Crime\n",
+ " • Rashomon (Rashômon) : Crime\n",
+ " • Secret In Their Eyes, The (El Secreto De Sus Ojos) : Crime\n"
+ ]
+ }
+ ],
+ "source": [
+ "movieRecommendation(546, 10)"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "collapsed_sections": [
+ "h_7HniFhz08I",
+ "5_FM0R6A-V88",
+ "V4bj8rD0-PdR",
+ "80_TMAaBl6sm",
+ "znMufTthmIyf",
+ "xXMgxzzPXqsA",
+ "z0RYJwW2R6W2",
+ "VEtZKK2YR8gI",
+ "inLs88GJSKXV",
+ "a-qeCNoQqx8l",
+ "IjXngGc7vJhn",
+ "hwE9MmNyXDuo",
+ "0SLgSmWLfwzp",
+ "QuF4iwhf0RzL",
+ "7rcvfAibTuI6",
+ "tGr1gqR5omsa"
+ ],
+ "provenance": [],
+ "authorship_tag": "ABX9TyO2ZIy/QtQU/4ia5H4B9vy2",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file