The main objective of this analysis is to determine which factors most affect a wine’s overall rating. The factors and ratings come from a collection of wine reviews written by sommeliers at Wine Enthusiast, which were analyzed to identify the factors with the highest and lowest influence. The analysis also identifies popular wine topics and keywords and explores the sentiment in the sommeliers’ reviews to determine whether these also have an effect on a wine’s rating. Lastly, a kNN-based recommender system is created to suggest similar wines based on one given wine.
The purpose of this analysis is to help wine retailers identify the most popular wines, which may in turn increase sales. Retailers can use the kNN recommender system to suggest new wines to customers based on wines they already prefer, which may increase both customer loyalty and customer lifetime value. This may also lead to faster inventory turnover, allowing retailers to regularly purchase new wines and diversify their product selection.
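As a rough illustration of how a kNN-based recommender of this kind could work, the sketch below uses scikit-learn's `NearestNeighbors` on a tiny made-up table. The wine names, the two feature columns (price and rating points), and the helper `recommend_similar` are hypothetical and are not taken from the notebook; the actual project may use different features and a different distance metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: (price in USD, rating points) per wine.
wine_names = ["Wine A", "Wine B", "Wine C", "Wine D", "Wine E"]
features = np.array([
    [15.0, 86.0],
    [18.0, 87.0],
    [60.0, 93.0],
    [65.0, 94.0],
    [30.0, 89.0],
])

# Scale features so that price does not dominate the distance.
scaled = StandardScaler().fit_transform(features)

# Fit an unsupervised nearest-neighbour index on the scaled features.
knn = NearestNeighbors(n_neighbors=3, metric="euclidean").fit(scaled)


def recommend_similar(wine, k=2):
    """Return the k wines closest to `wine`, excluding the wine itself."""
    idx = wine_names.index(wine)
    # Ask for k + 1 neighbours because the query wine is its own nearest neighbour.
    _, neighbor_idx = knn.kneighbors(scaled[[idx]], n_neighbors=k + 1)
    return [wine_names[i] for i in neighbor_idx[0] if i != idx][:k]


print(recommend_similar("Wine A", k=1))
```

Scaling before the neighbour search is a design choice in this sketch: without it, the feature with the largest numeric range would decide which wines count as similar.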
- Data Cleaning
- Data Analysis
- Topic Modeling
- Logistic Regression
- kNN Recommender System
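The Logistic Regression step in this outline can be illustrated with the following sketch, which is not the notebook's code. It assumes a binary target (a hypothetical "high rating" label), uses synthetic data with made-up feature names, and treats the magnitude of coefficients on standardized features as a rough proxy for how much each factor matters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic, hypothetical features; only "price" drives the label here.
feature_names = ["price", "review_length", "noise"]
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)  # 1 = "high rating"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize so coefficient magnitudes are comparable across features.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Rank features by absolute coefficient, largest first.
importance = sorted(
    zip(feature_names, np.abs(model.coef_[0])),
    key=lambda pair: pair[1],
    reverse=True,
)
accuracy = model.score(scaler.transform(X_test), y_test)

for name, weight in importance:
    print(f"{name}: {weight:.2f}")
print(f"test accuracy: {accuracy:.2f}")
```

On real review data the ranking would depend on how categorical fields are encoded and on correlations between features, so coefficient size should be read as an indication rather than a definitive measure of importance.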
Python. Python is an interpreted, high-level and general-purpose programming language.
Integrated Development Environment (IDE). Any IDE that can be used to view, edit, and run Python code and Jupyter notebooks (.ipynb files), such as Google Colab.
Install the following packages prior to running the code, then import them in Python:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re
import string
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from wordcloud import WordCloud, STOPWORDS
from textblob import TextBlob
import nltk
nltk.download('punkt')
nltk.download('stopwords')
import spacy
import gensim
from gensim import corpora
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
%matplotlib inline
```
If using Google Colab, run `from google.colab import drive` followed by `drive.mount('/content/drive')`, then follow the instructions in the output to authorize access to Google Drive so that its directories are available.
Download the notebook Wine_Enthusiast.ipynb and open it in the IDE.
Download and unzip the wine.csv dataset.
In the notebook, change the file path to the directory where wine.csv is located so that the dataset can be imported.
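The path change described above might look like the following sketch. The variable `DATA_DIR` and the helper `load_wine_data` are placeholders introduced for illustration, not names from the notebook; substitute the directory where wine.csv was unzipped. No assumption is made here about the dataset's column names.

```python
from pathlib import Path

import pandas as pd


def load_wine_data(data_dir, filename="wine.csv"):
    """Read the wine dataset from `data_dir` into a DataFrame."""
    csv_path = Path(data_dir) / filename
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected dataset at {csv_path}")
    return pd.read_csv(csv_path)


# Placeholder directory; replace with the folder where wine.csv was unzipped,
# for example a folder under /content/drive when using Google Colab.
DATA_DIR = Path(".")

# Only load when the file is present, so the sketch runs without the dataset.
if (DATA_DIR / "wine.csv").is_file():
    wine_df = load_wine_data(DATA_DIR)
    print(wine_df.shape)
```

Checking that the file exists before calling `pd.read_csv` gives a clearer error message when the path has not yet been updated.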
Joshua Rotuna - GitHub
This project is licensed under the MIT License.
The dataset used was provided by Wine Enthusiast.