The reading sequence is:
- Yelp_Dataset-Data_Preprocessing
- Yelp_Dataset-EDA
- Yelp_Dataset-SetimentAnalysis
- Yelp_Dataset-Clustering
- Yelp_Dataset-Restaurant_Recommender
Summary of data analysis
Part one: Yelp_Dataset-Data_Preprocessing
■ Introduced the structure and content of the Yelp dataset.
■ Filtered the data by city, time period and category for further analysis.
■ Saved the processed data.
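The filtering step can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes records shaped like the Yelp dataset's business and review JSON objects, with the fields `business_id`, `city`, `categories` (a comma-separated string) and `date`. The helper names `filter_businesses` and `filter_reviews` are hypothetical, and so is the toy data.

```python
from datetime import date

# Hypothetical toy records shaped like (a subset of) the Yelp business and
# review JSON objects; the field names are assumptions about the schema.
businesses = [
    {"business_id": "b1", "city": "Las Vegas", "categories": "Restaurants, Mexican"},
    {"business_id": "b2", "city": "Las Vegas", "categories": "Nightlife, Bars"},
    {"business_id": "b3", "city": "Phoenix", "categories": "Restaurants, Pizza"},
]
reviews = [
    {"review_id": "r1", "business_id": "b1", "date": "2017-03-14", "stars": 5},
    {"review_id": "r2", "business_id": "b1", "date": "2016-12-31", "stars": 2},
    {"review_id": "r3", "business_id": "b2", "date": "2017-06-01", "stars": 4},
    {"review_id": "r4", "business_id": "b3", "date": "2017-07-20", "stars": 3},
]

def filter_businesses(businesses, city, category):
    """Keep businesses in `city` whose comma-separated categories include `category`."""
    kept = []
    for b in businesses:
        cats = [c.strip() for c in (b.get("categories") or "").split(",")]
        if b["city"] == city and category in cats:
            kept.append(b)
    return kept

def filter_reviews(reviews, business_ids, start, end):
    """Keep reviews of the given businesses dated within [start, end]."""
    kept = []
    for r in reviews:
        # [:10] also handles timestamps like "2017-03-14 12:00:00".
        d = date.fromisoformat(r["date"][:10])
        if r["business_id"] in business_ids and start <= d <= end:
            kept.append(r)
    return kept

vegas_restaurants = filter_businesses(businesses, "Las Vegas", "Restaurants")
ids = {b["business_id"] for b in vegas_restaurants}
reviews_2017 = filter_reviews(reviews, ids, date(2017, 1, 1), date(2017, 12, 31))
print([r["review_id"] for r in reviews_2017])  # ['r1']
```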
Part two: Yelp_Dataset-EDA
I explored three questions.
■ What are the top 50 restaurants with the most reviews in Las Vegas in 2017?
■ Do more reviews mean better quality?
■ What is the most popular restaurant style in Las Vegas in 2017?
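All three questions reduce to counting plus one correlation. Here is a minimal sketch on hypothetical toy data, assuming one record per review and a per-business category list. It measures "more reviews vs. better quality" as the Pearson correlation between a restaurant's review count and its mean star rating. It measures "popular style" as category counts weighted by review volume. Both are assumed definitions, not necessarily the ones the notebook uses.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: one row per review of a Las Vegas restaurant in 2017.
reviews = [
    {"business_id": "b1", "stars": 5}, {"business_id": "b1", "stars": 4},
    {"business_id": "b1", "stars": 4}, {"business_id": "b2", "stars": 2},
    {"business_id": "b2", "stars": 3}, {"business_id": "b3", "stars": 5},
]
categories = {"b1": ["Mexican"], "b2": ["Buffets"], "b3": ["Mexican", "Bars"]}

# Q1: restaurants ranked by number of reviews (top 50).
review_counts = Counter(r["business_id"] for r in reviews)
top = review_counts.most_common(50)

# Q2: Pearson correlation between review count and mean star rating.
def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ids = sorted(review_counts)
mean_stars = {
    b: sum(r["stars"] for r in reviews if r["business_id"] == b) / review_counts[b]
    for b in ids
}
corr = pearson([review_counts[b] for b in ids], [mean_stars[b] for b in ids])

# Q3: restaurant styles (categories) weighted by how many reviews they received.
style_counts = Counter()
for rev in reviews:
    style_counts.update(categories[rev["business_id"]])

print(top[:3], round(corr, 2), style_counts.most_common(1))
```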
Part three: Yelp_Dataset-SetimentAnalysis
■ Transformed unstructured review text into feature vectors using NLP techniques such as lemmatization and TF-IDF.
■ Performed sentiment analysis with a Random Forest model to predict users' rating scores from their reviews.
■ Discovered that users' review text does not accurately explain the rating scores they give.
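TF-IDF weighting is short enough to write out directly. This is a minimal standard-library sketch. It lowercases the text and keeps alphabetic tokens. It skips lemmatization, which would normally come from a library such as NLTK or spaCy. It uses the smoothed IDF log((1 + N) / (1 + df)) + 1 followed by L2 normalization, the same IDF and normalization defaults as scikit-learn's `TfidfVectorizer`. The resulting vectors would then be passed to a classifier such as scikit-learn's `RandomForestClassifier`. That step is not shown, and the example reviews are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens; lemmatization is omitted here.
    return re.findall(r"[a-z]+", text.lower())

def tfidf(docs):
    """Return one sparse {term: weight} dict per document.

    Weight = raw term count * smoothed IDF, log((1 + N) / (1 + df)) + 1,
    and each document vector is then scaled to unit L2 norm.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

reviews = [
    "Great tacos and great service",
    "Terrible service, cold food",
    "Tacos were fine",
]
vecs = tfidf(reviews)
print(sorted(vecs[1].items(), key=lambda kv: -kv[1])[:2])
```

Terms that appear in many reviews, such as "service" here, receive a lower IDF than rarer, more distinctive words such as "terrible".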
Part four: Yelp_Dataset-Clustering
■ Used K-Means clustering to identify the review words users most commonly use within each group.
■ Suggested replacing the current five-star rating method with a three-class rating method.
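Here is a minimal sketch of the K-Means step. It assumes each review is already a vector over a small vocabulary, using toy counts instead of real TF-IDF weights. It runs plain Lloyd iterations initialized with the first k points. It then reads each cluster's most characteristic words off its centroid. A real run would more likely use scikit-learn's `KMeans`, which defaults to k-means++ initialization. The vocabulary and data are hypothetical.

```python
# Hypothetical vocabulary and per-review term counts (stand-ins for TF-IDF vectors).
vocab = ["delicious", "friendly", "rude", "slow", "tasty"]
reviews = [
    [2, 1, 0, 0, 1],
    [0, 0, 2, 1, 0],
    [1, 1, 0, 0, 2],
    [0, 0, 1, 2, 0],
]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Lloyd's K-Means; centroids start at the first k vectors."""
    centroids = [list(map(float, v)) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(reviews, k=2)
# The highest-weighted centroid dimensions are the cluster's characteristic words.
top_words = [
    [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -cent[i])[:2]]
    for cent in centroids
]
print(labels, top_words)
```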
Part five: Yelp_Dataset-Restaurant_Recommender
■ Constructed a restaurant recommender system using collaborative filtering and matrix factorization based on clients' past visits and ratings.
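One common form of matrix factorization for collaborative filtering learns a latent vector for each user and each restaurant by stochastic gradient descent on the observed ratings. It then scores unrated restaurants by the dot product of those vectors. The sketch below assumes that SGD variant, with hypothetical hyperparameters (`k`, `lr`, `reg`, `epochs`) and toy (user, restaurant, stars) triples. The project may have used a different factorization or library.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def recommend(P, Q, u, ratings, n_items, top_n=1):
    """Rank the restaurants user u has not rated yet by predicted rating."""
    seen = {i for uu, i, _ in ratings if uu == u}
    candidates = [i for i in range(n_items) if i not in seen]
    return sorted(candidates, key=lambda i: predict(P, Q, u, i), reverse=True)[:top_n]

# Hypothetical (user, restaurant, stars) triples; each user has one unrated restaurant.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 2, 2), (1, 3, 1),
    (2, 1, 5), (2, 2, 1), (2, 3, 2),
]
P, Q = factorize(ratings, n_users=3, n_items=4)
rmse = math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
print(round(rmse, 3), recommend(P, Q, 0, ratings, n_items=4))
```

Predictions are plain dot products without user or item bias terms. Adding a global mean and per-user and per-restaurant biases is a common refinement.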