Data Source
Business and review data come from the Yelp dataset (https://www.yelp.com/dataset). After filtering, we obtain the input file input_data.csv and the two test data files for our tasks.
Ideas
There are two tasks. Task 1 finds pairs of similar businesses according to their Jaccard similarity. Task 2 predicts the rating for a given user and business. For task 1 we use MinHashing and Locality Sensitive Hashing (LSH). For task 2 we use Spark MLlib, user-based collaborative filtering (CF), and item-based CF.
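As a rough illustration of the task 1 idea, here is a minimal sketch in plain Python (no Spark), not the code in findSimilarBusiness.py. It assumes each business is represented by the set of integer user indices that reviewed it. The function names, the number of bands and rows, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_hash_funcs(n, prime=2_147_483_647, seed=0):
    """Build n random hash functions of the form (a*x + b) % prime."""
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime), prime)
            for _ in range(n)]


def minhash_signature(user_ids, hash_funcs):
    """One minimum per hash function over the (non-empty) set of user ids."""
    return [min((a * u + b) % p for u in user_ids) for a, b, p in hash_funcs]


def lsh_candidates(signatures, bands, rows):
    """Businesses whose signatures agree on every row of some band become candidates."""
    buckets = defaultdict(set)
    for biz, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(biz)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def similar_pairs(business_users, bands=20, rows=2, threshold=0.5):
    """Generate candidates with LSH, then keep pairs whose exact Jaccard >= threshold."""
    hash_funcs = make_hash_funcs(bands * rows)
    signatures = {b: minhash_signature(users, hash_funcs)
                  for b, users in business_users.items() if users}
    result = {}
    for x, y in lsh_candidates(signatures, bands, rows):
        sim = jaccard(business_users[x], business_users[y])
        if sim >= threshold:
            result[(x, y)] = sim
    return result


if __name__ == "__main__":
    data = {"b1": {1, 2, 3, 4}, "b2": {1, 2, 3, 4}, "b3": {100, 101}}
    print(similar_pairs(data))
```

LSH only proposes candidate pairs. The final check against the exact Jaccard similarity removes false positives, while pairs that never share a bucket are missed, which is why precision and recall are both worth measuring.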
How to run
-python
For task 1: spark-submit findSimilarBusiness.py [path of input_data.csv] [path of output file]
For task 2: spark-submit predict.py [path of input_data.csv] [path of test_data_predictions.csv] [caseId] [path of output file], where caseId is an integer from 1 to 4: 1 is for MLlib, 2 is for user-based CF, 3 is for item-based CF, and 4 is for item-based CF using MinHashing and LSH.
-scala
First go to the directory containing myRecommendation.jar (/scala/out/artifacts/myRecommendation), then run the commands below in a terminal.
For task 1: spark-submit --class findSimilarBusiness myRecommendation.jar [path of input_data.csv] [path of output file]
For task 2: spark-submit --class predict myRecommendation.jar [path of input_data.csv] [path of test_data_predictions.csv] [caseId] [path of output file].
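To give a feel for what the item-based CF option (caseId 3) computes, here is a minimal sketch in plain Python. It is not the code in predict.py. It assumes ratings are stored as a dict mapping each business to a dict of user ratings, uses Pearson correlation as the item similarity, and falls back to a hypothetical default of 3.0 when no similar rated item exists. All names are made up for the example.

```python
from math import sqrt


def pearson(ratings_i, ratings_j):
    """Pearson correlation of two items over the users who rated both."""
    common = ratings_i.keys() & ratings_j.keys()
    if len(common) < 2:
        return 0.0
    mean_i = sum(ratings_i[u] for u in common) / len(common)
    mean_j = sum(ratings_j[u] for u in common) / len(common)
    num = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    den = (sqrt(sum((ratings_i[u] - mean_i) ** 2 for u in common))
           * sqrt(sum((ratings_j[u] - mean_j) ** 2 for u in common)))
    return num / den if den else 0.0


def predict_item_based(user, item, item_ratings, default=3.0):
    """Similarity-weighted average of the user's ratings on positively correlated items."""
    target = item_ratings.get(item, {})
    num = den = 0.0
    for other, ratings in item_ratings.items():
        if other == item or user not in ratings:
            continue
        w = pearson(target, ratings)
        if w > 0:  # ignore uncorrelated and negatively correlated items
            num += w * ratings[user]
            den += w
    return num / den if den else default


if __name__ == "__main__":
    item_ratings = {
        "A": {"u1": 5, "u2": 3, "u3": 1},
        "B": {"u1": 4, "u2": 3, "u3": 2, "u4": 4},
        "C": {"u1": 1, "u2": 3, "u3": 5, "u4": 2},
    }
    print(predict_item_based("u4", "A", item_ratings))
```

In the variant with MinHashing and LSH (caseId 4), the loop over all other items would presumably be restricted to the similar-business candidates from task 1, but that is an assumption about the design rather than something stated here.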
Evaluation
For task 1, we use evaluation_similar_business.py to output precision and recall. Its inputs are the output of task 1 and the test_data_similarities.csv file. For task 2, we output RMSE. The inputs are test_data_predictions.csv and the output of task 2.
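The metrics themselves can be sketched as follows. This is a minimal illustration, not the contents of evaluation_similar_business.py. It assumes similar-business pairs are unordered and that predictions and ground truth are dicts keyed by (user, business). The function names are hypothetical.

```python
from math import sqrt


def precision_recall(predicted_pairs, true_pairs):
    """Precision and recall over unordered business pairs."""
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def rmse(predictions, truth):
    """Root mean squared error over the (user, business) keys present in both dicts."""
    keys = predictions.keys() & truth.keys()
    if not keys:
        raise ValueError("no overlapping (user, business) pairs to evaluate")
    return sqrt(sum((predictions[k] - truth[k]) ** 2 for k in keys) / len(keys))


if __name__ == "__main__":
    print(precision_recall([("a", "b"), ("c", "d")],
                           [("b", "a"), ("e", "f"), ("g", "h")]))
    print(rmse({("u1", "b1"): 4.0, ("u2", "b1"): 2.0},
               {("u1", "b1"): 5.0, ("u2", "b1"): 2.0}))
```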
709867472/Recommedation-System-with-Spark