https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset
yelp_academic_dataset_business.json
yelp_academic_dataset_checkin.json
yelp_academic_dataset_review.json
yelp_academic_dataset_tip.json
yelp_academic_dataset_user.json
- Persistence
- Lazy evaluation
- Fault tolerance
- Data partitioning
- Parallelism
- Transparency
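
Three of the concepts above (lazy evaluation, persistence, and data partitioning) can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: the names `lazy_pipeline` and `hash_partition` are hypothetical, and Python generators only stand in for Spark's lazily evaluated transformations.

```python
from itertools import islice


def lazy_pipeline(records):
    """Chain two lazy 'transformations'; nothing runs until the result is consumed.

    Returns the lazy result and a list that records which inputs were
    actually read, so the laziness can be observed.
    """
    touched = []

    def traced(source):
        for record in source:
            touched.append(record)  # note every input that is really evaluated
            yield record

    doubled = (r * 2 for r in traced(records))       # lazy, like a map
    multiples = (r for r in doubled if r % 4 == 0)   # lazy, like a filter
    return multiples, touched


def hash_partition(records, num_partitions):
    """Assign each record to a partition by hashing it (illustrative only)."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[hash(record) % num_partitions].append(record)
    return partitions


if __name__ == "__main__":
    pipeline, touched = lazy_pipeline(range(10))
    print(touched)                      # still empty: no work has been done

    # An 'action' pulls only as much data through the chain as it needs.
    first_two = list(islice(pipeline, 2))
    print(first_two, touched)

    # 'Persistence': materialize a result once and reuse it.
    cached = list(lazy_pipeline(range(10))[0])
    print(cached)

    # Split records across two hypothetical workers.
    print(hash_partition(range(6), 2))
```

In this sketch, work is deferred until a result is requested, a materialized list plays the role of a cached dataset, and hashing decides which worker would own each record. Spark applies the same ideas to distributed datasets, with fault tolerance and scheduling handled by the framework.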
Implementation of a Distributed System to Execute Spark Using Multiple Computers (1 Master and 2 Workers)
Spark
Databricks
Azure Blob Storage
Tableau
Setting up standalone clusters
Hadoop environment setup
Teamwork