Project developed for the Cloud Computing course of the Master's degree in Artificial Intelligence and Data Engineering at the University of Pisa.
This project consists of the design and implementation of a Bloom filter for an IMDb dataset using MapReduce, with one implementation in Hadoop and one in Spark.
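
For readers unfamiliar with the data structure, here is a minimal single-machine sketch of a Bloom filter in Python. It is not the code in hadoop/ or spark/. The class name, the SHA-256 double-hashing scheme, and the example keys are assumptions made for illustration only. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical, not the repository's implementation)."""

    def __init__(self, n_items: int, fp_rate: float):
        # Standard sizing: m bits and k hash functions for n items
        # at a target false-positive rate p.
        self.m = max(1, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing (an assumption of this sketch): derive k bit
        # positions from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        # Set the k bits associated with the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Example keys in the style of IMDb title identifiers (hypothetical data).
bf = BloomFilter(n_items=1000, fp_rate=0.01)
for title_id in ("tt0000001", "tt0000002", "tt0000003"):
    bf.add(title_id)
```

A membership test such as `"tt0000001" in bf` never yields a false negative for an inserted key. A key that was never inserted may still test positive, with a probability governed by the chosen false-positive rate.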
The repository is organized as follows:
- dataset/ contains the IMDb dataset stored in film_ratings.txt
- docs/ contains the report and the assignment
- hadoop/ contains the Hadoop implementation and its tests
- results/ contains testing results and analysis
- spark/ contains the Spark implementation and its tests

Authors:
- Francesca Pezzuti @frax1819
- Francesco Hudema @MrFransis
- Tommaso Baldi @balditommaso
- Edoardo Ruffoli @edoardoruffoli