Data Intensive Computing - Assignment 2

In this lab assignment, done together with Haris Poljo, we practiced stream processing and graph processing using Apache Spark, Apache Kafka, and Apache Cassandra. We also practiced Apache Spark GraphX in two Jupyter notebooks.

Assignment 2 - Part 1

We implemented a Spark Streaming application that calculates the average value of (key, value) pairs and continuously updates it as new pairs arrive. The application reads data from Apache Kafka and continuously stores the results in Cassandra as (key, average value) pairs.
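The update rule described above can be illustrated with a minimal sketch in plain Python. It is not the code from KafkaSpark.scala: the names `AvgState`, `update_average`, and `process_stream` are hypothetical, a dictionary stands in for Spark's keyed state, and a list stands in for the Kafka topic. The idea is that keeping a count and a running sum per key is enough to emit an updated average after every new pair.

```python
from collections import namedtuple

# Hypothetical per-key state: number of values seen so far and their running sum.
AvgState = namedtuple("AvgState", ["count", "total"])


def update_average(state, value):
    """Fold one new value into the state and return (new_state, average)."""
    if state is None:
        state = AvgState(count=0, total=0.0)
    new_state = AvgState(state.count + 1, state.total + value)
    return new_state, new_state.total / new_state.count


def process_stream(pairs):
    """Consume (key, value) pairs and yield (key, average) after every pair."""
    states = {}  # assumption: stands in for the streaming engine's keyed state
    for key, value in pairs:
        states[key], average = update_average(states.get(key), value)
        yield key, average


# Stand-in for messages read from Kafka; in the real pipeline each
# yielded pair would be written to Cassandra, overwriting the row for that key.
sample = [("a", 2), ("b", 10), ("a", 4), ("a", 6)]
results = list(process_stream(sample))
print(results)
```

Storing the count and sum rather than the average itself keeps each update constant-time and avoids re-reading earlier values.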

Requirements: Kafka 2.6.0, Cassandra 3.11.2, Python 2.7, Spark 2.4.3.
Running the code and implementation explanation: see Lab 2 - Part 1.pdf.

Assignment 2 - Part 2

graphx_songs.ipynb: Uses GraphX to cluster songs according to the tags attached to each song.
graphx_social_network.ipynb: Uses GraphX to analyse a property graph.
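
This README does not say which clustering method graphx_songs.ipynb uses. As one possible reading, the sketch below links two songs whenever they share a tag and treats each connected component as a cluster. It is written in plain Python rather than GraphX, and the function name `cluster_by_shared_tags` and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_tags(song_tags):
    """Group songs into connected components; songs are linked if they share a tag."""
    # Invert the mapping: tag -> set of songs carrying that tag.
    songs_by_tag = defaultdict(set)
    for song, tags in song_tags.items():
        for tag in tags:
            songs_by_tag[tag].add(song)

    # Build an undirected adjacency list between songs that share a tag.
    neighbours = defaultdict(set)
    for songs in songs_by_tag.values():
        for song in songs:
            neighbours[song] |= songs - {song}

    # Depth-first search to collect each connected component.
    clusters, seen = [], set()
    for song in sorted(song_tags):
        if song in seen:
            continue
        component, stack = set(), [song]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(neighbours[current] - seen)
        clusters.append(sorted(component))
    return clusters


# Made-up example data, not taken from the repository's data folder.
example = {
    "song1": ["rock", "live"],
    "song2": ["rock"],
    "song3": ["jazz"],
    "song4": ["jazz", "piano"],
    "song5": ["ambient"],
}
clusters = cluster_by_shared_tags(example)
print(clusters)
```

A song with no tag in common with any other song ends up in a cluster of its own.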

Collaborators

Haris Poljo.



leonardoremondini/data-intensive-computing-assignment-2
