BigData Project For Global Terrorism Database

Made by : Ines Achour / Safa Laabidi / Amal Sammari

This project implements a pipeline that processes the Global Terrorism Database from Kaggle.

The pipeline combines batch and stream processing, which is why it is based on the Lambda Architecture.
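As a rough illustration of the Lambda Architecture idea, the sketch below keeps a batch view (recomputed from all historical records) and a real-time view (updated per incoming event), then merges them to answer a query. It is plain Python and not the project's code; the `country_txt` field name and the sample records are assumptions made for the example.

```python
from collections import Counter


def batch_view(records):
    """Batch layer: recompute counts per country from the full historical dataset."""
    return Counter(r["country_txt"] for r in records)


def update_realtime_view(view, record):
    """Speed layer: update the counts incrementally as one new event arrives."""
    view[record["country_txt"]] += 1
    return view


def serve(batch, realtime):
    """Serving layer: merge the batch and real-time views to answer a query."""
    return batch + realtime


# Hypothetical sample data; "country_txt" is an assumed column name.
historical = [{"country_txt": "Iraq"}, {"country_txt": "Iraq"}, {"country_txt": "Peru"}]
incoming = [{"country_txt": "Peru"}, {"country_txt": "India"}]

batch = batch_view(historical)
realtime = Counter()
for event in incoming:
    update_realtime_view(realtime, event)

merged = serve(batch, realtime)
```

In the actual pipeline the batch role would be played by Hadoop MapReduce and the speed role by Spark Streaming, with MongoDB holding the results.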

Architecture

1. Data Ingestion

  • Kafka

2. Data Processing

  • Streaming : Spark Streaming
  • Batch : Hadoop MapReduce

3. Data Storage

  • Streaming : MongoDB
  • Batch : HDFS (data before processing) & MongoDB (data after processing)

4. Data Visualization

  • Dashboarding : MongoDB Charts
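The README does not say which aggregations the batch job computes. As a hedged sketch of the MapReduce pattern used in the batch branch, the plain-Python example below counts records per year with explicit map, shuffle/sort, and reduce steps. The column names (`eventid`, `iyear`, `country_txt`) and the sample rows are assumptions for illustration, and a real Hadoop job would be written against the Hadoop API rather than like this.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical sample input; column names are assumed, not taken from the project.
SAMPLE_CSV = """eventid,iyear,country_txt
1,1970,Mexico
2,1970,Italy
3,1971,Italy
"""


def mapper(row):
    """Map step: emit a (year, 1) pair for each input record."""
    yield (row["iyear"], 1)


def reducer(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return (key, sum(values))


def run_job(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = [pair for row in rows for pair in mapper(row)]
    # Shuffle/sort step: bring pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    return [
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


result = run_job(SAMPLE_CSV)
```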

Other

Project Folders

1. No Kafka And No MongoDB

  • GlobalTerrorism_Stream
  • GlobalTerrorism_Batch

2. Kafka Without MongoDB

  • GlobalTerrorism_Kafka_Stream

3. Kafka And MongoDB

  • GlobalTerrorism_Kafka_Batch : appends the data received from Kafka to the database CSV file
  • GlobalTerrorism_Batch_MongoDB : runs the batch process on the CSV database and saves the result in a MongoDB database
  • GlobalTerrorism_Kafka_MongoDB : receives streaming data, processes it, and saves the result in a MongoDB database
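The append step described for GlobalTerrorism_Kafka_Batch could look roughly like the sketch below. It is not the project's implementation: the Kafka consumer is replaced by an in-memory list, and the function name `append_records`, the file name, and the column names are all hypothetical.

```python
import csv
import os
import tempfile

# Assumed column names, for illustration only.
FIELDS = ["eventid", "iyear", "country_txt"]


def append_records(csv_path, records):
    """Append records to the CSV file; write the header only if the file is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(records)


# Stand-in for messages received from a Kafka topic (hypothetical data).
received = [
    {"eventid": "1", "iyear": "1970", "country_txt": "Mexico"},
    {"eventid": "2", "iyear": "1971", "country_txt": "Italy"},
]

csv_path = os.path.join(tempfile.mkdtemp(), "globalterrorism.csv")
append_records(csv_path, received[:1])  # first batch creates the file and header
append_records(csv_path, received[1:])  # later batches only add rows

with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
```

Opening the file in append mode keeps the existing data intact, so the batch job can later be rerun over the full, growing CSV file.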

Visualization

We used MongoDB Charts for visualization.

Dashboard