
Big data pipeline based on the lambda architecture to process the Global Terrorism Database from Kaggle.


safa-abidi/BigData_Project

 
 


BigData Project For Global Terrorism Database

Made by : Ines Achour / Safa Laabidi / Amal Sammari

This project implements a pipeline that processes the Global Terrorism Database from Kaggle.

The pipeline combines batch processing and stream processing, which is why it follows the Lambda Architecture.
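As a rough illustration of the Lambda Architecture idea, the sketch below merges a precomputed batch view with a real-time view at query time. It uses only the JDK; the class name `LambdaViewSketch`, the method `mergeViews`, and the per-key counts are hypothetical and are not taken from the project code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the project's actual serving logic.
final class LambdaViewSketch {

    // Combine the batch view (complete but stale) with the speed view
    // (recent records only) by summing the counts that share a key.
    static Map<String, Long> mergeViews(Map<String, Long> batchView,
                                        Map<String, Long> speedView) {
        Map<String, Long> merged = new HashMap<>(batchView);
        speedView.forEach((key, count) -> merged.merge(key, count, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder keys and counts, for illustration only.
        Map<String, Long> batchView = Map.of("key-a", 10L, "key-b", 4L);
        Map<String, Long> speedView = Map.of("key-a", 2L, "key-c", 1L);
        System.out.println(mergeViews(batchView, speedView));
    }
}
```

A query then sees both the historical results from the batch layer and the records that arrived after the last batch run.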

Architecture

Architecture

1. Data Ingestion

  • Kafka

2. Data Processing

  • Streaming : Spark Streaming
  • Batch : Hadoop MapReduce

3. Data Storage

  • Streaming : MongoDB
  • Batch : HDFS (data before processing) & MongoDB (data after processing)

4. Data Visualization

  • Dashboarding : MongoDB Charts
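To make the batch layer's map and reduce steps more concrete, here is a minimal sketch in plain Java, without Hadoop. It assumes a job that counts records per value of one CSV column; the job itself, the column index, and the names `MapReduceSketch`, `mapToKey`, and `countByKey` are assumptions for illustration, not the project's actual MapReduce job. The naive `split(",")` does not handle quoted fields.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of the map/reduce model using Java streams.
final class MapReduceSketch {

    // "Map" step: extract the grouping key from one CSV line.
    static String mapToKey(String csvLine, int keyColumn) {
        String[] fields = csvLine.split(",", -1);
        return fields[keyColumn].trim();
    }

    // "Shuffle and reduce" step: group lines by key and count each group.
    static Map<String, Long> countByKey(List<String> csvLines, int keyColumn) {
        return csvLines.stream()
                .map(line -> mapToKey(line, keyColumn))
                .collect(Collectors.groupingBy(key -> key, TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Placeholder rows, for illustration only.
        List<String> lines = List.of("2001,regionA", "2001,regionB", "2002,regionA");
        System.out.println(countByKey(lines, 1)); // prints {regionA=2, regionB=1}
    }
}
```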

Other

Project Folders

1. No Kafka And No MongoDB

  • GlobalTerrorism_Stream
  • GlobalTerrorism_Batch

2. Kafka Without MongoDB

  • GlobalTerrorism_Kafka_Stream

3. Kafka And MongoDB

  • GlobalTerrorism_Kafka_Batch : appends the data received from Kafka to the database CSV file
  • GlobalTerrorism_Batch_MongoDB : runs the batch process on the CSV database and saves the result in a MongoDB database
  • GlobalTerrorism_Kafka_MongoDB : receives streaming data, processes it, and saves the result in a MongoDB database
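The append step described for GlobalTerrorism_Kafka_Batch could look roughly like the sketch below. It leaves out the Kafka consumer and shows only how received records might be appended to a CSV file with the JDK. The class name `CsvAppendSketch`, the method `appendRecords`, the temporary file, and the sample rows are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch: the Kafka consumer part is intentionally omitted.
final class CsvAppendSketch {

    // Append each record as one line, creating the file if it does not exist.
    static void appendRecords(Path csvFile, List<String> records) throws IOException {
        Files.write(csvFile, records, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the project's database CSV file.
        Path csv = Files.createTempFile("csv-append-sketch", ".csv");
        appendRecords(csv, List.of("id-1,field-a"));
        appendRecords(csv, List.of("id-2,field-b"));
        System.out.println(Files.readAllLines(csv, StandardCharsets.UTF_8));
        Files.deleteIfExists(csv);
    }
}
```

Opening the file with `APPEND` keeps earlier rows intact, so a later batch run reads both the original rows and the newly received ones.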

Visualization

We used MongoDB Charts for visualization.

Dashboard
