This project performs sentiment analysis on comments posted by Instagram users to evaluate which famous people are more or less loved by the Internet.
The components of the pipeline are listed below:
- Instap Producer: retrieves data from Instagram using the Instaloader package and sends it to Logstash.
- Logstash: receives the data from the producer and writes it to Kafka's Instap topic.
- Kafka: message broker that connects Logstash to the Spark processing component.
- Spark: receives data from Kafka and performs machine learning prediction.
- Elasticsearch: indexes incoming data.
- Kibana: UI dedicated to data visualization.
More technical details on each component are in its specific folder; details on how each one is actually used in this project are in doc.
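The hand-off from the producer to Logstash can be sketched as follows. This is only an illustration, not the project's actual producer code: the record field names, the `logstash` hostname, port `5000`, and the assumption that Logstash listens on a TCP input reading newline-delimited JSON are all hypothetical.

```python
import json
import socket
from datetime import datetime, timezone


def build_record(profile: str, author: str, text: str, created_at: datetime) -> str:
    """Serialize one Instagram comment as a single JSON line.

    The field names are illustrative assumptions, not the project's schema.
    """
    record = {
        "profile": profile,
        "author": author,
        "comment": text,
        # Normalize to UTC so downstream components see one time zone.
        "created_at": created_at.astimezone(timezone.utc).isoformat(),
    }
    # ensure_ascii=False keeps emoji and accented characters readable.
    return json.dumps(record, ensure_ascii=False) + "\n"


def send_records(lines, host: str = "logstash", port: int = 5000) -> None:
    """Send JSON lines over TCP. Host and port are hypothetical defaults."""
    with socket.create_connection((host, port), timeout=10) as conn:
        for line in lines:
            conn.sendall(line.encode("utf-8"))
```

In this sketch each comment becomes one self-contained line, so Logstash could forward records to Kafka one at a time without buffering a whole post.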
- Docker (Desktop on Windows)
- Docker Compose
- Instagram Account credentials
- Clone the project repository:
git clone https://github.com/rosarioamantia/insTAP
- Move to the producer folder and edit the producer.env file with your Instagram user credentials, the users, and the number of posts and comments you want to see.
- Download spark-3.1.2-bin-hadoop2.7 into the spark/setup folder.
- In the repository root (called insTAP), run all the Docker containers:
docker-compose up
- Now, the producer will generate data.
- Go to localhost:5601 and import the visualizations located in kibana/export.ndjson via the left hamburger menu > Management > Stack Management > Saved Objects > Import.
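Kibana can take a while to start after `docker-compose up`, so it may help to wait until it responds before importing the visualizations. The following is a minimal sketch, assuming Kibana is exposed at `localhost:5601` as in the steps above and that its `/api/status` endpoint is reachable without authentication; the function names and retry values are illustrative.

```python
import http.client
import time
import urllib.error
import urllib.request

KIBANA_URL = "http://localhost:5601"


def status_url(base_url: str = KIBANA_URL) -> str:
    """Build the URL of Kibana's status endpoint."""
    return base_url.rstrip("/") + "/api/status"


def kibana_is_up(base_url: str = KIBANA_URL, timeout: float = 5.0) -> bool:
    """Return True if the status endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error: treat as not ready.
        return False


def wait_for_kibana(base_url: str = KIBANA_URL, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll Kibana until it is up or the attempts run out."""
    for _ in range(attempts):
        if kibana_is_up(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_kibana()` returns `True` once the UI is reachable, at which point the Saved Objects import can be done from the browser.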