APAN 5400 final project
The goal of this project is to build an information retrieval system that analyzes and visualizes cryptocurrency data.
Our system architecture consists of four parts:
We build a TCP socket in Python that receives tweets from the Twitter Streaming API. Spark Streaming then processes these real-time tweets to find trending cryptocurrency hashtags, and the results are printed in JupyterLab.
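The hashtag-counting step can be illustrated in plain Python. This is a minimal sketch, not the project's Spark code: it shows the per-batch logic that a Spark Streaming job would compute with a flatMap / filter / reduceByKey chain over the lines read from the socket. The `CRYPTO_TAGS` set and the function names are hypothetical; the project's actual filter terms are not given.

```python
import re
from collections import Counter

# Hypothetical set of hashtags treated as "about cryptocurrency" (lowercase, no '#').
CRYPTO_TAGS = {"bitcoin", "btc", "ethereum", "eth", "crypto", "cryptocurrency", "dogecoin", "doge"}

# A hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def crypto_hashtags(tweet: str) -> list[str]:
    """Return the lowercase hashtags in one tweet that are in CRYPTO_TAGS."""
    tags = [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
    return [tag for tag in tags if tag in CRYPTO_TAGS]


def trending_hashtags(tweets, top_n: int = 3) -> list[tuple[str, int]]:
    """Count crypto hashtags across one batch of tweets and return the top N.

    Ties keep the order in which the hashtags were first seen (Counter behavior).
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(crypto_hashtags(tweet))
    return counts.most_common(top_n)


# Example batch (made-up tweets):
# trending_hashtags(["Buying more #Bitcoin today #BTC", "#bitcoin to the moon", "#ETH fees again #defi"])
# returns [("bitcoin", 2), ("btc", 1), ("eth", 1)]
```

Lowercasing before counting merges variants such as `#Bitcoin` and `#bitcoin`, which is usually what "trending" should mean; in Spark the same normalization would go in the map step before the reduce.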
The ETL pipeline fetches real-time crypto data from CoinAPI and stores it in MongoDB.
Historical crypto data and data from other APIs are also stored in MongoDB.
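The extract / transform / load steps can be sketched as three small functions. This is an illustration under stated assumptions, not the project's pipeline: the CoinAPI exchange-rate endpoint, the `X-CoinAPI-Key` header, and the payload fields (`time`, `asset_id_base`, `asset_id_quote`, `rate`) are assumed from CoinAPI's public REST API, and the document shape and function names are hypothetical. `load` accepts any object with an `insert_many` method, such as a pymongo `Collection`.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed CoinAPI REST endpoint for a single exchange rate; verify against the CoinAPI docs.
COINAPI_URL = "https://rest.coinapi.io/v1/exchangerate/{base}/{quote}"


def extract(base: str, quote: str, api_key: str) -> dict:
    """Fetch one exchange-rate quote from CoinAPI (performs a network call)."""
    req = urllib.request.Request(
        COINAPI_URL.format(base=base, quote=quote),
        headers={"X-CoinAPI-Key": api_key},  # assumed auth header
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def transform(raw: dict) -> dict:
    """Flatten an exchange-rate payload into a MongoDB document (hypothetical schema)."""
    return {
        "symbol": f"{raw['asset_id_base']}/{raw['asset_id_quote']}",
        "rate": float(raw["rate"]),
        "source_time": raw["time"],
        "ingested_at": datetime.now(timezone.utc),
    }


def load(collection, docs: list) -> int:
    """Insert documents into a collection and return how many were inserted."""
    if not docs:
        return 0  # insert_many rejects an empty list, so skip the call
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


# Usage with pymongo (host, database, and collection names are hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["crypto"]["exchange_rates"]
#   load(coll, [transform(extract("BTC", "USD", API_KEY))])
```

Keeping `transform` free of I/O makes it easy to test without network or database access; inside Docker Compose the MongoDB host would be the Mongo service's name rather than `localhost`.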
Metabase is an open-source analytics dashboard that we use to analyze and visualize the crypto data (see the Metabase Official Website).
High-level information about each cryptocurrency, such as its name, creator, and description, is stored in PostgreSQL.
Because the project consists of several components, we use Docker to run multiple containers at the same time.
In docker-compose.yml, we configure the ETL pipeline, the PySpark notebook image, the MongoDB image, the PostgreSQL image, and the Metabase dashboard image.
To run this project, type the following commands in the terminal:
docker compose build
docker compose up -d # start all containers in the background (detached mode)