This dashboard is backed by a machine learning pipeline that trains on a data set of messages and their categories; based on this training, it can predict the categories of a new message and display them in the dashboard.
Running this project requires:
- Python (>= 3.6)
- the following libraries are needed to run the code:
- numpy
- pandas
- matplotlib
- seaborn
- nltk
- re
- sklearn (>= 0.20)
- plotly
- json
- flask
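The third-party packages above can be installed with pip (re and json ship with Python's standard library and need no extra install). A minimal command, noting that sklearn is published on PyPI as scikit-learn, would be:
pip install numpy pandas matplotlib seaborn nltk "scikit-learn>=0.20" plotly flask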
Being able to reliably identify the disaster in an emergency message and its category is vital to helping people in time. For this purpose, a simple but efficient dashboard was designed: it is fed a message and returns the matching categories.
This project consists of three main parts:
- ETL Pipeline:
This pipeline performs the Extract, Transform, and Load steps to prepare the training data as clean input for the machine learning stage and saves the result in an SQLite database. The pipeline can be found in process_data.py. It needs two .csv files, disaster_categories.csv and disaster_messages.csv, which can be replaced by other .csv files of the same format containing similar information (a minimal sketch of the steps follows after this list).
- ML Pipeline:
Using NLTK for text processing and multi-output classification, this pipeline learns from the messages to predict their categories. The pipeline can be found in train_classifier.py (a sketch follows after this list).
- Flask Web app:
The Flask web app provides the web-based user interface, which connects to the database and the pipelines and generates the visualisations. master.html contains the HTML layout of the webpage, and go.html highlights the categories for the searched message (a sketch follows after this list).
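As referenced in the list above, here is a minimal sketch of the ETL steps in process_data.py. The shared id column, the "category-0/1" layout of the categories column, the messages table name, and the file paths are assumptions for illustration, not confirmed details of this repository:

import pandas as pd
from sqlalchemy import create_engine

# Extract: load and merge the two .csv files on their shared "id" column (assumed).
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")
df = messages.merge(categories, on="id")

# Transform: split the single "categories" string (e.g. "related-1;offer-0;...")
# into one 0/1 column per category (assumed layout), then drop duplicates.
cats = df["categories"].str.split(";", expand=True)
cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

# Load: save the clean table into an SQLite database.
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")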
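Likewise, a sketch of the classifier pipeline in train_classifier.py. Only the use of NLTK and multi-output classification is stated above; the lemmatizing tokenizer and the RandomForestClassifier base estimator are illustrative choices:

import nltk
nltk.download(["punkt", "wordnet"], quiet=True)
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

def tokenize(text):
    # Tokenize the raw message with NLTK, then lower-case and lemmatize each token.
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token.lower().strip()) for token in word_tokenize(text)]

# Bag-of-words + TF-IDF features feed a multi-output classifier,
# so one fitted model predicts every category column at once.
pipeline = Pipeline([
    ("vect", CountVectorizer(tokenizer=tokenize)),
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])
# After pipeline.fit(X_train, Y_train), the fitted model is pickled to models/classifier.pkl.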
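Finally, a sketch of how run.py could tie these together. The route names, the query parameter, the table name, and the assumption that the category columns start at index 4 are guesses for illustration; only run.py, master.html, go.html, and port 3001 appear in this README:

import pickle
import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data and the trained model (paths assumed as in the sketches above).
df = pd.read_sql_table("messages", create_engine("sqlite:///data/DisasterResponse.db"))
model = pickle.load(open("models/classifier.pkl", "rb"))

@app.route("/")
def index():
    # master.html renders the overview visualisations of the training data.
    return render_template("master.html")

@app.route("/go")
def go():
    # go.html highlights the categories predicted for the searched message.
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    results = dict(zip(df.columns[4:], labels))
    return render_template("go.html", query=query, classification_result=results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)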
To run this project, the following commands should be executed in the project root directory (please make sure sklearn is updated to >= 0.20):
1- Running the ETL pipeline:
python data/process_data.py [messages_filepath] [categories_filepath] [database_filepath]
[messages_filepath] is the path to the .csv file containing the disaster messages
[categories_filepath] is the path to the .csv file containing the disaster categories
[database_filepath] is the path to the .db file in which the cleaned data will be stored
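For example, with the .csv files from above placed in data/ and a hypothetical database name:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db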
2- Running the ML pipeline:
python models/train_classifier.py [database_filepath] models/classifier.pkl
[database_filepath] is the same path to the .db file in which the cleaned data was stored in step 1
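For example, reusing the hypothetical database path from step 1:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl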
3- Starting the web app:
python app/run.py [database_filepath]
[database_filepath] is again the path to the .db file in which the cleaned data is stored
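For example:
python app/run.py data/DisasterResponse.db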
4- The connection link:
Run the following command in a new terminal to find the workspace ID and domain:
env | grep WORK
The connection link then has the form:
https://SPACEID-3001.SPACEDOMAIN
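For example, if env | grep WORK printed the hypothetical values WORKSPACEID=view6914b2f4 and WORKSPACEDOMAIN=udacity-student-workspaces.com, the connection link would be https://view6914b2f4-3001.udacity-student-workspaces.com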
The findings and the results of the code are available here.