Develop Machine Learning models to decipher covid tweets and predict whether they are positive, negative or neutral tweets.
Please click on link to interact with app deployed on Heroku.
People use social media not only to share information, but to share their feelings. Over the last year, Coronavirus and the resulting quarantine has greatly affected our lives, and social media platforms, primarily Twitter, which are overflowing with posts about this topic. While positivity and negativity are blatantly obvious in some tweets, other times we can struggle to decipher sentiments in a loaded tweet.
We used machine learning tools to execute sentiment analysis on COVID-19 related tweets starting with data extraction from Kaggle, uploaded into Jupyter notebook for clean up and sorting (filling NaN values with “Unknown” for those that were missing location data; drop unnecessary columns such as user names and screen names, checked that we had unique labels to ensure our datasets were clean.
We then selected Scikit-learn as our primary machine learning library because of its simple and effective nature for which we started with text processing using CountVectorizer, to tokenizing the text, to then transform the data using TFIDF Transformer.
We employed the following classification models for the analysis and prediction:
- Logistic regression
- Linear SVC
- Naive Bayes
- Random Forest Classifier
We selected our second model LinearSVC as it yielded the highest accuracy. We then saved CountVectorizer and TFID Transformer objects into a pickle file. Also, saved a LinearSVC model with best parameters from the GridSearchCV.
We used Flask to render the result of submitted tweets. This will enable us to make real time predictions. We also extracted more data to test our model. Due to the large size of datasets, we decided to use about 0.05% of the new data which still had about 360000 rows. This data together with train and test data were loaded in a postgresql database.
Once our model is finalized, we were then able to deploy it in the web browser and hosted on Heroku. A screenshot of the app is seen below: