This tool runs supervised sentiment analysis in Spark on the Twitter stream. Tweets are filtered by a word or hashtag and classified in real time. The positive/negative sentiment models are trained with algorithms from Spark's MLlib. Kafka and ZooKeeper are used to connect to the Twitter stream. Tweets and their sentiments are stored in the NoSQL database MongoDB and can be visualized in real time. All scripts can run on Amazon Web Services (AWS) to address Big Data challenges.

Before using any of the scripts, the models must be trained with the training script in the notebook twitter-spark-model-training.ipynb.
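The core idea (filter incoming tweets by a tracked word or hashtag, then label each match as positive or negative with a supervised model) can be illustrated without a cluster. The sketch below is a pure-Python stand-in, not the project's Spark/MLlib code: the Kafka stream is simulated by a plain list, and the classifier is a hand-written multinomial Naive Bayes with add-one smoothing. This README does not say which MLlib algorithm is used, so Naive Bayes is an assumption. All names (`tokenize`, `matches_track`, `NaiveBayesSentiment`, `classify_stream`) and the toy training data are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict
from typing import Iterable, Iterator

# Words, #hashtags and @mentions are kept as separate tokens.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase a tweet and split it into word, #hashtag and @mention tokens."""
    return TOKEN_RE.findall(text.lower())


def matches_track(text: str, track: str) -> bool:
    """True if the tweet contains the tracked word or hashtag as an exact token."""
    return track.lower() in tokenize(text)


class NaiveBayesSentiment:
    """Hypothetical stand-in: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()          # label -> number of training tweets
        self.word_counts = defaultdict(Counter)       # label -> token -> count
        self.vocab: set[str] = set()

    def fit(self, examples: Iterable[tuple[str, str]]) -> "NaiveBayesSentiment":
        for text, label in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)     # log prior
            label_total = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue                          # ignore tokens unseen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_stream(tweets: Iterable[str], track: str,
                    model: NaiveBayesSentiment) -> Iterator[tuple[str, str]]:
    """Yield (tweet, sentiment) for each tweet that mentions the tracked term."""
    for text in tweets:
        if matches_track(text, track):
            yield text, model.predict(text)


# Toy labelled data standing in for the notebook's training set.
training = [
    ("I love this, great work", "positive"),
    ("what a happy and wonderful day", "positive"),
    ("this is awful and I hate it", "negative"),
    ("terrible, sad and boring", "negative"),
]
model = NaiveBayesSentiment().fit(training)

# A list simulating tweets arriving from the stream.
stream = ["love #spark, great work", "I hate waiting #spark", "unrelated tweet about lunch"]
results = list(classify_stream(stream, "#spark", model))
for text, label in results:
    print(label, text)
```

In the real system, the filtering and classification would run inside Spark on batches received from Kafka, and each `(tweet, sentiment)` pair would be written to MongoDB instead of printed.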
Arcila, C., Vicente, M., Ortega, F. & Álvarez, M. (2017). Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication Research. [Technical Report]. Proof of Concept funded by the University of Salamanca Foundation.