A link to the project guestbook can be found here.
This project looks at Twitter data from the Internet Archive for 2011 and 2019. It compares tweeting habits at the two points in time along four dimensions: content, lexical complexity, tweet length, and tweet sentiment.
- How is the popular content of each era perceived or talked about?
- Have tweets become a medium for displaying more complex sentiment than they used to be?
- How does this complexity relate to overall sentiment?
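
To make the complexity and length comparisons concrete, here is a minimal sketch of two per-tweet measures. This README does not define how lexical complexity is measured, so the type-token ratio used here is an assumption chosen for illustration, and the function names are hypothetical rather than taken from the notebooks.

```python
import re


def tokens(text):
    """Lowercase the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text):
    """Share of distinct tokens among all tokens (assumed complexity proxy)."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def tweet_length(text):
    """Tweet length in characters."""
    return len(text)


if __name__ == "__main__":
    example = "the cat and the hat"
    print(type_token_ratio(example), tweet_length(example))
```

A ratio close to 1 means few repeated words. On texts as short as tweets the measure is noisy, so it is probably best compared in aggregate across the 2011 and 2019 samples.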
The main data used in this project is a sample of the 2011 and 2019 Internet Archive JSON files, in which the top 1% of tweets were scraped from October 2011 and September 2019. A classifier was also built for the sentiment analysis portion of the project, using the open-source data from Sentiment140, a pre-existing sentiment analysis tool. The data used in the project can be found here; the classifier data can be found on the Sentiment140 website linked above.
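
As a rough illustration of the classifier step, the sketch below loads Sentiment140-style rows and trains a small word-count naive Bayes model. The six-column layout (polarity, id, date, query, user, text, with polarity 0 for negative and 4 for positive) follows the published Sentiment140 training file. The naive Bayes model, the sample rows, and all function and class names are assumptions for illustration; the model actually built in build_classifier may differ.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def load_sentiment140(handle):
    """Yield (text, label) pairs from a Sentiment140-style CSV file object."""
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip malformed rows
        polarity, text = row[0], row[5]
        if polarity in ("0", "4"):
            yield text, "neg" if polarity == "0" else "pos"


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, pairs):
        for text, label in pairs:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up rows in the Sentiment140 column layout, for demonstration only.
SAMPLE = (
    '"4","1","Mon Apr 06 2009","NO_QUERY","user_a","I love this sunny day"\n'
    '"0","2","Mon Apr 06 2009","NO_QUERY","user_b","I hate this awful traffic"\n'
    '"4","3","Mon Apr 06 2009","NO_QUERY","user_c","what a great happy morning"\n'
    '"0","4","Mon Apr 06 2009","NO_QUERY","user_d","so sad and tired today"\n'
)

model = NaiveBayes().fit(load_sentiment140(io.StringIO(SAMPLE)))

if __name__ == "__main__":
    print(model.predict("love this great day"))
    print(model.predict("hate this awful day"))
```

With the real training file, the in-memory sample would be replaced by an opened file handle. The Sentiment140 CSV is commonly read with `encoding="latin-1"`.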
- Important documents
  - Folders
    - Data sample
      - 2011 dataset
      - 2019 dataset
    - Notebooks
      - build_classifier: builds the classifier for sentiment analysis
      - data_parsing: strips the tweet data down to the necessary columns
      - data_analysis and classifyanalysis: different stages of the analysis process
      - finalnb: the final Jupyter notebook
    - Images
      - Contains images of all plots
  - Data sample
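
The column-stripping step described for data_parsing can be sketched as follows. This assumes the archive files hold one JSON object per line, as is typical for Twitter stream archives. The `KEEP` column list and the `parse_tweets` name are hypothetical choices for illustration, not the notebook's actual selection.

```python
import io
import json

# Assumed set of "necessary columns"; the notebook's real list may differ.
KEEP = ("id_str", "created_at", "lang", "text", "retweet_count")


def parse_tweets(handle, keep=KEEP):
    """Read line-delimited tweet JSON and keep only the selected fields."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt lines
        # Stream archives also contain non-tweet records such as delete notices.
        if not isinstance(record, dict) or "text" not in record:
            continue
        rows.append({key: record.get(key) for key in keep})
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        '{"id_str": "1", "created_at": "Sat Oct 01 00:00:00 +0000 2011", '
        '"lang": "en", "text": "hello world", "retweet_count": 3, "source": "web"}\n'
        '{"delete": {"status": {"id_str": "2"}}}\n'
        'not json\n'
    )
    for row in parse_tweets(sample):
        print(row)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame` if the later notebooks work with data frames.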