WeRateDogs Twitter Archive Analysis

The aim of this project is to gather data about the WeRateDogs Twitter account from several sources and in several formats, and then to wrangle and analyze that data.

We worked with the following data:

twitter-archive-enhanced.csv was downloaded manually and could be read directly. It contains details such as tweet_id, timestamp, source, tweet text, retweet id, retweet user id, retweet timestamp, rating numerator, rating denominator, dog name and the dog classification provided by WeRateDogs.
image_predictions.tsv was downloaded programmatically from a URL. It is a tab-separated file and was read accordingly using pandas. As background, this dataset was generated by a neural network that predicts the dog breed. It contains only the top 3 image predictions, their corresponding confidence ratings and, for each prediction, a boolean column indicating whether the prediction is a dog.
tweet_json.txt was built by querying the Twitter API through tweepy for each tweet_id in twitter-archive-enhanced.csv. Only tweets that are still available online could be retrieved. Each tweet's JSON dump was written to the file on its own line. We then read the records from tweet_json.txt one by one and extracted tweet_id, retweet_count and favorite_count into a pandas DataFrame.
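
The line-by-line read of tweet_json.txt described above could look roughly like the sketch below. It is not the notebook's actual code: the function names are hypothetical, and it assumes each line holds one tweet object whose id is stored under the key "id", next to "retweet_count" and "favorite_count".

```python
import json


def parse_tweet_lines(lines):
    """Turn an iterable of JSON lines into a list of small dicts.

    Assumption: every non-empty line is one tweet object with the keys
    "id", "retweet_count" and "favorite_count".
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows


def load_tweet_counts(path="tweet_json.txt"):
    """Read the file at `path` and return the three columns as a DataFrame."""
    # Imported here so that parse_tweet_lines needs only the standard library.
    import pandas as pd

    with open(path, encoding="utf-8") as handle:
        rows = parse_tweet_lines(handle)
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])
```

Calling load_tweet_counts() with the default path would give one row per tweet that was retrieved. The tab-separated predictions file can be read in a similar spirit with pandas.read_csv("image_predictions.tsv", sep="\t").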

The main parameters we considered in this analysis are the individual tweets, the rating each one received and the number of retweets and favorites it got. Retweet and favorite counts show how popular and well liked a particular tweet was among followers.

What's the most common rating?

The data we gathered for the WeRateDogs Twitter archive led to some interesting findings.

87.08 % of the tweets posted by WeRateDogs carry a rating in the range (0.8 - 1.4]. Put differently, if WeRateDogs posts a tweet, the probability that its rating falls in the range (0.8 - 1.4] is 0.8708.
71.58 % of the tweets posted by WeRateDogs carry a rating in the range (0.8 - 1.2].
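
As an illustration of how such shares can be computed, here is a minimal sketch. It assumes that a tweet's rating is its rating numerator divided by its rating denominator, and that intervals are open on the left and closed on the right, as the (0.8 - 1.4] notation suggests. The function name and the sample pairs are made up for the example.

```python
from fractions import Fraction


def share_in_range(pairs, low, high):
    """Return the fraction of ratings r with low < r <= high.

    `pairs` is a list of (numerator, denominator) tuples. Pairs with a
    zero denominator must be filtered out beforehand, because Fraction
    raises ZeroDivisionError for them.
    """
    # Exact rational arithmetic, so boundary ratings such as 12/10 are
    # compared against the interval ends without floating-point rounding.
    ratings = [Fraction(n, d) for n, d in pairs]
    hits = sum(1 for r in ratings if low < r <= high)
    return hits / len(ratings)


# Made-up sample, not taken from the archive.
sample = [(12, 10), (13, 10), (10, 10), (5, 10), (14, 10)]
print(share_in_range(sample, Fraction(4, 5), Fraction(7, 5)))
```

The returned share can be read directly as the empirical probability that a tweet's rating falls in the chosen range. With pandas, pandas.cut produces right-closed intervals of this kind by default.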

What tweets are most liked?

A second analysis shows the average retweet and favorite counts received in each rating bin.

The bars for the range (1.2 - 1.4] clearly stand out: tweets rated in this range receive the most retweets and favorites on average. Around 15.51 % of tweets receive ratings in this range.
The next-highest bars are for ratings above 1.4. Only a very small share of tweets (0.29 %) receive these ratings and, somewhat surprisingly, they get fewer retweets and favorites on average than tweets in the range (1.2 - 1.4].
Another notable point in this graph is the range (0 - 0.2]. Again somewhat surprisingly, these tweets receive more retweets and favorites on average than those in the range (0.2 - 1].
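
The per-bin averages behind such a chart can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the 0.2-wide bin edges are inferred from the ranges quoted above, and the function names and sample values are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Right-closed bin edges 0.2 apart: a rating r falls in (EDGES[i-1], EDGES[i]].
EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]


def bin_label(rating):
    """Return a text label for the bin that contains `rating`."""
    i = bisect_left(EDGES, rating)
    if i == 0:
        return "<= 0.0"  # at or below the first edge
    if i == len(EDGES):
        return "> 1.4"  # everything above the last edge shares one bin
    return f"({EDGES[i - 1]} - {EDGES[i]}]"


def mean_counts_by_bin(tweets):
    """Average retweets and favorites per rating bin.

    `tweets` is an iterable of (rating, retweet_count, favorite_count).
    Returns {bin label: (mean retweets, mean favorites)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # retweets, favorites, tweet count
    for rating, retweets, favorites in tweets:
        bucket = totals[bin_label(rating)]
        bucket[0] += retweets
        bucket[1] += favorites
        bucket[2] += 1
    return {label: (rt / n, fav / n) for label, (rt, fav, n) in totals.items()}


# Made-up sample values, not taken from the archive.
sample = [(1.3, 100, 300), (13 / 10, 200, 500), (1.0, 10, 20), (1.5, 50, 60)]
for label, means in sorted(mean_counts_by_bin(sample).items()):
    print(label, means)
```

Keeping the tweet count per bin alongside the sums makes it easy to also report how many tweets stand behind each bar, which matters for sparsely populated bins such as ratings above 1.4.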

Predicting dogs in tweets through NN

We also have data from a neural network that predicts the breed of the dog in each image. Here p1 is the algorithm's most likely prediction, p2 its second most likely prediction and p3 its third most likely prediction.

For p1, the neural network predicts that the image shows a dog 73.8 % of the time. The probability of correctly identifying the dog's breed, given that the image is predicted to be a dog, is 0.614 as per p1.
The probability of correctly identifying the dog's breed, given that the image is predicted to be a dog, is 0.140 as per p2.
The probability of correctly identifying the dog's breed, given that the image is predicted to be a dog, is 0.062 as per p3.

These figures show a clear decline in the probability of correctly identifying the dog's breed from p1 to p2 to p3.
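
The text does not state how these probabilities were calculated. One plausible reading, shown below purely as an assumption, is that the share of "dog" predictions is the fraction of rows whose boolean flag is true, and that the conditional figure is the mean confidence over those rows. The column names p1_dog and p1_conf (and their p2/p3 counterparts) are likewise assumed, and the function name and sample rows are hypothetical.

```python
def summarize_prediction(rows, rank):
    """Summarize the prediction of a given rank (1, 2 or 3).

    `rows` is a list of dicts that already hold parsed values, with a
    boolean under f"p{rank}_dog" and a float under f"p{rank}_conf"
    (assumed column names).
    Returns (share of rows predicted as a dog, mean confidence of those rows).
    """
    dog_key = f"p{rank}_dog"
    conf_key = f"p{rank}_conf"
    dog_confs = [row[conf_key] for row in rows if row[dog_key]]
    share_dog = len(dog_confs) / len(rows)
    # No dog predictions at all: the conditional mean is undefined.
    mean_conf = sum(dog_confs) / len(dog_confs) if dog_confs else float("nan")
    return share_dog, mean_conf


# Made-up rows, not taken from image_predictions.tsv.
sample = [
    {"p1_dog": True, "p1_conf": 0.8},
    {"p1_dog": True, "p1_conf": 0.4},
    {"p1_dog": False, "p1_conf": 0.9},
    {"p1_dog": True, "p1_conf": 0.6},
]
print(summarize_prediction(sample, 1))
```

Running the same function with rank 2 and rank 3 would give the corresponding figures for p2 and p3, so the three can be compared side by side.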

Disclaimer: This project was developed as part of Udacity's Data Analyst Nanodegree.

Prerequisites:

pip install pandas

pip install tweepy

pip install matplotlib
