Wrangle and Analyze Data-WeRateDogs

Introduction

The goal of this project is to wrangle the tweet archive of Twitter user @dog_rates, also known as WeRateDogs, to create interesting and trustworthy analyses and visualizations.

WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.

(Image is from Udacity.)

WeRateDogs downloaded their Twitter archive and sent it to Udacity via email exclusively for us to use in this project. This archive contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they stood on August 1, 2017.

The Twitter archive is great, but it only contains very basic tweet information. Additional gathering, followed by assessing and cleaning, is required for "Wow!"-worthy analyses and visualizations. In this project, Tweepy is used to query Twitter's API for data beyond what is included in the WeRateDogs Twitter archive.
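
As a rough illustration of this gathering step, the sketch below loops over tweet IDs, fetches extra data for each one, and records the IDs that fail (for example, deleted tweets). The names `collect_tweet_json`, `to_json_lines`, and `fetch_status` are hypothetical and not taken from the project notebook. With Tweepy, `fetch_status` could wrap a call such as `api.get_status(...)`, but the exact call depends on the Tweepy version and on valid API credentials, so a stand-in function with made-up values is used here to keep the example runnable offline.

```python
import json


def collect_tweet_json(fetch_status, tweet_ids):
    """Fetch extra data for each tweet ID.

    fetch_status is any callable that takes a tweet ID and returns a dict
    (hypothetical; with Tweepy it might wrap api.get_status).
    Returns a (records, failed_ids) tuple.
    """
    records, failed_ids = [], []
    for tweet_id in tweet_ids:
        try:
            status = fetch_status(tweet_id)
        except Exception:  # e.g. the tweet was deleted or access was denied
            failed_ids.append(tweet_id)
            continue
        records.append({
            "tweet_id": tweet_id,
            "retweet_count": status.get("retweet_count"),
            "favorite_count": status.get("favorite_count"),
        })
    return records, failed_ids


def to_json_lines(records):
    """Serialize one JSON object per line, ready to be written to a text file."""
    return "\n".join(json.dumps(record) for record in records)


# Stand-in for the real API so the sketch runs offline (made-up values).
def fake_fetch(tweet_id):
    if tweet_id == 2:
        raise LookupError("tweet not found")
    return {"retweet_count": 5, "favorite_count": 9}


records, failed_ids = collect_tweet_json(fake_fetch, [1, 2, 3])
```

Passing the fetcher in as a parameter keeps the loop independent of any particular client library, and keeping the failed IDs makes it easy to report how many tweets could not be retrieved.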

Table of Contents
Prerequisites 🔍📜
Design 📐
Conclusions 📌
License 🔖

Prerequisites

  • Python 3.6.3
  • Jupyter Notebook
  • Anaconda-Navigator

Design

  • 1 Data wrangling, which consists of:

    • Gathering data
    • Assessing data
    • Cleaning data
  • 2 Storing, analyzing, and visualizing wrangled data

  • 3 Reporting on 1) the data wrangling efforts and 2) the data analyses and visualizations

It is also worth noting:

  • We only need original ratings (no retweets) that have images. Though there are 5000+ tweets in the dataset, not all are dog ratings and some are retweets.

  • Cleaning includes merging individual pieces of data according to the rules of tidy data.

  • The fact that the rating numerators are greater than the denominators does not need to be cleaned. This unique rating system is a big part of the popularity of WeRateDogs.
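
The notes above can be sketched in plain Python as a filter followed by a merge on the tweet ID. The column names `retweeted_status_id` and `jpg_url`, the two-table layout, and the helper names are assumptions made for illustration, and the rows are made-up placeholders rather than project data.

```python
def keep_original_ratings(archive_rows, image_rows):
    """Keep tweets that are not retweets and that have an image, merged by tweet_id."""
    images_by_id = {row["tweet_id"]: row for row in image_rows}
    merged = []
    for row in archive_rows:
        if row.get("retweeted_status_id"):  # a non-empty value marks a retweet
            continue
        image = images_by_id.get(row["tweet_id"])
        if image is None:  # no image recorded for this tweet
            continue
        merged.append({**row, **image})  # one row per tweet, all columns together
    return merged


def rating_value(row):
    """Return numerator / denominator; numerators above the denominator are left as-is."""
    return row["rating_numerator"] / row["rating_denominator"]


# Made-up placeholder rows (assumed column names).
archive = [
    {"tweet_id": 1, "retweeted_status_id": None,
     "rating_numerator": 13, "rating_denominator": 10},
    {"tweet_id": 2, "retweeted_status_id": 99,
     "rating_numerator": 12, "rating_denominator": 10},
    {"tweet_id": 3, "retweeted_status_id": None,
     "rating_numerator": 11, "rating_denominator": 10},
]
images = [
    {"tweet_id": 1, "jpg_url": "https://example.com/1.jpg"},
    {"tweet_id": 2, "jpg_url": "https://example.com/2.jpg"},
]

clean = keep_original_ratings(archive, images)
```

Here tweet 2 is dropped as a retweet and tweet 3 is dropped because it has no image, so only tweet 1 remains, with its 13/10 rating untouched.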

Conclusions

To sum up, a male dog named Atticus received the highest rating_numerator (1776) as well as the highest rating. A male Chihuahua named Stephan received both the highest favorite count (122452) and the highest retweet count (60735). Surprisingly, a single dog of the English_setter breed received both the lowest favorite count (80) and the lowest retweet count (13); no further information about that dog is available. Which names are associated with higher ratings may depend on how popular those names are. Male dogs seem to be more favored. Although pupper accounts for 70.1% of all stages, the highest ratings are associated with doggo. As for the prediction odds, the highest accuracy is found in the puppo stage.

License

MIT License