Skip to content

Latest commit

 

History

History
161 lines (124 loc) · 5.79 KB

readme.md

File metadata and controls

161 lines (124 loc) · 5.79 KB

Superbowl commercials

The data this week comes from FiveThirtyEight. They have a corresponding article on the topic. Note that the original source was superbowl-ads.com. You can watch all the ads via the FiveThirtyEight article above.

Like millions of viewers who tune into the big game year after year, we at FiveThirtyEight LOVE Super Bowl commercials. We love them so much, in fact, that we wanted to know everything about them … by analyzing and categorizing them, of course. We dug into the defining characteristics of a Super Bowl ad, then grouped commercials based on which criteria they shared — and let me tell you, we found some really weird clusters of commercials.

We watched 233 ads from the 10 brands that aired the most spots in all 21 Super Bowls this century, according to superbowl-ads.com.1 While we watched, we evaluated ads using seven specific criteria, marking every spot as a "yes" or "no" for each:

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2021-03-02')
tuesdata <- tidytuesdayR::tt_load(2021, week = 10)

youtube <- tuesdata$youtube

# Or read in the data manually

youtube <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-03-02/youtube.csv')

Data Dictionary

youtube.csv

variable class description
year double Superbowl year
brand character Brand for commercial
superbowl_ads_dot_com_url character Superbowl ad URL
youtube_url character Youtube URL
funny logical Contains humor
show_product_quickly logical Shows product quickly
patriotic logical Patriotic
celebrity logical Contains celebrity
danger logical Contains danger
animals logical Contains animals
use_sex logical Uses sexuality
id character Youtube ID
kind character Youtube Kind
etag character Youtube etag
view_count integer Youtube view count
like_count integer Youtube like count
dislike_count integer Youtube dislike count
favorite_count integer Youtube favorite count
comment_count integer Youtube comment count
published_at character Youtube when published
title character Youtube title
description character Youtube description
thumbnail character Youtube thumbnail
channel_title character Youtube channel name
category_id character Youtube content category id

Cleaning Script

Note this is optional, and NOT required. I downloaded the youtube data via httr from the youtube API and an API key.

library(tidyverse)
library(tuber)
library(rvest)
library(httr)

raw_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv")

all_ids <-raw_data$youtube_url %>% 
  str_remove_all("https://www.youtube.com/watch") %>% 
  str_remove("\\?v=") %>% 
  str_subset("NA", negate = TRUE)

all_ids

api_key = "SUPER_SECRET_API"

get_youtube_data <- function(ids_in, query_type = "statistics"){
  
  url_in <- modify_url("https://www.googleapis.com/youtube/v3/videos", 
                     query = list(
                       "part" = query_type,
                       "id" = paste(ids_in, collapse=","),
                       "key" = api_key)
  )
  out_content <- content(GET(url_in), as = "parsed", type = "application/json")
  
  if(query_type == "statistics"){
    out_content$items %>% 
      enframe() %>% 
      unnest_wider(value) %>% 
      unnest_wider(statistics) %>% 
      janitor::clean_names()
    
  }
}

get_youtube_details <- function(ids_in){
  
  url_in <- modify_url("https://www.googleapis.com/youtube/v3/videos", 
                       query = list(
                         "part" = "snippet",
                         "id" = paste(ids_in, collapse=","),
                         "key" = api_key)
  )
  out_content <- content(GET(url_in), as = "parsed", type = "application/json")
  
  out_content$items %>% 
    enframe() %>% 
    unnest_wider(value) %>% 
    unnest_wider(snippet) %>% 
    janitor::clean_names()
}

all_vid_stats <- list(
  all_ids[1:50],
  all_ids[51:100],
  all_ids[101:150],
  all_ids[151:200],
  all_ids[200:length(all_ids)]
) %>% 
  map_dfr(get_youtube_data)

all_vid_details <- list(
    all_ids[1:50],
    all_ids[51:100],
    all_ids[101:150],
    all_ids[151:200],
    all_ids[200:length(all_ids)]
    ) %>% 
  map_dfr(get_youtube_details)

clean_details <- all_vid_details %>% 
  hoist(thumbnails, 
        thumbnail = list("standard", "url")) %>% 
  select(id, published_at, title:category_id, -tags, -thumbnails)

combo_vid <- all_vid_stats %>% 
  left_join(clean_details) %>% 
  select(-name)

all_youtube <- raw_data %>% 
  mutate(id = str_remove(youtube_url, "https://www.youtube.com/watch") %>% 
           str_remove("\\?v=")) %>% 
  left_join(combo_vid) %>% 
  mutate(across(view_count:comment_count, as.integer)) 

write_csv(all_youtube, "2021/2021-03-02/youtube.csv")

all_youtube %>% 
  glimpse()