Jack Carter 18/04/2022
This project counts the articles containing selected terms to show how the conversation in The New York Times (NYT) changed following the 2016 election of Donald Trump. Like the characters in Clint Eastwood's famous spaghetti western, Trump's rise highlighted good, bad, and ugly characteristics of US society: respectively, a heightened sense of national belonging for many (albeit mostly white) Americans, political polarization, and social discrimination.
The data show only relative changes in the number of articles for each term between 2011 and 2022, not how many times a term appeared overall or the context in which it was used. Any conclusions we draw about good, bad and ugly changes in Trump's America are therefore interpretations, not established facts.
The terms were selected on the basis of trial and error in an attempt to find underlying trends in the data during Trump’s presidency. The table below details the number of articles for each term between 2011 and 2022.
Terms (articles in 000s)

| Anti-Semitism | Islamophobia | National Identity | Partisan Politics | Partisanship | Patriotism | Political Differences | Political Divide | Political Polarization | Racism | Sexism | Transphobia |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2.72 | 0.59 | 1.28 | 0.8 | 2.34 | 2.65 | 0.44 | 0.5 | 0.56 | 14.08 | 3.15 | 0.15 |
The data were collected from the New York Times Article Search API. Each call is wrapped in a repeat/try loop so that the full data are collected even if the connection drops on a particular call.
# find out how many results are returned for a given year.
get_data <- function(start_dates, end_dates, terms) {
url <- paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=%22",
# query.
results_counter <- 1L
results <- list()
search <- repeat{try({query <- fromJSON(url, flatten = TRUE)})
# error handling.
if(exists("query")) {
results <- query
} else {
if(results_counter <= 45L) {
message("Re-trying query: attempt ", results_counter, " of 45.")
results_counter <- results_counter +1L
} else {
message("Retry limit reached: initial query unsuccessful.")
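As a rough illustration of the retry logic described above, the sketch below wraps any query function in a bounded retry loop. The names `fetch_with_retry`, `max_attempts` and `wait_seconds` are hypothetical and not taken from the project; the default of 45 simply mirrors the limit in the code above, and `flaky_fetch` is a stand-in for the API call so the example runs without a network connection or API key.

```r
# Hypothetical helper (not from the original project): call `fetch()` until it
# succeeds or `max_attempts` is reached. A NULL return is treated as a failure.
fetch_with_retry <- function(fetch, max_attempts = 45L, wait_seconds = 0) {
  for (attempt in seq_len(max_attempts)) {
    # tryCatch turns an error (e.g. a dropped connection) into NULL.
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    if (attempt < max_attempts) {
      message("Attempt ", attempt, " of ", max_attempts, " failed; retrying.")
      Sys.sleep(wait_seconds)
    }
  }
  stop("Retry limit reached: query unsuccessful.")
}

# Stand-in for the API call: fails twice, then returns a made-up result.
calls <- 0L
flaky_fetch <- function() {
  calls <<- calls + 1L
  if (calls < 3L) stop("connection dropped")
  list(hits = 123L)
}

result <- fetch_with_retry(flaky_fetch, max_attempts = 5L)
```

In practice the `fetch` argument could be something like `function() jsonlite::fromJSON(url, flatten = TRUE)`, assuming the `fromJSON` used above comes from the jsonlite package. Using a `for` loop with an explicit `stop()` makes the exit condition visible in one place, which is harder to get wrong than a bare `repeat`.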
The article counts for each term are converted to z-scores, which lets us view each term's relative distribution over time. A z-score is calculated as the number of articles less the term's mean, divided by the term's standard deviation.
# gets the z-score for each term.
get_z_score <- function(data) {
mean <- mean(data)
sigma <- sd(data)
z_scores <- list()
for(i in 1:length(data)) {
z_scores[i] <- (data[i] - mean) / sigma
# gets the z-score for a list of terms.
get_z_scores <- function(data) {
groups <- data %>%
z_scores <- list()
for(i in 1:length(groups)) {
z_scores[[i]] <- list(get_z_score(groups[[i]]$hits))
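The per-term z-score calculation described above can also be sketched in vectorised base R. This is only an illustration: the `demo` data frame and its values are made up, and `z_score` is a hypothetical helper. Only the column names `term` and `hits` follow the code above.

```r
# Made-up article counts for two terms over four periods.
demo <- data.frame(
  term = rep(c("Racism", "Sexism"), each = 4),
  hits = c(10, 12, 14, 16, 1, 2, 3, 6)
)

# z-score: each value less the mean, divided by the standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

# ave() applies z_score within each term and returns the values in row order.
demo$z <- ave(demo$hits, demo$term, FUN = z_score)
```

Because each term is standardised against its own mean and standard deviation, a frequent term such as Racism and a rare one such as Transphobia end up on the same scale, which is what makes their trends comparable.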
The data for each term are plotted with a loess regression line (geom_smooth in the code below). This smooths the data into a curve that makes overall trends easier to see.
# creates a plot with smoothed loess regression lines.
make_plot <- function(category, title) {
#plot <- sorted_df %>%
#filter(term %in% str_to_title(category)) %>%
#col=term)) +
span = 0.5,
size = 0.5)
#ggtitle(title) +
#ylab("Articles (z-scores)") +
#xlab("") +
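To show what the loess smoothing does to a noisy series, the sketch below fits the same kind of local regression directly with base R's `loess()`, using the `span = 0.5` from the plotting code above. The `demo` series is simulated and the variable names are hypothetical; this is not the project's plotting code, only an illustration of the smoothing step.

```r
set.seed(42)

# Simulated monthly z-scores for one term: a slow trend plus noise.
demo <- data.frame(month = 1:60)
demo$z_score <- sin(demo$month / 10) + rnorm(60, sd = 0.5)

# Local regression; span = 0.5 means each local fit uses about half the points.
fit <- loess(z_score ~ month, data = demo, span = 0.5)
demo$smoothed <- predict(fit)

# Month-to-month jumps are much smaller in the smoothed curve than in the raw data.
rough_raw <- sd(diff(demo$z_score))
rough_smooth <- sd(diff(demo$smoothed))
```

A smaller span follows the raw data more closely, while a larger one gives a flatter curve, so the choice of 0.5 is a trade-off between detail and readability.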
Boyer (2019) https://www.esquire.com/news-politics/a26454551/donald-trump-interview-new-york-times-media-objectivity/
New York Times (2021) https://developer.nytimes.com/apis
Rutenberg (2016) https://www.nytimes.com/2016/08/08/business/balance-fairness-and-a-proudly-provocative-presidential-candidate.html
Statology (2021) https://www.statology.org/interpret-z-scores/