-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
305 lines (239 loc) · 9.57 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
---
title: "Looking at tweets by @year_progress 2019-2020"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
stub: "year-progress-tweets-viz"
output:
github_document:
html_preview: true
toc: false
fig_width: 9
fig_height: 5
md_extensions: +emoji+bracketed_spans+inline_notes
keywords: [lubridate, ggplot, twitter, api, dataviz]
always_allow_html: true
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, setup, include = FALSE}
ragg_png = function(..., res = 192) {
ragg::agg_png(..., res = res, units = "in")
}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
dev = "ragg_png",
# dev.args = list(ragg_png = list(res = 192)),
fig.ext = "png",
fig.showtext = TRUE,
fig.path = "man/figures/README-",
out.width = "100%"
)
library(dplyr)
library(extrafont)
library(ggplot2)
library(ggthemr)
library(lubridate)
library(purrr)
library(stringr)
library(tweetrmd)
library(yearprogress)
```
<!-- badges: start -->
<!-- badges: end -->
:::{#flow}
I was randomly interested in whether there might be patterns in the amount of interaction with tweets from the **@year_progress** bot during 2020 :scream:,
and what the shape of the year might look like compared to 2019.
This gave me a chance to review the current landscape of `R` clients for the twitter API :disappointed:.
After doing this review, I decided it would be easier and more satisfying for me to script my own access to [v2 of the API][tw-api].
[tw-api]: https://developer.twitter.com/en/docs/twitter-api/early-access
This meant more practice with `httr`, error handling, working with lists with the frankly brilliant [`purrr` toolkit][purrr], and with writing functions.
Happy as a :pig: in :poop:, _tbqhwy_.
It's also chance for me to try various other things I haven't really used before:
* Creating a GitHub README as an `.Rmd` file (this very file here),
instead of my usual `.md`, so I might incorporate some output charts
* Using a couple of [Pandoc's Markdown extensions][pdmd] ^[1][1]^
* GitHub Actions (set up using the `usethis` helper function)
* Making _a whole actual package_ :package: just for a little bit of fun like
this, as opposed to as some bigger project
* Learning a bit more about testing with `testthat`
* Yet more practice at package architecture and requirements,
and appeasing R CMD check
* Doing some hopefully nice [dataviz]{.sparkle} :bar_chart: with `ggplot2`, which I still don't feel like I have really used _well_ yet.
My charts usually look way too crappy for my liking.
[1]: # "We've got md_extensions: +emoji+bracketed_spans+inline_notes going on in the yaml here"
[purrr]: https://purrr.tidyverse.org/
[pdmd]: https://pandoc.org/MANUAL.html#pandocs-markdown
If you suspect that this whole idea was essentially driven by pre-Christmas/end-of-term task avoidance, angst and procrastination,
you would be correct.
```{r, get-dtf, eval = FALSE}
# Set variables and run overall query -------------------------------------
# Obtained originally by:
# user_id <- get_user_id(username = "year_progress")
# It isn't going to change, so once obtained just hardcode it:
year_progress_id <- "3233484298"
# This doesn't work with the Twitter API
# https://github.com/tidyverse/lubridate/issues/941
# start_2019 <- lubridate::make_datetime(2019) %>%
# lubridate::format_ISO8601(usetz = TRUE)
# so instead:
start_2019 <- "2019-01-01T00:00:02Z"
# see `R/get_user_twetes.R` for how this works
dtf <- get_user_twetes(user_id = year_progress_id, start = start_2019) %>%
purrr::compact() %>%
unpack_list() # <- in `R/helpers.R`
```
```{r, load-dtf, include = FALSE}
# myrmidon::save_it(dtf)
dtf <- readRDS(here::here("rds_data", "dtf.Rds"))
```
I realised the other day that it would be fine to call a dataframe `dtf` instead of boring old `df` :smirk:...
```{r, dtf-tweet, echo = FALSE, cache = TRUE}
tweetrmd::include_tweet("https://twitter.com/ludictech/status/1338815874770329602?s=20")
```
```{r, glimpse-dtf}
glimpse(dtf)
```
Let's lick this dataframe into shape :icecream: so it's ready for visualising.
```{r, prepare-dtf}
dtf2 <- dtf %>%
# filter out any "normal" tweets!
filter(str_detect(text, "^(\u2591|\u2593)+")) %>%
mutate(pcnt = as.numeric(str_extract(text, "[0-9]+"))) %>%
mutate(date_tweeted = case_when(
# 100% tweets get tweeted just _after_ midnight on 1st Jan: need correcting
# Be better here to check *if* date == January 1 but OK for now
pcnt == 100 ~ as_date(created_at) - period(1, "days"),
TRUE ~ as_date(created_at))) %>%
mutate(year = year(date_tweeted)) %>%
mutate(yday = yday(date_tweeted)) %>%
mutate(wkdy = weekdays(as_date(date_tweeted))) %>%
mutate(across(c(year, wkdy), ~ as.factor(.)))
```
```{r, ggthemr-set, echo = FALSE}
ggthemr::ggthemr(palette = "solarized", layout = "clean", type = "outer")
# ggthemr::ggthemr_reset()
theme_update(
text = element_text(family = "IBM Plex Sans", colour = "#100810")
)
```
Let's look at how retweets vary through the year:
```{r, plot-retweets, warning = FALSE, message = FALSE, cache = TRUE}
dtf2 %>%
filter(year == 2019) %>%
slice_max(n = 4, order_by = retweet_count) %>%
mutate(label = str_glue("{format(date_tweeted, '%b %d')}: {pcnt}%")) %>%
left_join(dtf2, .) %>%
ggplot(aes(yday, retweet_count)) +
geom_line(aes(colour = year), size = 0.6, alpha = 1,) +
geom_point(aes(colour = year), size = 1) +
geom_label(
aes(label = label),
vjust = "outward",
hjust = "inward",
nudge_y = 0.1,
fill = "aquamarine3",
colour = "white",
size = 3,
fontface = "bold") +
scale_colour_brewer("Year", palette = "Set2") +
scale_y_log10() +
labs(
x = "Day of the year",
y = "Number of retweets (log scale)",
title = "RTs of @year_progress tweets, 2019 vs. 2020"
)
```
Certain tweets are vastly more popular than the rest, with four tweets standing out as peaks - even when a log scale is employed on the y axis.
The next tranche of popularity is reserved for the multiples of 10%, apart from 50% which is already one of the top four retweeted.
There doesn't seem to be a particularly different trend affecting the popularity of tweets or the pattern of retweets as 2020 has progressed, relative to 2019. 2020 numbers are generally a little higher, which is most likely due to the account having acquired more followers over time.
Let's see if the numbers of likes for each tweet show a similar pattern
(we'd generally expect them to):
```{r, plot-likes, warning = FALSE, message = FALSE}
dtf3 <- dtf2 %>%
filter(year == 2019) %>%
slice_max(n = 4, order_by = like_count) %>%
mutate(label_2019 = str_glue("{format(date_tweeted, '%b %d')}: {pcnt}%")) %>%
left_join(dtf2, .)
dtf3 %>%
filter(year == 2020) %>%
slice_max(n = 4, order_by = like_count) %>%
mutate(label_2020 = str_glue("{format(date_tweeted, '%b %d')}: {pcnt}%")) %>%
left_join(dtf3, .) %>%
mutate(like_count = like_count/1000) %>%
ggplot(aes(yday, like_count)) +
geom_line(aes(colour = year), size = 0.6, alpha = 1) +
geom_point(aes(colour = year), size = 1) +
geom_label(
aes(label = label_2019),
vjust = "outward",
hjust = "inward",
nudge_y = 0.1,
fill = "aquamarine3",
colour = "white",
size = 3,
fontface = "bold") +
geom_label(
aes(label = label_2020),
vjust = "outward",
hjust = "inward",
nudge_y = 0.1,
fill = "chocolate2",
colour = "white",
size = 3,
fontface = "bold") +
scale_colour_brewer("Year", palette = "Set2") +
scale_y_log10() +
labs(
x = "Day of the year",
y = "Number of likes ('000s) (log scale)",
title = "\"Likes\" of @year_progress tweets, 2019 vs. 2020"
)
```
And quote tweets:
```{r, plot-quotes, warning = FALSE, message = FALSE}
dtf3 <- dtf2 %>%
filter(year == 2019) %>%
slice_max(n = 4, order_by = quote_count) %>%
mutate(label_2019 = str_glue("{format(date_tweeted, '%b %d')}: {pcnt}%")) %>%
left_join(dtf2, .)
dtf3 %>%
filter(year == 2020) %>%
slice_max(n = 4, order_by = quote_count) %>%
mutate(label_2020 = str_glue("{format(date_tweeted, '%b %d')}: {pcnt}%")) %>%
left_join(dtf3, .) %>%
# mutate(quote_count = quote_count/1000) %>%
ggplot(aes(yday, quote_count)) +
geom_line(aes(colour = year), size = 0.6, alpha = 1) +
geom_point(aes(colour = year), size = 1) +
geom_label(
aes(label = label_2019),
vjust = "outward",
hjust = "inward",
nudge_y = 0.1,
fill = "aquamarine3",
colour = "white",
size = 3,
fontface = "bold") +
geom_label(
aes(label = label_2020),
vjust = "outward",
hjust = "inward",
nudge_y = 0.1,
fill = "chocolate2",
colour = "white",
size = 3,
fontface = "bold") +
scale_colour_brewer("Year", palette = "Set2") +
scale_y_log10() +
labs(
x = "Day of the year",
y = "Number of quote-tweets (log scale)",
title = "Quotes of @year_progress tweets, 2019 vs. 2020"
)
```
Now, am I imagining it, or is there a much higher proportional difference between 2019 and 2020 when it comes to the QTs, than was visible with the RTs and the Likes?
From about day 80? There's some quite big gaps in there between the two lines, particularly if you look at the period between day 80 and day 200.
2020 never drops down below 100 QTs for any tweet like 2019 did.
The gap is a bit less noticeable in the 70-percents, around day 270-280, when 2019's figures start to pick up as the end of the year approaches.
Again, some of this may just be a factor of more people following the account, and RTs and QTs generate more follows, and so it goes round.
I'll be interested to see what happens to 2020's final few data points in the closing days of December.
:::