forked from TiagoVentura/workshop_big_data_conference
-
Notifications
You must be signed in to change notification settings - Fork 0
/
notebook_twitter.qmd
664 lines (463 loc) · 22.9 KB
/
notebook_twitter.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
---
title: "Workshop Analyzing Social Media Data I : Twitter Data"
author: "Tiago Ventura"
date: "10/21/2020"
toc: true
format:
html:
theme: cosmo
html-math-method: katex
code-tools: true
self-contained: true
execute:
warning: false
message: false
error: false
cache: true
editor_options:
chunk_output_type: console
---
## Introduction
Social media data can come on many flavors and from many different sources. For this reason, it is not possible to cover in-depth in a one-hour workshop many different types of social media data. In addition, social media companies provide different access to researchers to their data.
There is not a "one way to rule them all" when it comes to working with social media data.
For this reason, I decided start this workshop on the most used, and the most easily accessible social media data for researchers: Twitter data. However, even though we will spend most time working with Twitter data, several of the techniques I hope to cover here are no restricted to Twitter data. These are general techniques and can be applied pretty much to any type of social media data.
With Twitter data, we will cover:
- Analyzing and understanding network structure of social media data.
- Some extra endpoints from the Twitter API (timelines, friends list, among others).
To save some time, I stored all the data I am using in this tutorial [here](https://www.dropbox.com/s/ifevgl21kjotdc9/workshop_data_compress.zip?dl=0). This notebook should run if you place all the data in the same working directory.
## Getting Access to the Twitter APIs.
In order to get access to Twitter data, you need to first [apply for a Twitter developer account](https://developer.twitter.com/en/apply-for-access). Once your developer application has been approved, you get access to the standard product track by default. However, if you are an academic researcher and meet certain requirements, you can [apply to the academic research product track](https://developer.twitter.com/en/portal/petition/academic/is-it-right-for-you) which will give you elevated access to the Twitter API v2 including access to historical public Tweets for free.
We will be using the academic research access to the Twitter V2 API.
### Standard Access
- Search for Tweets from the last 7 days by specifying queries using supported operators (more on building queries in later sections)
- Stream Tweets in real-time as they are happening by specifying rules to filter for Tweets that you are interested in.
- Get Tweets from a user’s timeline (up to 3200 most recent Tweets)
- Build the full Tweet objects from a Tweet ID, or a set of Tweet IDs
- Look up follower relationships
These are just some examples of what you can get from the standard product track, relevant to academics.
Currently, you can get upto 500,000 Tweets per month using the standard product track and this limit does not apply to the sampled stream endpoint, which gives a 1% sample of public Tweets in real-time
## Academic Research product track
This track includes:
- Ability to get historical Tweets from the entire archive of public conversation on Twitter, dating back to 2006 (using the full-archive search endpoint)
- Higher monthly Tweet volume cap of 10 million Tweets per month
- More advanced filter options to return relevant data, including a longer query length, support for more concurrent rules (for filtered stream endpoint), and additional operators that are only supported in this product track (more on this later)
For a complete list of available endpoints in the V2 API, check out the [Twitter API documentation](https://developer.twitter.com/en/docs/twitter-api).
For a complete course on accessing the Twitter V2 API, I suggest you take a look the [101 course](https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research) prepared by Twitter API team.
## Twitter Data: Presidential Elections in Brazil.
To collect Twitter data from the V2 API, we will use the [`academictwitteR`](https://github.com/cjbarrie/academictwitteR) package developed by Chris Barrie. For R users, this is an amazing package because it allows you to easily query the API, and process the data in easily readable format for R.
If you prefer Python, I suggest you to check the library [Twarc](https://twarc-project.readthedocs.io/en/latest/).
### Access tweets from the archive
Let's start collecting some data using a textual query through the search endpoint. The academic access to the V2 API allows you to query the Twitter archive, and get access to data way back on time.
Let's start collecting some data about the recent presidential elections in Brazil.
```{r}
# Call packages using pacman
#install.packages("pacman")
pacman::p_load(here, jsonlite, tidyverse, academictwitteR)
```
```{r, eval=FALSE}
# Using Academic Twitter to add yourkey
set_bearer()
# Collect data
tweets <-
get_all_tweets(
query = "(eleicoes2022 OR lula OR bolsonaro OR ciro OR tebet)", # query
start_tweets = "2022-10-01T00:00:00Z", #start time
end_tweets = "2022-10-04T00:00:00Z", #end time
file = "br_elections", # file to save
data_path = "data_br/", # folder where all data as jsons will be stores
n = 200000, # number of tweets
lang = "pt",
)
```
This data is stored as a series of smaller jsons. The `academictwitter` has a specific function to easily combine these json files in a single file in the tidy format.
If you are collecting this through other packages or accessing the API directly, you would get long json files as responses. Jsons are basically a set of nested lists, and can be tricky to clean. So the `bind_tweets` function can be very handy
Another option, which is very common if you have a consistent data pipeline, is to build your own cleaning function, getting the data and variables in the format your project needs.
```{r eval=FALSE}
# data processing
tweets_tidy <- bind_tweets("./data_br", output_format = "tidy")
glimpse(tweets_tidy)
```
```{r echo=FALSE}
# data processing
tweets_tidy <- read_rds("tweets_tidy.rds")
glimpse(tweets_tidy)
```
A lot of data. But only a portion of what comes through the API. So you can actually process the whole data here. This is a really nice feature of this package. The json data is stored in smaller pieces, which makes it easier for you to process later.
```{r eval=FALSE}
# examing the data
tweets_raw <- bind_tweets("./data_br", output_format = "raw")
str(tweets_raw, max.level=1)
```
```{r echo=FALSE}
# data processing
tweets_raw <- read_rds("tweets_raw_tidy.rds")
str(tweets_raw, max.level=1)
```
### Network Analysis with Twitter Data
There are many different ways you can analyze Twitter data. You can analyze the text, the images, the geolocations, the links shared, among many other things.
A favorite way for computational social scientists to analyze social media data is to look at the user connections using some sort of network models. This is not limited to Twitter data. Pretty much any social media application comprises some sort of network structure.
So let's start with some basics of network analysis in R.
A network has two core elements: nodes and edges. On Twitter this means:
- Nodes are Twitter users
- Edges are any sort of connections these users make. A reply, a friendship, or the most common, a retweet.
We will be using retweets and quote tweets to give some examples of network analysis in R. We will use the `igraph` package to analyze network data in R. This package has many different features, but it stores data in a very distinct way. So let's work through it.
### Bulding a Network
### Step 1: Filter Nodes
```{r}
# Filter retweets
tweets_tidy_rt <- tweets_tidy %>%
filter(!is.na(sourcetweet_type))
dim(tweets_tidy_rt)
# Visualize the dta
tweets_tidy_rt %>%
select(user_username, sourcetweet_author_id) %>%
head()
```
### Step 2: Create a edge list
```{r}
# Create a edge list
# using the user id on both sides here to keep the same unit
data <- cbind(tweets_tidy_rt$author_id, tweets_tidy_rt$sourcetweet_author_id)
dim(data)
head(data)
```
Notice here we have two different types of users. Hubs are the users who retweet, and authorities are the user who receive a retweets.
### Step 3: Create your network structure
```{r}
pacman::p_load(igraph)
# Create an empty network
net <- graph.empty()
# Add nodes
net <- add.vertices(net,
length(unique(c(data))), # number of nodes
name=as.character(unique(c(data)))) # unique names
# Add edges
net <- add.edges(net, t(data))
# summary
summary(net)
```
Your output:
- Igraph object
- 79886 unique nodes
- 154583 edges
### Step four: Add information to your network object
Information comes in two flavors. Information at the edge level (`E(object)`) and at the Node leve (`V(object)`). Let's see how it works.
```{r}
library(urltools)
# Edges
E(net)$text <- tweets_tidy_rt$text
E(net)$idauth <- tweets_tidy_rt$sourcetweet_author_id
E(net)$namehub <- tweets_tidy_rt$user_username
# Capturing hashtags
E(net)$hash <- str_extract_all(tweets_tidy_rt$text, "#\\S+")
# grab expanded and unwound_url
entities <- tweets_raw$tweet.entities.urls
tidy_entities <- entities %>%
# get columns we need
select(tweet_id, unwound_url) %>%
#extract domains
mutate(unwound_url=domain(unwound_url)) %>%
# remove nas and combine multiple links
filter(!is.na(unwound_url)) %>%
group_by(tweet_id) %>%
summarise(domain=paste0(unwound_url, collapse=" -- "))
# Merge back with id
tweets_tidy_rt <- left_join(tweets_tidy_rt, tidy_entities)
# add to the network
E(net)$domain <- tweets_tidy_rt$domain
```
## Network Statistics, Communities and Layout
Two very common concepts in network science are in-degree and out-degree. In-degree refers to how many links pointing to themselves the user has. So in our case it shows how many retweets this user has received. The opposite explains the out-degree. In this case, out-degree means how many retweets the user has given.
A user is called an authority when their in-degree is high. That is, this user received many retweets from others. We call it a hub when its out-degree is high, as this user retweets very often.
Accounst who are considered bots usually have a huge difference between in-degree and out-degree -- nobody retweets them, they just retweet a lot, and usually really quick.
```{r}
# Calculate in degree and out degree
V(net)$outdegree<-degree(net, mode="out")
V(net)$indegree<-degree(net, mode="in")
summary(net)
```
### Layout
Networks are always in the latente space. To visualize them, we usually reccur algorithms that maximize some dynamics of networks and give us layouts for visualization. You can try out different algorithm, but the Fruchterman-Reingold is a popular choice
```{r eval=FALSE}
l <- layout_with_fr(net, grid = c("nogrid"))
#saveRDS(l, "layout.rds")
head(l)
```
```{r echo=FALSE}
l <- read_rds("layout.rds")
head(l)
```
### Communities
Community detection is a big part of network analysis. The idea of these techniques is to find clusters of connections across your entire network space. Community detection is a huge subfield of network science. Here is a nice review piece by [Porter et al](https://www.ams.org/notices/200909/rtx090901082p.pdf).
My take here is that for large networks, like we usually deal with in social media, most of the algorithms will do the job you need, which in general is identify the core communities in a network. An important point is to always validate these communities, and use some qualitative analysis to verify the results.
We will use an random walk algorithm for community detection
```{r eval=FALSE}
my.com.fast <- walktrap.community(net)
str(my.com.fast, max.level = 1)
```
```{r echo=FALSE}
my.com.fast <- read_rds("community_detection.rds")
str(my.com.fast, max.level = 1)
```
# Add the layout and membership to your igraph object.
```{r}
V(net)$l1 <- l[,1]
V(net)$l2 <- l[,2]
V(net)$membership <- my.com.fast$membership
```
## What are the largest communities?
```{r}
comunidades<- data_frame(membership=V(net)$membership)
comunidades %>%
count(membership) %>%
arrange(desc(n)) %>%
top_n(5)
```
## Who are the main authorities in each community?
```{r autoridade}
# Create an datafram for the authoritiew
authorities <- data_frame(name=V(net)$name, ind=V(net)$indegree,
membership=V(net)$membership) %>%
filter(membership==11| membership==4|membership==8) %>%
group_by(membership) %>%
arrange(desc(ind)) %>%
slice(1:10)
```
The V2 API does not return the name of the author of the original tweet. We can collect this information using the function `get_user_profile()`
```{r}
# I will get only from the 100 most retweeted to save some time.
users_most_retweets <-authorities %>%
mutate(data_user=map(name, get_user_profile)) %>%
unnest()
# Main Communities
ggplot(users_most_retweets %>% filter(membership=="4"), aes(x=reorder(username,
ind,
fill=membership),
y=ind)) +
geom_histogram(stat="identity", width=.5, color="black") +
coord_flip() +
xlab("") + ylab("") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(size = 22, face = "bold"),
axis.title=element_text(size=16),
axis.text = element_text(size=12, face="bold")) +
facet_grid(~membership)
# Main Communities
ggplot(users_most_retweets %>% filter(membership=="11"), aes(x=reorder(username,
ind,
fill=membership),
y=ind)) +
geom_histogram(stat="identity", width=.5, color="black") +
coord_flip() +
xlab("") + ylab("") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(size = 22, face = "bold"),
axis.title=element_text(size=16),
axis.text = element_text(size=12, face="bold")) +
facet_grid(~membership)
# Main Communities
ggplot(users_most_retweets %>% filter(membership=="8"), aes(x=reorder(username,
ind,
fill=membership),
y=ind)) +
geom_histogram(stat="identity", width=.5, color="black") +
coord_flip() +
xlab("") + ylab("") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(size = 22, face = "bold"),
axis.title=element_text(size=16),
axis.text = element_text(size=12, face="bold")) +
facet_grid(~membership)
```
## Visualizing communities
```{r}
# A function with the density. Nice to visualize as well.
my.den.plot <- function(l=l,new.color=new.color, ind=ind, legend, color){
library(KernSmooth)
est <- bkde2D(l, bandwidth=c(10, 10))
plot(l,cex=log(ind+1)/4, col=new.color, pch=16, xlim=c(-160,140),ylim=c(-140,160), xlab="", ylab="", axes=FALSE)
legend("topright", c(legend[1],legend[2], legend[3]), pch = 17:19, col=c(color[1], color[2], color[3]))
contour(est$x1, est$x2, est$fhat, col = gray(.6), add=TRUE)
#text(-140,115,paste("ENCG: ",ENCG,sep=""), cex=1, srt=0)
}
# Add colors for each community
# Building a empty container
temp <- rep(1,length(V(net)$membership))
new.color <- "white"
new.color[V(net)$membership==11] <- "Yellow" ####
new.color[V(net)$membership==8] <- "pink" ####
new.color[V(net)$membership==4] <- "red" ####
# Save as a variable in the network object
V(net)$new.color <- new.color
# Plot
my.den.plot(l=cbind(V(net)$l1,V(net)$l2),new.color=V(net)$new.color, ind=V(net)$indegre, legend =c("Pro-Bolsonaro", "Anti-Bolsonaro I", "Anti-Bolsonaro II"),
color =c("Yellow", "red", "pink"))
```
## Hashtags by communities
### Most Popular Hashtags
```{r}
hashtags <- tweets_tidy_rt %>%
mutate(hashtags=str_extract_all(tweets_tidy_rt$text, "#\\S+")) %>%
unnest(hashtags) %>%
count(hashtags) %>%
arrange(desc(n)) %>%
slice(1:30) %>%
drop_na()
# Contando as hashtags
ggplot(hashtags, aes(x=reorder(hashtags,
n),
y=n)) +
geom_histogram(stat="identity", width=.5, color="black",
fill="steelblue") +
coord_flip() +
xlab("") + ylab("") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(size = 22, face = "bold"),
axis.title=element_text(size=16),
axis.text = element_text(size=12, face="bold"))
```
### Heterogenous activation of Hashtags
```{r}
# get communities
auth <- data_frame(name=V(net)$name, membership=V(net)$membership)
# merge
data_hash <- tweets_tidy_rt %>%
left_join(auth, by=c("author_id"="name")) %>%
mutate(hashtags=str_extract_all(tweets_tidy_rt$text, "#\\S+")) %>%
unnest(hashtags) %>%
drop_na(hashtags)
# Vamos aggrupar por comunidade
hash_by_comm <- data_hash %>%
filter(membership==11| membership==8|membership==4) %>%
count(membership, hashtags) %>%
top_n(20, n)
ggplot(hash_by_comm, aes(x=reorder(hashtags,
n),
y=n, fill=as.factor(membership))) +
geom_histogram(stat="identity", width=.5, color="black") +
coord_flip() +
xlab("") + ylab("") +
scale_fill_manual(name="Communities", values=c("Red", "pink", "Yellow")) +
theme_minimal(base_size = 10) +
facet_wrap(~membership, scale="free")
```
## Sharing news on Twitter
Let me present you with one final analysis of Twitter data. It comes from a paper I worked together with Ernesto Calvo and Natalia Aruguete, [News Sharing, Gatekeeping, and Polarization: A Study of the #Bolsonaro Election](https://www.tandfonline.com/doi/full/10.1080/21670811.2020.1852094). The paper proposes an model to estimate behavioral parameters for news sharing with social media data. But, before we get to the model, we show descriptive how news organizations are activated differently by each community on Twitter. This is a interesting way to visualize the formation of echo chambers on Twitter coming straight from activation of content, instead of sorting.
A lot of this code was developed by Ernesto Calvo. Thanks, Ernesto!
#### Counting news domains at the node level
```{r}
summary(net)
# Vamos primeiro selecionar as mídias mais ativadas
keynews <- head(sort(table(unlist(E(net)$domain)),decreasing=TRUE),20)
keynews.names <- names(keynews)
N<-length(keynews.names)
count.keynews<- array(0,dim=c(length(E(net)),N))
str(count.keynews)
# Looping
for(i in 1:N){
temp<- grepl(keynews.names[i], E(net)$domain, ignore.case = TRUE)
#temp <- str_match(E(net)$text,'Arangur[A-Za-z]+[A-Za-z0-9_]+')
count.keynews[temp==TRUE,i]<-1
Sys.sleep(0.1)
}
# Setting the names of the media
colnames(count.keynews)<- keynews.names
count.keynews[1:5, 1:3]
```
#### Recovering the neighhood matrix for nodes
```{r, eval=FALSE}
# Recovering all nodes connections
el <- get.adjedgelist(net, mode="all")
al <- get.adjlist(net, mode="all")
```
```{r, echo=FALSE}
# Recovering all nodes connections
el <- read_rds("el.rds")
al <- read_rds("al.rds")
```
#### Function to estimate the propagation of content at edge level
```{r}
#Function to detect the spread in the adj list
fomfE<- function(var=var, adjV=adjV,adjE=adjE){
stemp <- map_dbl(adjE, function(x) sum(var[x]))
#mstemp <- sapply(adjV, function(x) mean(stemp[x]))
out<-cbind(stemp)
}
# container
result_hash<- array(0,dim=c(length(V(net)),N))
# Repita para cada uma das hashtags
for(i in 1:N){
bb<-fomfE(count.keynews[,i],al,el)
bb[bb[,1]=="NaN"]<-0
result_hash[,i]<- bb[,1]
}
colnames(result_hash)<- keynews.names
```
#### Visualization
```{r}
par(mfrow=c(2,2))
# Hashtag
plot(V(net)$l1,V(net)$l2,pch=16,
col=V(net)$new.color,
cex=log(result_hash[,1]+1),
xlim=c(-100,150),ylim=c(-100,150), xlab="", ylab="",
main=colnames(result_hash)[1], cex.main=1)
# Hashtag
plot(V(net)$l1,V(net)$l2,pch=16,
col=V(net)$new.color,
cex=log(result_hash[,7]+1),
xlim=c(-100,150),ylim=c(-100,150), xlab="", ylab="",
main=colnames(result_hash)[7], cex.main=1)
# Hashtag
plot(V(net)$l1,V(net)$l2,pch=16,
col=V(net)$new.color,
cex=log(result_hash[,9]+1),
xlim=c(-100,150),ylim=c(-100,150), xlab="", ylab="",
main=colnames(result_hash)[9], cex.main=1)
# Hashtag
plot(V(net)$l1,V(net)$l2,pch=16,
col=V(net)$new.color,
cex=log(result_hash[,10]+1),
xlim=c(-100,150),ylim=c(-100,150), xlab="", ylab="",
main=colnames(result_hash)[10], cex.main=1)
```
```{r, echo=FALSE}
dev.off()
```
#### Extra notes
With the matrix of users and links, we can do many things. For example, in this paper [News by Popular Demand](https://journals.sagepub.com/doi/abs/10.1177/19401612211057068) which I also work together with Ernesto Calvo and Natalia Aruguete, we show how to estimate the importance of ideology, reputation, and attention on sharing behavior on Twitter.
In this working paper by [Eady et al](https://osf.io/ch8gj/), the same matrix is used to estimate ideal points for users and media organization from Twitter. The authors use a sophisticated Bayesian count model to estimate the ideal points. But, as you can see from this paper by [Pablo Barbera](http://pablobarbera.com/static/PSS-supplementary-materials.pdf), one can get to very similar results using a simple singular value decomposition on a matrix of normalized residuals.
## Other APIs endpoint
Most of our work with the Twitter API happens with the capacity to query the API with search terms. For this reason, the search (and filter for live data collection) endpoints are the most popular.
However, there are a few other endpoints from the Twitter API that can also be very useful for research puporses. Let's walk through them briefly.
## Getting user id
Imagine a research in which you have the Twitter accounts of elites, and you want to collect their Twitter data. The first step is to collect their ids.
```{r}
# getting some Twitter Ids
pelosi <- get_user_id("SpeakerPelosi")
```
## Getting whom a user follows
```{r}
pelosi_network <- get_user_following(pelosi)
glimpse(pelosi_network)
```
## Estimate user ideology
```{r}
#devtools::install_github("pablobarbera/twitter_ideology/pkg/tweetscores")
library(tweetscores)
results <- estimateIdeology("SpeakerPelosi", pelosi_network$id, verbose = FALSE)
plot(results)
```
## User timeline
```{r}
pelosi_tl = get_user_timeline(pelosi,
start_tweets = "2022-01-01T00:00:00Z",
end_tweets = "2022-10-22T00:00:00Z",
n=100) #limit
glimpse(pelosi_tl)
```
## Tweets liked by an user
```{r}
pelosi_likes = get_liked_tweets(pelosi) #limit
glimpse(pelosi_likes) # she mostly liked her own tweets
```