---
title: "Assignment 2 - Functions, iteration and debugging"
author: "Aditya Narayan Rai + adityanarayan-rai"
date: "`r format(Sys.time(), '%B %d, %Y | %H:%M:%S | %Z')`"
output:
  html_document:
    code_folding: show
    df_print: paged
    highlight: tango
    number_sections: no
    theme: cosmo
    toc: no
---
<style>
div.answer {background-color:#f3f0ff; border-radius: 5px; padding: 20px;}
</style>
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
eval = TRUE,
error = FALSE,
message = FALSE,
warning = FALSE,
comment = NA)
```
<!-- Do not forget to input your Github username in the YAML configuration up there -->
***
```{r, include = T}
# LOAD THE PACKAGES YOU ARE USING IN THIS CODE CHUNK library(nameofpackage)
library(tidyverse)
library(dplyr)
library(lubridate)
library(unvotes)
```
<br>
***
### Task 1 - Fun with functions
a) Program a function `ultimate_answer()` that always returns the number 42 regardless of which input is provided, and show that it works providing three function calls that test different input types!
```{r}
# Let's create 'ultimate_answer()' using function(); the input is accepted but never used
ultimate_answer <- function(input) {
  return(42)
}
# Now, let's create 3 function calls to check the output of 'ultimate_answer()'
test1 <- ultimate_answer(1709)
test2 <- ultimate_answer("Simon")
test3 <- ultimate_answer(FALSE)
# Finally, let's print the results of the 3 function calls
test1
test2
test3
```
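A possible variation, offered only as a sketch and not as the required solution: declaring the argument as `...` lets the function accept any number of inputs, whereas a single named argument rejects calls with two or more inputs. The name `ultimate_answer_any()` is hypothetical.

```{r}
# Hypothetical variant: `...` swallows any number of arguments of any type
ultimate_answer_any <- function(...) {
  42
}

ultimate_answer_any()                    # no input at all
ultimate_answer_any(NULL, list(1, "a"))  # several inputs of mixed types
ultimate_answer_any(mtcars)              # a data frame
```

Because the arguments are never evaluated inside the body, their type and number cannot affect the result.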
b) Write an R function called `color_guessing_game()` that allows the player to guess one of four colors (red, yellow, green, blue). The game will randomly select one of the colors and prompt the player to guess it. If the player's guess matches the selected color, they win, otherwise, they lose. *Hint:* You will need the function `readline()` to enable interactive communication between the game and the player.
```{r}
# First, let's create a vector named colors
colors <- c("red", "yellow", "green", "blue")
# Now, let's write down the suggested function
color_guessing_game <- function() {
  # Now, let's select one of the colors randomly
  select_color <- sample(colors, 1)
  # Now, let's prompt the player to guess the color
  # (in a non-interactive session, e.g. when knitting, readline() returns "" without waiting)
  guess <- tolower(trimws(readline(prompt = "Pick one of the colors (red, yellow, green, or blue): ")))
  # Now, let's check if the guess is correct
  if (guess == select_color) {
    cat("Yay Yay Yay! You guessed the color correctly. The color is", select_color, "\n")
  } else {
    cat("Sorry, but you guessed it wrong. The correct color is", select_color, "\n")
  }
}
# Finally, let's check if the game works
color_guessing_game()
```
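Since `readline()` cannot collect input while the document is being knitted, the game is hard to check automatically. One way to make it testable, sketched here under the assumption that separating the comparison from the input is acceptable for the task, is shown below. The names `check_guess()` and `play_color_game()` are hypothetical.

```{r}
# Hypothetical helper: pure comparison logic, no user interaction
check_guess <- function(guess, target) {
  # Normalise case and surrounding whitespace before comparing
  identical(tolower(trimws(guess)), target)
}

# Hypothetical wrapper: a guess can be passed in directly for testing;
# readline() is only called when no guess is supplied
play_color_game <- function(guess = NULL,
                            colors = c("red", "yellow", "green", "blue")) {
  target <- sample(colors, 1)
  if (is.null(guess)) {
    guess <- readline(prompt = "Pick a color (red, yellow, green, or blue): ")
  }
  won <- check_guess(guess, target)
  cat(if (won) "Correct!" else "Wrong.", "The color was", target, "\n")
  invisible(won)
}

# Supplying the guess as an argument avoids the interactive prompt
play_color_game(guess = "Red")
```

The wrapper returns `TRUE` or `FALSE` invisibly, so the outcome can also be used programmatically.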
<br>
***
### Task 2 - Functional programming in practice
The `unvotes` package (hosted both on [CRAN](https://cran.r-project.org/web/packages/unvotes/index.html) and [GitHub](https://github.com/dgrtwo/unvotes)) provides data on the voting history of countries in the [United Nations General Assembly](http://www.un.org/en/ga/), along with information such as date, description, and topics for each vote. The package contains three datasets: `un_votes`, providing the history of each country's votes, `un_roll_calls`, providing information on each roll call vote, and `un_roll_call_issues`, providing issue (topic) classifications of roll call votes. Check out the [database tutorial](https://github.com/dgrtwo/unvotes) if you want to make yourself familiar with the data structure. Then, work on the following tasks.
a) Calculate how often, on average, Germany agreed with the US on a vote in the UN general assembly since 1990. Votes where voting information for one of the countries is missing should be discarded.
```{r}
# First, let's check the data structures
# str(un_votes)
# un_votes|> distinct(rcid, country) |> n_distinct()
# str(un_roll_calls)
# un_roll_calls |> distinct(rcid) |> n_distinct()
# Now, let's join the two datasets (un_votes & un_roll_calls) by "rcid" and select the needed columns
un_votes_roll_calls <- inner_join(un_votes, un_roll_calls, by = "rcid") |>
  select(rcid, date, country, country_code, vote)
# Now, let's create a new column named "year" containing the year extracted from the "date" column using lubridate and dplyr together
un_votes_roll_calls <- un_votes_roll_calls %>%
  mutate(year = lubridate::year(date))
# Now, let's create a new data frame for Germany that contains only the rows where the country is "Germany", the year is 1990 or later, and the vote is not "abstain", i.e., the votes are 'yes' or 'no'.
germany_votes <- un_votes_roll_calls |>
  filter(str_detect(country, "Germany") & year >= 1990 & vote != "abstain")
# Similarly, let's create a new data frame for the USA that contains only the rows where the country is "United States", the year is 1990 or later, and the vote is not "abstain", i.e., the votes are 'yes' or 'no'.
usa_votes <- un_votes_roll_calls |>
  filter(str_detect(country, "United States") & year >= 1990 & vote != "abstain")
# Note: I am not taking 'abstain' into consideration because, in my opinion, it represents a country's decision to refrain from taking a clear position on a particular issue
# Now let's combine the two new data frames using inner_join; only roll calls present in both are kept
# (Note: I learnt about the suffix argument using a Google search)
combined_votes <- inner_join(germany_votes, usa_votes, by = "rcid", suffix = c(".x", ".y")) |>
  mutate(agree = vote.x == vote.y)
# Share of shared roll calls with identical votes, extracted as a plain number rather than a 1x1 data frame
average_agreement <- combined_votes |>
  summarize(average_agreement = mean(agree)) |>
  pull(average_agreement)
# Now let's print the result. (Note: I first tried the cat function but it did not seem to work; any feedback on this would be welcome, Prof. I found this approach using a Google search)
print(paste0("On average, Germany agreed with the US on a vote in the UN general assembly since 1990 at an agreement rate of ",
             round(average_agreement * 100, digits = 3),
             "%"))
```
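The join-then-compare logic can be checked by hand on a tiny example. The sketch below uses made-up votes for two hypothetical countries, "A" and "B"; none of the values are real UN data.

```{r}
library(dplyr)

# Made-up votes (not real UN data) for two hypothetical countries
toy_votes <- tibble(
  rcid    = c(1, 1, 2, 2, 3, 3, 4),
  country = c("A", "B", "A", "B", "A", "B", "A"),
  vote    = c("yes", "yes", "yes", "no", "no", "no", "yes")
)

a_votes <- filter(toy_votes, country == "A")
b_votes <- filter(toy_votes, country == "B")

# inner_join() keeps only roll calls 1 to 3, where both countries voted;
# roll call 4 (no vote recorded for "B") is dropped automatically
toy_agreement <- inner_join(a_votes, b_votes, by = "rcid", suffix = c(".a", ".b")) |>
  summarize(share = mean(vote.a == vote.b)) |>
  pull(share)

toy_agreement  # 2 of the 3 shared roll calls agree
```

Using `inner_join()` is what discards roll calls with missing information for one of the two countries, which is the behaviour the task asks for.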
<br>
b) Now, create a function, `votes_agreement_calculator()`, that takes country identifiers as well as a `year_min` argument as inputs and that returns the share of agreement in voting between any two specified countries as numeric value, for a time period specified with year >= `year_min`. The function should take the necessary data frames directly from the `unvotes` package. Then, test the function by computing the agreement rates for (a) the United States and Russia for votes cast in 2000 and later and (b) France and the UK for votes cast in 2000 and later!
```{r}
# Let's create the function votes_agreement_calculator()
votes_agreement_calculator <- function(un_votes, un_roll_calls, country_code1, country_code2, year_min) {
  # First, let's join the two datasets (un_votes & un_roll_calls) by "rcid" and select the needed columns
  un_votes_roll_calls <- inner_join(un_votes, un_roll_calls, by = "rcid") |>
    select(rcid, date, country, country_code, vote)
  # Now, let's create a new column named "year" containing the year extracted from the "date" column using lubridate and dplyr together
  un_votes_roll_calls <- un_votes_roll_calls %>%
    mutate(year = lubridate::year(date))
  # Now, let's create one data frame per country that contains only the rows where country_code matches, the year is year_min or later, and the vote is not "abstain", i.e., the votes are 'yes' or 'no'.
  # country1
  country1_votes <- un_votes_roll_calls |>
    filter(country_code == country_code1 & year >= year_min & vote != "abstain")
  # country2
  country2_votes <- un_votes_roll_calls |>
    filter(country_code == country_code2 & year >= year_min & vote != "abstain")
  # Note: I am not taking 'abstain' into consideration because, in my opinion, it represents a country's decision to refrain from taking a clear position on a particular issue
  # Finally, let's combine the two new data frames using inner_join and calculate the average agreement
  average_agreement <- inner_join(country1_votes, country2_votes, by = "rcid", suffix = c(".x", ".y")) |>
    mutate(agreement = vote.x == vote.y) |>
    summarize(average_agreement = mean(agreement))
  # Return the share as a plain numeric value rather than a 1x1 data frame
  return(average_agreement[[1]])
}
# Now, as requested, let's test the function for (a) the United States and Russia in 2000 and later
us_russia_agreement <- votes_agreement_calculator(un_votes, un_roll_calls, "US", "RU", 2000)
cat("The average share of agreement between the United States and Russia in votes cast in 2000 and later is:", us_russia_agreement, "\n")
# And, as requested, let's test the function for (b) France and the UK in 2000 and later
france_uk_agreement <- votes_agreement_calculator(un_votes, un_roll_calls, "FR", "GB", 2000)
cat("The average share of agreement between France and the UK in votes cast in 2000 and later is:", france_uk_agreement, "\n")
```
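The task asks for a function that takes the data frames directly from the `unvotes` package. One way this could be sketched, assuming default arguments are an acceptable reading of that requirement, is to let the data arguments default to the package datasets. The name `votes_agreement_sketch()` and its argument names `votes` and `roll_calls` are hypothetical; abstentions are excluded to match the choice made above. The toy data are made up so that the result can be verified by hand.

```{r}
library(dplyr)

# Hypothetical variant: the data arguments default to the package datasets,
# so callers only pass two country codes and the starting year
votes_agreement_sketch <- function(country_code1, country_code2, year_min,
                                   votes = unvotes::un_votes,
                                   roll_calls = unvotes::un_roll_calls) {
  votes |>
    inner_join(select(roll_calls, rcid, date), by = "rcid") |>
    filter(as.integer(format(date, "%Y")) >= year_min,
           vote != "abstain",
           country_code %in% c(country_code1, country_code2)) |>
    group_by(rcid) |>
    # keep roll calls where both countries cast a yes/no vote
    filter(n_distinct(country_code) == 2) |>
    # a roll call counts as agreement when only one distinct vote remains
    summarize(agree = n_distinct(vote) == 1) |>
    summarize(share = mean(agree)) |>
    pull(share)
}

# Made-up data (not real UN votes) with hypothetical codes "AA" and "BB"
toy_votes <- tibble(
  rcid         = c(1, 1, 2, 2, 3),
  country_code = c("AA", "BB", "AA", "BB", "AA"),
  vote         = c("yes", "yes", "no", "yes", "yes")
)
toy_calls <- tibble(
  rcid = c(1, 2, 3),
  date = as.Date(c("2001-05-01", "2002-05-01", "2003-05-01"))
)

toy_share <- votes_agreement_sketch("AA", "BB", 2000,
                                    votes = toy_votes, roll_calls = toy_calls)
toy_share  # roll calls 1 and 2 are shared; only roll call 1 agrees
```

With the package installed, a call such as `votes_agreement_sketch("US", "RU", 2000)` would use the real datasets through the defaults.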
<br>
c) Using `purrr` functionality, find out which three countries on average agreed with the US the most from the year 2000 on!
```{r, eval = TRUE}
# First, let's get the vector of country codes from the un_votes dataset using count, filter and pull
codes <- un_votes |>
  count(country_code) |>
  filter(country_code != "US") |>
  pull(country_code)
# Now, let's run the function votes_agreement_calculator(...) on all valid country codes vs. the US
agree_with_US <- codes |>
  map_dbl(~(votes_agreement_calculator(un_votes, un_roll_calls, .x, "US", 2000)))
# Now, let's create a tibble that pairs each country code with its agreement rate and sort it in descending order
order_country_codes <- tibble(country_code = codes, agree_with_US = agree_with_US) |>
  arrange(desc(agree_with_US)) |>
  pull(country_code)
# Finally, let's print the result
print(paste0("The top three countries which agreed with the US the most from the year 2000 onwards are ", order_country_codes[1], ", ", order_country_codes[2], ", and ", order_country_codes[3]))
# In this question, I was a bit confused, so I took the help of Google and one of my friends to understand the context and usage. Please let me know how this can be done in a better way.
```
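The map-then-rank pattern can be illustrated in isolation. The sketch below replaces the agreement function with a hypothetical stand-in, `toy_agreement_with_us()`, that returns made-up shares for made-up country codes, so none of the numbers are real results.

```{r}
library(dplyr)
library(purrr)

# Hypothetical stand-in for an agreement function: returns a made-up
# agreement share for a given country code (not real UN results)
toy_agreement_with_us <- function(code) {
  made_up <- c(AA = 0.91, BB = 0.35, CC = 0.78, DD = 0.64, EE = 0.88)
  made_up[[code]]
}

toy_codes <- c("AA", "BB", "CC", "DD", "EE")

top_three <- tibble(
  country_code = toy_codes,
  # map_dbl() calls the function once per code and returns a numeric vector
  agreement = map_dbl(toy_codes, toy_agreement_with_us)
) |>
  # slice_max() keeps the n rows with the largest values, sorted descending
  slice_max(agreement, n = 3)

top_three
```

Keeping the agreement values next to the codes in the final tibble makes it possible to report the rates alongside the country names.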
<br>
***
### Task 3 - Debugging code
The following code snippet contains various bugs. Flag them and suggest a fix by adding a comment in the respective line. Example:
```{r, eval = FALSE}
library(Tidyverse) # BUG: typo in library(Tidyverse); should be library(tidyverse) instead
```
```{r, eval = FALSE}
# load packages
library(tidyverse)
library(countrycode)
library(Unvotes) # BUG: typo in library(Unvotes); should be library(unvotes) instead
# get continents data
continents <- countrycode:codelist %>% # BUG: only a single ':'; should be countrycode::codelist
  select(continent, iso2c) %>%
  rename(country_code == iso2c) # BUG: '==' is a comparison; rename() needs a single '=', i.e. rename(country_code = iso2c)
un_votes <- left_join(x = un_votes, y = continents, by = country_code) # BUG: unquoted country_code is looked up as an object; the column name must be quoted: by = "country_code"
# get data on European UN votes
eu_un_votes <- left_join(un_votes, un_roll_calls, by = "rcid") %>% # NOTE: no bug on this line; "continent" is already a column of un_votes after the join above, so joining by "rcid" is sufficient for the filter below
  left_join(., un_roll_call_issues, by = "rcid") %>%
  filter(continent == "Europe",
         date >= "1991-01-01" & date <= "2021-12-31") %>%
  drop_na(short_name)
# encode Europe's voting behavior numerically
eu_un_votes <- eu_un_votes %>%
  mutate(yes_vt = if_else(vote == "yes", 1, 0)), # BUG: the extra right parenthesis closes mutate() too early; remove it
         no_vt = if_else(vote == "no", 1, 0),
         abstention = if_else(vote == "abstain", 1, 0))
# list top 10 most disagreed with UN resolutions
hated_un <- eu_un_votes %>%
  group_by(unres) # BUG: missing pipe operator; add %>% at the end of this line
  summarise(sh_disagree_with = sum(no_vt,na.rm = T) / sum(c(yes_vt,no_vt, abstention), na.rm = T)) %>%
  arrange(desc(sh_disagree_with)) %>%
  head(10)
hated_un
```
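The corrected patterns flagged above can be seen working together on a small example. This is only a sketch on made-up data (the tables `toy_lookup` and `toy_votes` are hypothetical and not the real datasets), so that it runs without the `countrycode` and `unvotes` packages.

```{r}
library(dplyr)

# Made-up lookup table and votes (hypothetical data, not the real datasets)
toy_lookup <- tibble(continent = c("Europe", "Asia"), iso2c = c("DE", "JP"))
toy_votes <- tibble(
  country_code = c("DE", "DE", "JP", "DE"),
  unres        = c("R/1", "R/2", "R/1", "R/1"),
  vote         = c("no", "yes", "no", "abstain")
)

toy_result <- toy_lookup |>
  rename(country_code = iso2c) |>                 # single `=`: new_name = old_name
  right_join(toy_votes, by = "country_code") |>   # join key passed as a string
  filter(continent == "Europe") |>
  mutate(no_vt = if_else(vote == "no", 1, 0)) |>  # one closing parenthesis per call
  group_by(unres) |>                              # the pipe carries the grouping forward
  summarise(sh_no = mean(no_vt))

toy_result
```

For the European rows, resolution "R/1" has one "no" out of two votes and "R/2" has none, which gives shares of 0.5 and 0.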