Francisco Yira Albornoz October 12th 2018
- 20.3 Important types of atomic vector
- 20.4 Using atomic vectors
- 20.5 Recursive vectors (lists)
- 20.7 Augmented vectors
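- Describe the difference between is.finite(x) and !is.infinite(x).
is.finite(x) returns TRUE only for ordinary numbers, so it is FALSE for NA, NaN, Inf and -Inf. !is.infinite(x) is FALSE only for Inf and -Inf, so it returns TRUE when the input is NA or NaN.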
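A small check makes the difference concrete. This is only an illustrative sketch; the vector x and the two helper names are made up for the example.

```r
# Example vector covering every special case (values chosen for illustration)
x <- c(1, Inf, -Inf, NA, NaN)

finite_flags <- is.finite(x)           # TRUE only for ordinary numbers
not_infinite_flags <- !is.infinite(x)  # FALSE only for Inf and -Inf

# Side-by-side view: the two columns disagree on the NA and NaN rows
data.frame(x, finite_flags, not_infinite_flags)
```

The two results differ only for NA and NaN.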
- Read the source code for dplyr::near() (Hint: to see the source code, drop the ()). How does it work?
It checks that the absolute difference between each pair of elements in vectors x and y is less than a certain tolerance threshold (tol).
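The idea can be sketched in base R without loading dplyr. near_sketch is a hypothetical stand-in written for this example, and its default tolerance is an assumption rather than a copy of the package source.

```r
# Hypothetical re-implementation of the idea behind dplyr::near();
# the default tolerance here is an assumption made for illustration
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

sqrt(2)^2 == 2             # exact comparison fails because of floating point error
near_sketch(sqrt(2)^2, 2)  # comparison within a tolerance succeeds
```

Because the comparison is vectorised, it works element by element on vectors of equal length.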
- A logical vector can take 3 possible values. How many possible values can an integer vector take? How many possible values can a double take? Use google to do some research.
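An integer vector can take around 4 * 10^9 distinct values (roughly +/- 2 * 10^9), since R stores integers in 32 bits and reserves one bit pattern for NA.
A double can take values with magnitudes from about 2.2e-308 to about 1.8e+308, plus the special values NA, NaN, Inf and -Inf. Because a double is stored in 64 bits, it can take at most 2^64 distinct values. The exact boundaries depend on the hardware on which R is running and can be seen in .Machine$double.xmin and .Machine$double.xmax.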
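These limits can be inspected directly. The snippet below is a quick sketch using only the .Machine list from base R; the printed values depend on the platform.

```r
# Largest representable integer (32-bit storage, one pattern reserved for NA)
.Machine$integer.max

# Smallest positive normalized double and largest finite double
.Machine$double.xmin
.Machine$double.xmax

# Going past the limits: integers overflow to NA (with a warning),
# doubles overflow to Inf
suppressWarnings(.Machine$integer.max + 1L)
.Machine$double.xmax * 2
```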
- Brainstorm at least four functions that allow you to convert a double to an integer. How do they differ? Be precise.
Using as.integer directly
as.integer(x)
Using integer division, then as.integer
to_integer_1 <- function(x) {
  integer_division <- x %/% 1
  as.integer(integer_division)
}
Using round, then as.integer
to_integer_2 <- function(x) {
  as.integer(round(x))
}
Using ceiling, then as.integer
to_integer_3 <- function(x) {
  as.integer(ceiling(x))
}
Showing their differences
x <- c(-Inf, NaN, 0, 1, 1.2, 1.5, 1.7, NA)
as.integer(x)
## Warning: NAs introduced by coercion to integer range
## [1] NA NA 0 1 1 1 1 NA
to_integer_1(x)
## Warning in to_integer_1(x): NAs introduced by coercion to integer range
## [1] NA NA 0 1 1 1 1 NA
to_integer_2(x)
## Warning in to_integer_2(x): NAs introduced by coercion to integer range
## [1] NA NA 0 1 1 2 2 NA
to_integer_3(x)
## Warning in to_integer_3(x): NAs introduced by coercion to integer range
## [1] NA NA 0 1 2 2 2 NA
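The integer division function produces the same results as plain as.integer for the non-negative values used here, since both discard the decimal part. They differ for negative inputs: as.integer truncates toward zero, while x %/% 1 rounds down (toward -Inf). As the output above shows, all four approaches emit the same coercion warning when the argument contains Inf or -Inf.
The round function rounds to the nearest integer (in R, a decimal part of exactly 0.5 goes to the nearest even integer), and the ceiling function rounds up to the next integer whenever the decimal part is more than 0.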
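Negative numbers and exact halves show where the four approaches really diverge. The input vector below is made up for this sketch, and the variable names are illustrative only.

```r
# Illustrative input with negative values and exact halves
x <- c(-1.5, -0.5, 0.5, 1.5, 2.5)

truncated <- as.integer(x)              # drops the decimal part, toward zero
floored   <- as.integer(x %/% 1)        # integer division rounds down
rounded   <- as.integer(round(x))       # exact halves go to the even neighbour
ceiled    <- as.integer(ceiling(x))     # always rounds up

rbind(x, truncated, floored, rounded, ceiled)
```

Note that truncated and floored agree only for the non-negative inputs.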
- What functions from the readr package allow you to turn a string into logical, integer, and double vector?
readr::parse_logical
readr::parse_integer
readr::parse_double
- What does mean(is.na(x)) tell you about a vector x? What about sum(!is.finite(x))?
mean(is.na(x)) gives the proportion of missing values (NA or NaN) in a vector.
sum(!is.finite(x)) tells us how many non-finite values (Inf, -Inf, NaN and NA) there are in a vector.
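A short example with a made-up vector shows both summaries at work. The names prop_missing and n_non_finite are only for illustration.

```r
# Illustrative vector: two finite values, one NA, one NaN, two infinities
x <- c(1, NA, NaN, Inf, -Inf, 5)

# is.na() is TRUE for both NA and NaN, so this is 2 out of 6
prop_missing <- mean(is.na(x))

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this counts 4 values
n_non_finite <- sum(!is.finite(x))

prop_missing
n_non_finite
```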
- Carefully read the documentation of is.vector(). What does it actually test for? Why does is.atomic() not agree with the definition of atomic vectors above?
is.vector() tests for the absence of attributes other than names in the vector. So it can return TRUE for lists, and FALSE for vectors which have other attributes (such as factors).
is.vector(list(a = "a", b = 1))
## [1] TRUE
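is.atomic() treats NULL as an atomic vector type (in R versions before 4.4.0, where is.atomic(NULL) returns TRUE), and this does not match the definition of atomic vector given in the book. It also ignores attributes, so it returns TRUE for augmented vectors such as factors.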
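The effect of attributes on both functions can be checked step by step. This is a small sketch; the attribute name "note" is arbitrary and chosen just for the example.

```r
x <- 1:3
plain_check <- is.vector(x)        # no attributes at all

names(x) <- c("a", "b", "c")
named_check <- is.vector(x)        # names are the one attribute that is allowed

attr(x, "note") <- "extra"         # arbitrary extra attribute
attr_check <- is.vector(x)         # any other attribute makes it FALSE

f <- factor(c("a", "b"))
factor_vector_check <- is.vector(f)  # factors carry class and levels attributes
factor_atomic_check <- is.atomic(f)  # is.atomic() does not look at attributes

c(plain_check, named_check, attr_check, factor_vector_check, factor_atomic_check)
```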
- Compare and contrast setNames() with purrr::set_names().
purrr::set_names() has different default arguments. In setNames() the user can supply just a vector of names and obtain a vector that uses those names as its values too, whereas purrr::set_names() always requires a vector or object to name or rename, and allows omitting the names vector (in which case the object’s values are used as names). Also, purrr::set_names() is more flexible, since it allows supplying the names as separate elements through ... (instead of a single vector), or even a function or formula that modifies the original names of the object.
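The differences can be tried out side by side. This sketch assumes purrr may or may not be installed, so the purrr calls are guarded; the example vectors are made up.

```r
# Base R: object and names supplied together
base_named <- setNames(1:3, c("a", "b", "c"))

# Base R: only nm supplied, so the names are reused as the values
base_self_named <- setNames(nm = c("a", "b"))

base_named
base_self_named

# purrr calls are guarded in case the package is not installed
if (requireNamespace("purrr", quietly = TRUE)) {
  print(purrr::set_names(c("a", "b")))               # names default to the values
  print(purrr::set_names(1:3, "a", "b", "c"))        # names passed through ...
  print(purrr::set_names(c(x = 1, y = 2), toupper))  # function applied to existing names
}
```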
- Create functions that take a vector as input and return:
- The last value. Should you use [ or [[?
x <- 1:10
y <- list(1:3, 4:6)
last_element <- function(x) {
  x[[length(x)]]
}
last_element(x)
## [1] 10
last_element(y)
## [1] 4 5 6
We should use [[ in order to return the last value itself for recursive vectors (instead of a list containing the last value).
- The elements at even numbered positions.
even_positions <- function(x) {
  positions <- seq_along(x)
  even_pos <- (positions %% 2 == 0)
  x[even_pos]
}
- Every element except the last value.
minus_last <- function(x) {
  positions <- seq_along(x)
  not_last <- positions != length(x)
  x[not_last]
}
- Only even numbers (and no missing values).
even_numbers <- function(x) {
  is_even <- x %% 2 == 0
  # is_even is NA for NA, NaN, Inf and -Inf; treat those positions as not even
  x[!is.na(is_even) & is_even]
}
- Why is x[-which(x > 0)] not the same as x[x <= 0]?
x[x <= 0] converts NaN values into NA, since we “don’t know” whether NaN is less than or equal to zero. However, x[-which(x > 0)] returns the original NaN values. Why? Because that expression only excludes the values where x > 0 evaluates to TRUE (or, vice versa, keeps “as is” the values where x > 0 evaluates to anything else). Since NaN > 0 doesn’t evaluate to TRUE, the value is returned without being converted to NA.
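A concrete comparison, using made-up vectors, shows the NaN behaviour and one further difference: when no element is positive, which() returns an empty index and the negative subset drops everything.

```r
# Illustrative vector with both NA and NaN
x <- c(-1, 0, 2, NA, NaN)

by_logical <- x[x <= 0]         # NA and NaN positions both come back as NA
by_which   <- x[-which(x > 0)]  # only position 3 is dropped; NaN survives

by_logical
by_which

# Edge case: nothing is positive, so which() returns integer(0)
y <- c(-2, -1)
y[y <= 0]         # keeps both values
y[-which(y > 0)]  # negating integer(0) selects nothing: numeric(0)
```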
- What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?
It returns a missing value (NA).
x <- 1:10
x[20]
## [1] NA
x["hola"]
## [1] NA
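The same experiment can be extended to lists and to [[, which behave differently from [ on an atomic vector. The objects below are made up for this sketch.

```r
v <- c(a = 1, b = 2)
v[5]     # NA, with <NA> as its name
v["z"]   # NA, with <NA> as its name

l <- list(a = 1)
l[3]     # a list of length one holding NULL
l["z"]   # likewise

# [[ is stricter on atomic vectors: an out-of-range position is an error
out_of_range <- tryCatch(v[[5]], error = function(e) conditionMessage(e))
out_of_range
```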
- Draw the following lists as nested sets:
list(a, b, list(c, d), list(e, f))
list(list(list(list(list(list(a))))))
- What happens if you subset a tibble as if you’re subsetting a list? What are the key differences between a list and a tibble?
Subsetting should work the same way for tibbles and lists most of the time, since tibbles are just augmented lists. See below:
mtcars <- tibble::as_tibble(mtcars)
mtcars[1:2] #should return a tibble (list) with two columns (atomic vectors)
## # A tibble: 32 x 2
## mpg cyl
## <dbl> <dbl>
## 1 21 6
## 2 21 6
## 3 22.8 4
## 4 21.4 6
## 5 18.7 8
## 6 18.1 6
## 7 14.3 8
## 8 24.4 4
## 9 22.8 4
## 10 19.2 6
## # ... with 22 more rows
mtcars[3:4][1] #should return a tibble with the first column from a tibble with the 3rd and 4th columns of mtcars
## # A tibble: 32 x 1
## disp
## <dbl>
## 1 160
## 2 160
## 3 108
## 4 258
## 5 360
## 6 225
## 7 360
## 8 147.
## 9 141.
## 10 168.
## # ... with 22 more rows
mtcars[[1]] #should extract the first element of the tibble as an atomic vector
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
mtcars[["cyl"]] #should extract the element (column) named "cyl" as an atomic vector
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[["cyl"]][3:6] #should extract the column "cyl" as an atomic vector, then extract the 3rd to 6th elements from the vector
## [1] 4 6 8 6
The key difference between a tibble and an ordinary list is that all tibble elements (columns) must have the same length.
- What does hms::hms(3600) return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?
It returns a double vector with classes “hms” and “difftime”. The returned vector also has a units attribute that specifies the unit of time (seconds). It is printed as a period of time in the format hh:mm:ss (here, 01:00:00).
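The underlying structure can be approximated in base R with a plain difftime, which is only an analogy for the hms object and not the package's own constructor. The hms call itself is guarded in case the package is not installed.

```r
# Base R analogue: a difftime measured in seconds
one_hour <- as.difftime(3600, units = "secs")

typeof(one_hour)      # the primitive type underneath
attributes(one_hour)  # class and units

# The real thing, if the hms package is available
if (requireNamespace("hms", quietly = TRUE)) {
  print(hms::hms(3600))
  print(attributes(hms::hms(3600)))
}
```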
- Try and make a tibble that has columns with different lengths. What happens?
tibble(
a = 1:10,
b = 1:20,
c = 1:15
)
## Error:
## ! Tibble columns must have compatible sizes.
## * Size 10: Existing data.
## * Size 20: Column `b`.
## i Only values of size one are recycled.
It throws an error: every column must have either the same length as the other columns or length 1, since only length-one values are recycled.
- Based on the definition above, is it ok to have a list as a column of a tibble?
Yes! As long as the list has the same length as the other columns.
tibble(
a = list(1:2, 3:7),
b = c("first", "second"))
## # A tibble: 2 x 2
## a b
## <list> <chr>
## 1 <int [2]> first
## 2 <int [5]> second