Chapter 20 - Exercises - R for Data Science

Francisco Yira Albornoz October 12th 2018

20.3 Important types of atomic vector

20.3.5 Exercises

  1. Describe the difference between is.finite(x) and !is.infinite(x)

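is.finite(x) returns FALSE for NA and NaN, as well as for Inf and -Inf. !is.infinite(x), however, returns TRUE for NA and NaN, because is.infinite() only flags Inf and -Inf. In other words, !is.infinite(x) treats missing values as if they were finite.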
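A quick check on a made-up vector that contains every special value shows where the two expressions disagree:

```r
x <- c(0, Inf, -Inf, NA, NaN)

is.finite(x)     # TRUE FALSE FALSE FALSE FALSE
!is.infinite(x)  # TRUE FALSE FALSE  TRUE  TRUE: NA and NaN count as "not infinite"
```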

  1. Read the source code for dplyr::near() (Hint: to see the source code, drop the ()). How does it work?

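It computes the absolute difference between each pair of elements of x and y and checks whether it is smaller than a tolerance, tol, which defaults to .Machine$double.eps^0.5 (about 1.5e-8). This makes it safer than == for comparing doubles that may carry small floating-point errors.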
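The idea fits in a one-line function. The sketch below is a re-implementation for illustration only. my_near() is a hypothetical name, and it assumes the same default tolerance as dplyr::near(). The source of your installed dplyr version may differ in detail.

```r
# Hypothetical re-implementation of the idea behind dplyr::near()
my_near <- function(x, y, tol = .Machine$double.eps^0.5) {
  # element-wise: is the absolute difference below the tolerance?
  abs(x - y) < tol
}

sqrt(2)^2 == 2          # FALSE, because of floating-point error
my_near(sqrt(2)^2, 2)   # TRUE
my_near(c(1, 1.1), 1)   # TRUE FALSE (y is recycled)
```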

  1. A logical vector can take 3 possible values. How many possible values can an integer vector take? How many possible values can a double take? Use google to do some research.

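R integers are stored in 32 bits, and one bit pattern is reserved for NA_integer_. An integer can therefore hold any whole number from -2147483647 to 2147483647 (.Machine$integer.max). That is 2^32 - 1 (about 4.3 * 10^9) distinct values, plus NA.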

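A double is stored in 64 bits (IEEE 754 double precision on practically every platform R runs on). It can therefore take at most 2^64 (about 1.8 * 10^19) distinct values, some of which encode NaN, NA, Inf and -Inf. In magnitude, doubles range from about 2.2e-308 (the smallest positive normalized value) up to about 1.8e+308, with either sign. The exact limits on the machine R is running on are given by .Machine$double.xmin and .Machine$double.xmax.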
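These limits can be inspected directly. The double values noted in the comments assume an IEEE 754 platform.

```r
.Machine$integer.max          # 2147483647, i.e. 2^31 - 1
.Machine$integer.max + 1L     # NA, with an integer overflow warning
.Machine$double.xmax          # about 1.8e+308
.Machine$double.xmin          # about 2.2e-308, smallest positive normalized double
```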

  1. Brainstorm at least four functions that allow you to convert a double to an integer. How do they differ? Be precise.

Using as.integer directly

as.integer(x)

Using integer division, then as.integer

to_integer_1 <- function(x) {
  integer_division <- x %/% 1
  as.integer(integer_division)
}

Using round, then as.integer

to_integer_2 <- function(x) {
  as.integer(round(x))
}

Using ceiling, then as.integer

to_integer_3 <- function(x) {
  as.integer(ceiling(x))
}

Showing their differences

x <- c(-Inf, NaN, 0, 1, 1.2, 1.5, 1.7, NA)

as.integer(x)
## Warning: NAs introduced by coercion to integer range

## [1] NA NA  0  1  1  1  1 NA
to_integer_1(x)
## Warning in to_integer_1(x): NAs introduced by coercion to integer range

## [1] NA NA  0  1  1  1  1 NA
to_integer_2(x)
## Warning in to_integer_2(x): NAs introduced by coercion to integer range

## [1] NA NA  0  1  1  2  2 NA
to_integer_3(x)
## Warning in to_integer_3(x): NAs introduced by coercion to integer range

## [1] NA NA  0  1  2  2  2 NA

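The integer division function gives the same results as as.integer() here, but only because x has no negative non-integer values. as.integer() truncates toward zero, while x %/% 1 rounds down (toward -Inf). So as.integer(-1.5) is -1 but to_integer_1(-1.5) is -2. Both functions warn when the input contains Inf or -Inf, because those values cannot be represented as integers.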

round() rounds to the nearest integer, and exact halves go to the even neighbor, so round(1.5) and round(2.5) both give 2. ceiling() rounds up, toward +Inf, whenever the decimal part is not zero.
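The vector x above hides some of these differences, because it has no negative non-integer values and only one exact half. A second comparison on a made-up vector with negative numbers and ties brings them out. It also adds floor() and trunc() as two further base R options.

```r
y <- c(-1.5, -0.5, 0.5, 1.5, 2.5)

data.frame(
  x          = y,
  as_integer = as.integer(y),          # truncates toward zero
  int_div    = as.integer(y %/% 1),    # rounds toward -Inf
  floor      = as.integer(floor(y)),   # rounds toward -Inf, same as %/% 1
  trunc      = as.integer(trunc(y)),   # truncates toward zero, same as as.integer()
  round      = as.integer(round(y)),   # nearest integer, ties go to the even one
  ceiling    = as.integer(ceiling(y))  # rounds toward +Inf
)
```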

  1. What functions from the readr package allow you to turn a string into logical, integer, and double vector?

readr::parse_logical

readr::parse_integer

readr::parse_double
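A short usage sketch, assuming readr is installed. By default the string "NA" is parsed as a missing value.

```r
library(readr)

parse_logical(c("TRUE", "FALSE", "NA"))  # TRUE FALSE NA
parse_integer(c("1", "231", "NA"))       # 1 231 NA
parse_double(c("1.5", "-2.25"))          # 1.5 -2.25
```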

20.4 Using atomic vectors

20.4.6 Exercises

  1. What does mean(is.na(x)) tell you about a vector x? What about sum(!is.finite(x))?

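mean(is.na(x)) gives the proportion of missing values in x. is.na() returns a logical vector, and mean() treats TRUE as 1 and FALSE as 0, so the result is the share of NA (and NaN) elements.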

sum(!is.finite(x)) counts how many non-finite values (Inf, -Inf, NaN and NA) there are in x.
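Both summaries on a small made-up vector:

```r
x <- c(1, NA, 3, NaN, Inf)

mean(is.na(x))      # 0.4: 2 of the 5 values are NA or NaN
sum(!is.finite(x))  # 3: NA, NaN and Inf
```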

  1. Carefully read the documentation of is.vector(). What does it actually test for? Why does is.atomic() not agree with the definition of atomic vectors above?

is.vector() tests whether x is a vector (of the given mode, "any" by default) with no attributes other than names. So it returns TRUE for lists, and FALSE for vectors that carry other attributes (such as factors).

is.vector(list(a = "a", b = 1))
## [1] TRUE

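is.atomic() considers NULL to be atomic (is.atomic(NULL) returns TRUE). This does not match the book's definition of atomic vectors as the six types logical, integer, double, character, complex and raw. R 4.4.0 changed this behavior, and is.atomic(NULL) now returns FALSE.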
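The "names only" rule of is.vector() can be checked directly. The "units" attribute below is an arbitrary example, not something R uses here.

```r
x <- c(a = 1, b = 2)
is.vector(x)                    # TRUE: names are the only attribute allowed

attr(x, "units") <- "cm"        # arbitrary extra attribute, for illustration
is.vector(x)                    # FALSE

is.vector(factor(c("a", "b")))  # FALSE: factors have levels and a class
is.atomic(NULL)                 # TRUE before R 4.4.0, FALSE from R 4.4.0 on
```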

  1. Compare and contrast setNames() with purrr::set_names().

purrr::set_names() has different default arguments. setNames(object = nm, nm) defaults the object to the names, so setNames(nm = c("a", "b")) returns a vector whose values are also its names. purrr::set_names(x, nm = x, ...) always requires the object to name as its first argument. The names can be omitted, in which case the values of x are used as names.

purrr::set_names() is also more flexible. It lets you supply the names as separate arguments through ... (instead of a single vector), or pass a function or formula that transforms the object's existing names.
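A side-by-side sketch, assuming purrr is installed. In recent purrr versions, set_names() is re-exported from rlang.

```r
library(purrr)

setNames(nm = c("a", "b"))                  # c(a = "a", b = "b")
set_names(c("a", "b"))                      # same result: values used as names
set_names(1:3, "x", "y", "z")               # names supplied through ...
set_names(c(a = 1, b = 2), toupper)         # c(A = 1, B = 2)
set_names(c(a = 1, b = 2), ~ paste0(.x, "_new"))  # c(a_new = 1, b_new = 2)
```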

  1. Create functions that take a vector as input and return:

  1. The last value. Should you use [ or [[?

x <- 1:10

y <- list(1:3, 4:6)

last_element <- function(x) {
  x[[length(x)]]
}

last_element(x)
## [1] 10
last_element(y)
## [1] 4 5 6

We should use [[ in order to return the last value for recursive vectors (instead of a list containing the last value).
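The difference between the two operators is easy to see on the list y:

```r
y <- list(1:3, 4:6)

y[length(y)]    # [ keeps the container: a list of length 1 holding 4:6
y[[length(y)]]  # [[ extracts the element itself: the integer vector 4:6
```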

  1. The elements at even numbered positions.
even_positions <- function(x) {
  positions <- seq_along(x)
  even_pos <- (positions %% 2 == 0)
  x[even_pos]
}
  1. Every element except the last value.
minus_last <- function(x) {
  positions <- seq_along(x)
  not_last <- positions != length(x)
  x[not_last]
}
  1. Only even numbers (and no missing values).
even_numbers <- function(x) {
  is_even <- x %% 2 == 0
  not_na <- !is.na(x)
  
  x[is_even & not_na]
}
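The same three functions can also be written more compactly with negative and logical indexing. This is an alternative sketch, not the original solution, and the names ending in 2 are hypothetical.

```r
even_positions2 <- function(x) x[seq_along(x) %% 2 == 0]
minus_last2     <- function(x) x[-length(x)]            # drop the last position
even_numbers2   <- function(x) x[!is.na(x) & x %% 2 == 0]  # NA & FALSE is FALSE

v <- c(5, 8, NA, 3, 12, 7)
even_positions2(v)  # 8 3 7
minus_last2(v)      # 5 8 NA 3 12
even_numbers2(v)    # 8 12
```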
  1. Why is x[-which(x > 0)] not the same as x[x <= 0]?

x[x <= 0] turns NaN values into NA. NaN <= 0 evaluates to NA, since we “don’t know” whether NaN is less than or equal to zero, and subsetting with an NA index returns NA. x[-which(x > 0)], however, keeps the original NaN values. That expression only drops the positions where x > 0 evaluates to TRUE, and keeps everything else as is, whether the comparison gave FALSE or NA. Since NaN > 0 doesn’t evaluate to TRUE, the NaN is returned unchanged.
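The NaN difference shows up on a small made-up vector. The second half of the sketch shows another difference. When no element is positive, which() returns integer(0), and negative indexing with an empty vector drops every element.

```r
x <- c(-1, 0, 1, NA, NaN)
x[-which(x > 0)]  # -1   0  NA NaN
x[x <= 0]         # -1   0  NA  NA

z <- c(-2, -1)    # no positive values
which(z > 0)      # integer(0)
z[-which(z > 0)]  # numeric(0): everything is dropped
z[z <= 0]         # -2 -1
```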

  1. What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?

In both cases it returns a missing value (NA).

x <- 1:10
x[20]
## [1] NA
x["hola"]
## [1] NA
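[[ behaves differently. It signals an error for an out-of-range position, and on a list it returns NULL for an unknown name. A short sketch with made-up vectors:

```r
x <- c(a = 1, b = 2)
l <- list(a = 1, b = 2)

x[3]                                        # NA, with name <NA>
tryCatch(x[[3]], error = conditionMessage)  # "subscript out of bounds"
l[["c"]]                                    # NULL
```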

20.5 Recursive vectors (lists)

20.5.4 Exercises

  1. Draw the following lists as nested sets:

    list(a, b, list(c, d), list(e, f))

“List 1”

list(list(list(list(list(list(a))))))

“List 2”
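The drawings themselves are not reproduced here. As a rough check on how many levels of nested sets each drawing needs, a small recursive helper can count the nesting depth. depth() is a hypothetical helper, and the letters a to f are replaced by numbers so that the lists can be built without defining those objects.

```r
# Hypothetical helper: number of nested sets needed to draw a list
depth <- function(x) {
  if (!is.list(x)) return(0L)     # an atomic value is a point, not a set
  if (length(x) == 0) return(1L)  # an empty list is a single empty set
  1L + max(vapply(x, depth, integer(1)))
}

list_1 <- list(1, 2, list(3, 4), list(5, 6))
list_2 <- list(list(list(list(list(list(1))))))

depth(list_1)  # 2
depth(list_2)  # 6
```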

  1. What happens if you subset a tibble as if you’re subsetting a list? What are the key differences between a list and a tibble?

Subsetting should work the same way for tibbles and lists most of the time, since tibbles are just augmented lists. See below:

mtcars <- tibble::as_tibble(mtcars)

mtcars[1:2] #should return a tibble (list) with two columns (atomic vectors)
## # A tibble: 32 x 2
##      mpg   cyl
##    <dbl> <dbl>
##  1  21       6
##  2  21       6
##  3  22.8     4
##  4  21.4     6
##  5  18.7     8
##  6  18.1     6
##  7  14.3     8
##  8  24.4     4
##  9  22.8     4
## 10  19.2     6
## # ... with 22 more rows
mtcars[3:4][1] #should return a tibble with the first column from a tibble with the 3rd and 4th columns of mtcars
## # A tibble: 32 x 1
##     disp
##    <dbl>
##  1  160 
##  2  160 
##  3  108 
##  4  258 
##  5  360 
##  6  225 
##  7  360 
##  8  147.
##  9  141.
## 10  168.
## # ... with 22 more rows
mtcars[[1]] #should extract the first element of the tibble as an atomic vector
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
mtcars[["cyl"]] #should extract the element (column) named "cyl" as an atomic vector
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[["cyl"]][3:6] #should extract the column "cyl" as an atomic vector, then extract the 3rd to 6th elements from the vector
## [1] 4 6 8 6

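The key difference between a tibble and an ordinary list is that all the elements (columns) of a tibble must have the same length. Tibbles are also data frames, so they support two-dimensional subsetting such as df[rows, cols]. Their $ operator does not partially match column names: it warns and returns NULL instead.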
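A sketch of the last two differences, assuming tibble is installed. The list l and its column names are made up for illustration.

```r
library(tibble)

l  <- list(alpha = 1:3, beta = 4:6)
tb <- as_tibble(l)

l$al              # 1 2 3: lists partially match names with $
tb$al             # NULL, with a warning about an unknown column
tb[2:3, "beta"]   # two-dimensional subsetting: a 2 x 1 tibble
# l[2:3, "beta"] would fail with "incorrect number of dimensions"
```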

20.7 Augmented vectors

20.7.4 Exercises

  1. What does hms::hms(3600) return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?

It returns an object built on top of a double vector, with class c("hms", "difftime") and a units attribute set to "secs". It prints as a duration in hh:mm:ss format, so hms::hms(3600) prints as 01:00:00.
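The underlying type and attributes can be inspected directly, assuming the hms package is installed:

```r
x <- hms::hms(3600)

x              # prints 01:00:00
typeof(x)      # "double": the primitive type underneath
attributes(x)  # units = "secs", class = c("hms", "difftime")
unclass(x)     # the bare double 3600, still carrying the units attribute
```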

  1. Try and make a tibble that has columns with different lengths. What happens?
tibble::tibble(
  a = 1:10,
  b = 1:20,
  c = 1:15
)
## Error:
## ! Tibble columns must have compatible sizes.
## * Size 10: Existing data.
## * Size 20: Column `b`.
## i Only values of size one are recycled.

It throws an error. Every column must have the same length as the others. The only exception is a column of length one, which is recycled to the common length.
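The length-one exception in practice:

```r
tb <- tibble::tibble(a = 1:3, b = "x")  # b has length one, so it is recycled

tb$b  # "x" "x" "x"
```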

  1. Based on the definition above, is it ok to have a list as a column of a tibble?

Yes! As long as the list has the same length as the other columns.

tibble::tibble(
  a = list(1:2, 3:7), 
  b = c("first", "second"))
## # A tibble: 2 x 2
##   a         b     
##   <list>    <chr> 
## 1 <int [2]> first 
## 2 <int [5]> second
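Each cell of a list column holds a whole R object, which can be pulled out with [[. A short sketch with the same tibble:

```r
tb <- tibble::tibble(
  a = list(1:2, 3:7),
  b = c("first", "second")
)

lengths(tb$a)  # 2 5: each cell holds a complete vector
tb$a[[2]]      # 3 4 5 6 7
```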