Functionals.Rmd

# Functionals

```{r Functionals-1, include = FALSE}
source("common.R")
```

Attaching the needed libraries:

```{r Functionals-2, warning=FALSE, message=FALSE}
library(purrr, warn.conflicts = FALSE)
```

## My first functional: `map()` (Exercises 9.2.6)

---

**Q1.** Use `as_mapper()` to explore how `{purrr}` generates anonymous functions for the integer, character, and list helpers. What helper allows you to extract attributes? Read the documentation to find out.

**A1.** Let's handle the two parts of the question separately.

- `as_mapper()` and `{purrr}`-generated anonymous functions:

Looking at the experimentation below with `map()` and `as_mapper()`, we can see that, depending on the type of the input, `as_mapper()` creates an extractor function using `pluck()`.

```{r Functionals-3}
# mapping by position -----------------------

x <- list(1, list(2, 3, list(1, 2)))

map(x, 1)
as_mapper(1)

map(x, list(2, 1))
as_mapper(list(2, 1))

# mapping by name -----------------------

y <- list(
  list(m = "a", list(1, m = "mo")),
  list(n = "b", list(2, n = "no"))
)

map(y, "m")
as_mapper("m")

# mixing position and name
map(y, list(2, "m"))
as_mapper(list(2, "m"))

# compact functions ----------------------------

map(y, ~ length(.x))
as_mapper(~ length(.x))
```

- You can extract attributes using `purrr::attr_getter()`:

```{r Functionals-4}
pluck(Titanic, attr_getter("class"))
```

---

**Q2.** `map(1:3, ~ runif(2))` is a useful pattern for generating random numbers, but `map(1:3, runif(2))` is not. Why not? Can you explain why it returns the result that it does?

**A2.** As shown by `as_mapper()` outputs below, the second call is not appropriate for generating random numbers because it translates to `pluck()` function where the indices for plucking are taken to be randomly generated numbers, and these are not valid accessors and so we get `NULL`s in return.

```{r Functionals-5}
map(1:3, ~ runif(2))
as_mapper(~ runif(2))

map(1:3, runif(2))
as_mapper(runif(2))
```

---

**Q3.** Use the appropriate `map()` function to:

    a) Compute the standard deviation of every column in a numeric data frame.

    a) Compute the standard deviation of every numeric column in a mixed data frame. (Hint: you'll need to do it in two steps.)

    a) Compute the number of levels for every factor in a data frame.

**A3.** Using the appropriate `map()` function to:

- Compute the standard deviation of every column in a numeric data frame:

```{r Functionals-6}
map_dbl(mtcars, sd)
```

- Compute the standard deviation of every numeric column in a mixed data frame:

```{r Functionals-7}
keep(iris, is.numeric) %>%
  map_dbl(sd)
```

- Compute the number of levels for every factor in a data frame:

```{r Functionals-8}
modify_if(dplyr::starwars, is.character, as.factor) %>%
  keep(is.factor) %>%
  map_int(~ length(levels(.)))
```

---

**Q4.** The following code simulates the performance of a *t*-test for non-normal data. Extract the *p*-value from each test, then visualise.

```{r Functionals-9}
trials <- map(1:100, ~ t.test(rpois(10, 10), rpois(7, 10)))
```

**A4.**

- Extract the *p*-value from each test:

```{r Functionals-10}
trials <- map(1:100, ~ t.test(rpois(10, 10), rpois(7, 10)))

(p <- map_dbl(trials, "p.value"))
```

- Visualise the extracted *p*-values:

```{r Functionals-11}
plot(p)

hist(p)
```

---

**Q5.** The following code uses a map nested inside another map to apply a function to every element of a nested list. Why does it fail, and  what do you need to do to make it work?

```{r Functionals-12, error = TRUE}
x <- list(
  list(1, c(3, 9)),
  list(c(3, 6), 7, c(4, 7, 6))
)

triple <- function(x) x * 3
map(x, map, .f = triple)
```

**A5.** This function fails because this call effectively evaluates to the following:

```{r Functionals-13, eval=FALSE}
map(.x = x, .f = ~ triple(x = .x, map))
```

But `triple()` has only one parameter (`x`), and so the execution fails.

Here is the fixed version:

```{r Functionals-14}
x <- list(
  list(1, c(3, 9)),
  list(c(3, 6), 7, c(4, 7, 6))
)

triple <- function(x) x * 3
map(x, .f = ~ map(.x, ~ triple(.x)))
```

---

**Q6.** Use `map()` to fit linear models to the `mtcars` dataset using the formulas stored in this list:

```{r Functionals-15}
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)
```

**A6.** Fitting linear models to the `mtcars` dataset using the provided formulas:

```{r Functionals-16}
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)

map(formulas, ~ lm(formula = ., data = mtcars))
```

---

**Q7.** Fit the model `mpg ~ disp` to each of the bootstrap replicates of `mtcars` in the list below, then extract the $R^2$ of the model fit (Hint: you can compute the $R^2$ with `summary()`.)

```{r Functionals-17, eval=FALSE}
bootstrap <- function(df) {
  df[sample(nrow(df), replace = TRUE), , drop = FALSE]
}

bootstraps <- map(1:10, ~ bootstrap(mtcars))
```

**A7.** This can be done using `map_dbl()`:

```{r Functionals-18}
bootstrap <- function(df) {
  df[sample(nrow(df), replace = TRUE), , drop = FALSE]
}

bootstraps <- map(1:10, ~ bootstrap(mtcars))

bootstraps %>%
  map(~ lm(mpg ~ disp, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")
```

---

## Map variants (Exercises 9.4.6)

---

**Q1.** Explain the results of `modify(mtcars, 1)`.

**A1.** `modify()` returns the object of type same as the input. Since the input here is a data frame of certain dimensions and `.f = 1` translates to plucking the first element in each column, it returns a data frame with the same dimensions with the plucked element recycled across rows.

```{r Functionals-19}
head(modify(mtcars, 1))
```

---

**Q2.** Rewrite the following code to use `iwalk()` instead of `walk2()`. What are the advantages and disadvantages?

```{r Functionals-20, eval = FALSE}
cyls <- split(mtcars, mtcars$cyl)
paths <- file.path(temp, paste0("cyl-", names(cyls), ".csv"))
walk2(cyls, paths, write.csv)
```

**A2.** Let's first rewrite provided code using `iwalk()`:

```{r Functionals-21, eval=FALSE}
cyls <- split(mtcars, mtcars$cyl)
names(cyls) <- file.path(temp, paste0("cyl-", names(cyls), ".csv"))
iwalk(cyls, ~ write.csv(.x, .y))
```

The advantage of using `iwalk()` is that we need to now deal with only a single variable (`cyls`) instead of two (`cyls` and `paths`).

The disadvantage is that the code is difficult to reason about: 
In `walk2()`, it's explicit what `.x` (`= cyls`) and `.y` (`= paths`) correspond to, while this is not so for `iwalk()` (i.e., `.x = cyls` and `.y = names(cyls)`) with the `.y` argument being "invisible".

---

**Q3.** Explain how the following code transforms a data frame using functions stored in a list.

```{r Functionals-22}
trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) factor(x, labels = c("auto", "manual"))
)

nm <- names(trans)
mtcars[nm] <- map2(trans, mtcars[nm], function(f, var) f(var))
```

Compare and contrast the `map2()` approach to this `map()` approach:

```{r Functionals-23, eval = FALSE}
mtcars[nm] <- map(nm, ~ trans[[.x]](mtcars[[.x]]))
```

**A3.** `map2()` supplies the functions stored in `trans` as anonymous functions via placeholder `f`, while the names of the columns specified in `mtcars[nm]` are supplied as `var` argument to the anonymous function. Note that the function is iterating over indices for vectors of transformations and column names.

```{r Functionals-24, eval=FALSE}
trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) factor(x, labels = c("auto", "manual"))
)

nm <- names(trans)
mtcars[nm] <- map2(trans, mtcars[nm], function(f, var) f(var))
```

In the `map()` approach, the function is iterating over indices for vectors of column names.

```{r Functionals-25, eval=FALSE}
mtcars[nm] <- map(nm, ~ trans[[.x]](mtcars[[.x]]))
```

The latter approach can't afford passing arguments to placeholders in an anonymous function.

---

**Q4.** What does `write.csv()` return, i.e. what happens if you use it with `map2()` instead of `walk2()`?

**A4.** If we use `map2()`, it will work, but it will print `NULL`s to the console for every list element.

```{r Functionals-26}
withr::with_tempdir(
  code = {
    ls <- split(mtcars, mtcars$cyl)
    nm <- names(ls)
    map2(ls, nm, write.csv)
  }
)
```

---

## Predicate functionals (Exercises 9.6.3)

---

**Q1.** Why isn't `is.na()` a predicate function? What base R function is closest to being a predicate version of `is.na()`?

**A1.** As mentioned in the docs:

> A predicate is a function that returns a **single** `TRUE` or `FALSE`.

The `is.na()` function does not return a `logical` scalar, but instead returns a vector and thus isn't a predicate function.

```{r Functionals-27}
# contrast the following behavior of predicate functions
is.character(c("x", 2))
is.null(c(3, NULL))

# with this behavior
is.na(c(NA, 1))
```

The closest equivalent of a predicate function in base-R is `anyNA()` function.

```{r Functionals-28}
anyNA(c(NA, 1))
```

---

**Q2.** `simple_reduce()` has a problem when `x` is length 0 or length 1. Describe the source of the problem and how you might go about fixing it.

```{r Functionals-29}
simple_reduce <- function(x, f) {
  out <- x[[1]]
  for (i in seq(2, length(x))) {
    out <- f(out, x[[i]])
  }
  out
}
```

**A2.** The supplied function struggles with inputs of length 0 and 1 because function tries to subscript out-of-bound values.

```{r Functionals-30, error=TRUE}
simple_reduce(numeric(), sum)
simple_reduce(1, sum)
simple_reduce(1:3, sum)
```

This problem can be solved by adding `init` argument, which supplies the default or initial value:

```{r Functionals-31}
simple_reduce2 <- function(x, f, init = 0) {
  # initializer will become the first value
  if (length(x) == 0L) {
    return(init)
  }

  if (length(x) == 1L) {
    return(x[[1L]])
  }

  out <- x[[1]]

  for (i in seq(2, length(x))) {
    out <- f(out, x[[i]])
  }

  out
}
```

Let's try it out:

```{r Functionals-32}
simple_reduce2(numeric(), sum)
simple_reduce2(1, sum)
simple_reduce2(1:3, sum)
```

Depending on the function, we can provide a different `init` argument:

```{r Functionals-33}
simple_reduce2(numeric(), `*`, init = 1)
simple_reduce2(1, `*`, init = 1)
simple_reduce2(1:3, `*`, init = 1)
```

---

**Q3.** Implement the `span()` function from Haskell: given a list `x` and a predicate function `f`, `span(x, f)` returns the location of the longest sequential run of elements where the predicate is true. (Hint: you might find `rle()` helpful.)

**A3.** Implementation of `span()`:

```{r Functionals-34}
span <- function(x, f) {
  running_lengths <- purrr::map_lgl(x, ~ f(.x)) %>% rle()

  df <- dplyr::tibble(
    "lengths" = running_lengths$lengths,
    "values" = running_lengths$values
  ) %>%
    dplyr::mutate(rowid = dplyr::row_number()) %>%
    dplyr::filter(values)

  # no sequence where condition is `TRUE`
  if (nrow(df) == 0L) {
    return(integer())
  }

  # only single sequence where condition is `TRUE`
  if (nrow(df) == 1L) {
    return((df$rowid):(df$lengths - 1 + df$rowid))
  }

  # multiple sequences where condition is `TRUE`; select max one
  if (nrow(df) > 1L) {
    df <- dplyr::filter(df, lengths == max(lengths))
    return((df$rowid):(df$lengths - 1 + df$rowid))
  }
}
```

Testing it once:

```{r Functionals-35}
span(c(0, 0, 0, 0, 0), is.na)
span(c(NA, 0, NA, NA, NA), is.na)
span(c(NA, 0, 0, 0, 0), is.na)
span(c(NA, NA, 0, 0, 0), is.na)
```

Testing it twice:

```{r Functionals-36}
span(c(3, 1, 2, 4, 5, 6), function(x) x > 3)
span(c(3, 1, 2, 4, 5, 6), function(x) x > 9)
span(c(3, 1, 2, 4, 5, 6), function(x) x == 3)
span(c(3, 1, 2, 4, 5, 6), function(x) x %in% c(2, 4))
```

---

**Q4.** Implement `arg_max()`. It should take a function and a vector of inputs, and return the elements of the input where the function returns the highest value. For example, `arg_max(-10:5, function(x) x ^ 2)` should return -10. `arg_max(-5:5, function(x) x ^ 2)` should return `c(-5, 5)`. Also implement the matching `arg_min()` function.

**A4.** Here are implementations for the specified functions:

- Implementing `arg_max()`

```{r Functionals-37}
arg_max <- function(.x, .f) {
  df <- dplyr::tibble(
    original = .x,
    transformed = purrr::map_dbl(.x, .f)
  )

  dplyr::filter(df, transformed == max(transformed))[["original"]]
}

arg_max(-10:5, function(x) x^2)
arg_max(-5:5, function(x) x^2)
```

- Implementing `arg_min()`

```{r Functionals-38}
arg_min <- function(.x, .f) {
  df <- dplyr::tibble(
    original = .x,
    transformed = purrr::map_dbl(.x, .f)
  )

  dplyr::filter(df, transformed == min(transformed))[["original"]]
}

arg_min(-10:5, function(x) x^2)
arg_min(-5:5, function(x) x^2)
```

---

**Q5.** The function below scales a vector so it falls in the range [0, 1]. How would you apply it to every column of a data frame? How would you apply it to every numeric column in a data frame?

```{r Functionals-39}
scale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

**A5.** We will use `{purrr}` package to apply this function. Key thing to keep in mind is that a data frame is a list of atomic vectors of equal length.

- Applying function to every column in a data frame: We will use `anscombe` as example since it has all numeric columns.

```{r Functionals-40}
purrr::map_df(head(anscombe), .f = scale01)
```

- Applying function to every numeric column in a data frame: We will use `iris` as example since not all of its columns are of numeric type.

```{r Functionals-41}
purrr::modify_if(head(iris), .p = is.numeric, .f = scale01)
```

---

## Base functionals (Exercises 9.7.3)

---

**Q1.** How does `apply()` arrange the output? Read the documentation and perform some experiments.

**A1.** Let's prepare an array and apply a function over different margins:

```{r Functionals-42}
(m <- as.array(table(mtcars$cyl, mtcars$am, mtcars$vs)))

# rows
apply(m, 1, function(x) x^2)

# columns
apply(m, 2, function(x) x^2)

# rows and columns
apply(m, c(1, 2), function(x) x^2)
```

As can be seen, `apply()` returns outputs organised first by the margins being operated over, and only then the results. 

---

**Q2.** What do `eapply()` and `rapply()` do? Does purrr have equivalents?

**A2.** Let's consider them one-by-one.

- `eapply()` 

As mentioned in its documentation:

> `eapply()` applies FUN to the named values from an environment and returns the results as a list.

Here is an example:

```{r Functionals-43}
library(rlang)

e <- env("x" = 1, "y" = 2)
rlang::env_print(e)

eapply(e, as.character)
```

`{purrr}` doesn't have any function to iterate over environments.

- `rapply()` 

> `rapply()` is a recursive version of lapply with flexibility in how the result is structured (how = "..").

Here is an example:

```{r Functionals-44}
X <- list(list(a = TRUE, b = list(c = c(4L, 3.2))), d = 9.0)

rapply(X, as.character, classes = "numeric", how = "replace")
```

`{purrr}` has something similar in `modify_tree()`.

```{r Functionals-45}
X <- list(list(a = TRUE, b = list(c = c(4L, 3.2))), d = 9.0)

purrr::modify_tree(X, leaf = length)
```

---

**Q3.** Challenge: read about the [fixed point algorithm](https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-12.html#%25_idx_1096). Complete the exercises using R.

**A3.** As mentioned in the suggested reading material:

> A number $x$ is called a fixed point of a function $f$ if $x$ satisfies the equation $f(x) = x$. For some functions $f$ we can locate a fixed point by beginning with an initial guess and applying $f$ repeatedly, $f(x), f(f(x)), f(f(f(x))), ...$ until the value does not change very much. Using this idea, we can devise a procedure fixed-point that takes as inputs a function and an initial guess and produces an approximation to a fixed point of the function. 

Let's first implement a fixed-point algorithm:

```{r Functionals-46}
close_enough <- function(x1, x2, tolerance = 0.001) {
  if (abs(x1 - x2) < tolerance) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

find_fixed_point <- function(.f, .guess, tolerance = 0.001) {
  .next <- .f(.guess)
  is_close_enough <- close_enough(.next, .guess, tolerance = tolerance)

  if (is_close_enough) {
    return(.next)
  } else {
    find_fixed_point(.f, .next, tolerance)
  }
}
```

Let's check if it works as expected:

```{r Functionals-47}
find_fixed_point(cos, 1.0)

# cos(x) = x
cos(find_fixed_point(cos, 1.0))
```

We will solve only one exercise from the reading material. Rest are beyond the scope of this solution manual.

> Show that the golden ratio $\phi$ is a fixed point of the transformation $x \mapsto 1 + 1/x$, and use this fact to compute $\phi$ by means of the fixed-point procedure.

```{r Functionals-48}
golden_ratio_f <- function(x) 1 + (1 / x)

find_fixed_point(golden_ratio_f, 1.0)
```

---

## Session information

```{r Functionals-49}
sessioninfo::session_info(include_base = TRUE)
```