-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathfunctions.Rmd
148 lines (95 loc) · 10.1 KB
/
functions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
```{r include = FALSE}
if(!knitr:::is_html_output())
{
options("width"=56)
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=56, indent = 2), tidy = TRUE)
knitr::opts_chunk$set(fig.pos = 'H')
}
```
# Functions
Being a functional programming language, functions are at the heart of R. We've already used lots of functions in the previous chapters, but now we're going to look in more detail about what a function is.
John Chambers, creator of the S programming language upon which R is based and core member of the R programming language project, said this:
> "To understand computations in R, two slogans are helpful:
>
> * Everything that exists is an object.
> * Everything that happens is a function call.
>
> --- John Chambers"
For now, we're going to focus on that second statement. What does it mean?
Well, a function is quite simple. It has an input, it does something, and then it gives an output. A really simple example of this is just typing `print(1)` into the console and hitting enter. You've given an input, there was a calculation, and now there's an output. Something's happened (1 was printed in the console) and it was done by calling a function (`print`).
If you're well-versed in mathematics, you'll know that functions in maths are the same. $f(x) = {3x}$ means that the to get `y`, you take `x` and multiply it by three. In this case, our input is `x`, our bit in the middle is multiplying by three, and then our output is `y`.
If you haven't used functions in mathematics then don't worry. Even by getting this far in the book, you've already used functions loads of times. For example, how do you create a vector? If you remember, you use the `c()` function, which we know stands for "concatenate". So, every time you've created a vector, you've used a function without even knowing it. The input was whatever you provided in the brackets. The computation was to concatenate everything together. And then the output was the vector.
Similarly, whenever you created a factor or a matrix or a dataframe or whatever, you used a function. You provided an input, there was a computation to change that input, and then you got an output.
As confusing as functions will inevitably become, just try to remember the core of what a function is:
When you call a function, there's an input, something happens, and there's an output.
## Functions in R
So more specifically, what do functions look like in R? Well, a good starting point is that when we call (use) a function, we provide the name and it's almost always followed by brackets (`()`). This helps make it clear what values you're providing as your inputs to the function.
I say that nearly almost all functions are followed by `()`, because some aren't. A simple example of this is `+`. `+` is still a function:
```{r}
is.function(`+`)
# the backticks just mean I'm referring
# to the + function without using it
```
But it doesn't have brackets. Instead, we can use a shorthand where we provide the values we want to give to the function either side of it (e.g. `1 + 2`). Importantly however, the logic is exactly the same, and you can still use the `+` like a normal function with brackets:
```{r}
`+`(1,2)
```
It's just that this looks a little weird to us, so we often use the shorthand way. But the long and short of it is: an easy way to tell when someone is calling (using) a function is to look for the `()` after the function name.
## Inputs
We know that to use a function in R, we have to provide inputs\*. And we also know that we provide our inputs within the brackets after the function name. But how do we know what values are allowed?
\*Technically, sometimes you don't have to provide an input to a function (e.g. `Sys.Date()`, which gives us the current date without putting anything in the brackets). But most functions require at least one input.
By typing a `?` followed by the name of the function into the console (e.g. `?length()`), you'll get a help page showing you the input parameters allowed by the function. So if we use `?length()` as an example, the help page tells us that the `length()` function expects one input parameter, `x`, and that needs to be an `R` object. Nice and simple.
#### `...`
In some cases, you'll see a `...` as one of the input parameters. This essentially means that you can provide an indeterminate number of values for that input. I know that sounds confusing, but the `c()` function is a good way of demonstrating this. When you create a vector, you can provide an (essentially) infinite number of values to the function. So the `c()` function basically bundles everything you provide to it into that `...` parameter.
### Explicit input parameters
If you type `?c()` into the console however, you'll see that there are also some other input parameters: `recursive` and `use.names`. Well Adam, if `...` just bundles everything I provide into a single input, then how do those work? Well this outlines the importance of providing **explicit** input parameters. When we're explicit, we're saying exactly which input parameter we're referring to with each value we provide. And to do this, we just provide the name of the input parameter when we give it. Let's look at the `substr()` function as an example.
The `substr()` function simply returns part of a character string that you provide. So, if I was to type:
```{r}
substr("Hello", 1, 3)
```
I get the first to the third characters in the string "hello". With this function call however, I haven't been explicit. Instead, I've just provided the inputs in the order that they're listed in the documentation:
* `x`
+ a character vector
* `start`
+ the first element to be extracted
* `stop`
+ the last element to be extracted
To be explicit, I need to provide the name of the input parameter that I'm referring to when I provide my inputs:
```{r}
substr(x = "Hello", start = 1, stop = 3)
substr(start = 1, stop = 3, x = "Hello")
```
Notice how when I'm being explicit it doesn't matter what order I provide my inputs in, R knows which value should be mapped to which input parameter.
Also, notice how we're using `=` here and not anything else like `<-`? This is another reason why I suggest not using `=` for assignment: we use `=` and when we're providing input parameters and so it's good to keep them separate.
So how does this link back with the `...`? Well, with the `c()` function, every unnamed parameter you provide is bundled into the `...` parameter. To give values for the `recursive` and `use.names` parameters, you'd need to provide them *explicitly* (e.g. `recursive = TRUE`). This will be true of many functions where you see a `...`. If you're not explicit with the parameters that you don't want to be included in the `...`, you're going to have a bad time.
#### Optional input parameters
For many functions, certain parameters have a predefined value that they will default to. This provides a level of flexibility whilst not requiring lines and lines of code for every function call; there's a default value, but you can override it if needed.
Optional parameters are easily distinguished in the documentation of a function because they will a value already assigned to them like this: `use.names = TRUE`.
For instance, when we create a vector using the `c()` function, there are two optional parameters (`recursive` and `use.names`) that already have the values `TRUE` and `FALSE` assigned to them. To override these defaults, we just need to provide a new value to the parameter like this:
```{r}
c('one'=1,'ten'=10,'fifteen'=15, use.names = FALSE)
```
Values without default values are often called *required parameters*. This is a bit of a misnomer because there will be instances where you won't provide a value to these parameters and the function will work just fine. This is down to two things that we'll look at in the For Teachers section: lazy evaluation and the `missing()` function. Most developers try and stay away from having required parameters that aren't actually required, but you'll still see them around (e.g. you can use the `download.file()` function without providing a value to the `method` parameter even though it doesn't have a default value.
## Outputs
First and foremost, in R you can have as many input as you like to a function. However, a function will only ever return one *thing*. I say one *thing*, because functions can return a list which itself can contain multiple values, but just keep this in mind: **Functions in R have a single return value**.
### Reassigning outputs
Functions in R do not edit the inputs you provide in place. Instead, they essentially work on copies of the inputs you provide. We call this **immutability**. Here's a quick example:
```{r}
x <- 1
sum(x, 1)
x
```
As you can see, when we call the `sum()` function with `x` as an input parameter, the value of `x` stays the same.
If you do want to edit your original value, you just need to reassign the output of the function call back to the variable. I know that sounds complicated, but it's quite simple:
```{r}
x <- 1
x <- sum(x,1)
x
```
This works because the right-hand side of the assignment line is executed first. In other words, when the `sum(x,1)` is evaluated, `x` is still equal to one. This makes sense because otherwise it'd be very hard to keep track of what `x` was equal to!
This behaviour (not changing the input parameter value in place) is a major point of difference between functions and what are called methods in other languages. If you're coming from something like Python, you may be used to changing objects through methods: `object.AddNew` or something like that. In R, functions do not change variables in the global environment because they are executed in their own environment. To learn more about environments, there is an [environments](environments.html) chapter in the "For Teachers" section.
## Questions {#questions-functions}
1. Why does `` `<-`(test, 2) `` work? What does this tell us about `<-`?
2. Why does `mean(1,2)` not return the output you'd expect but `sum(1,2)` does?
+ Hint: the documentation of both functions will help.
3. Other than `Sys.Date()`, can you think of another example of a function that be executed without any explicit input parameters?