-
Notifications
You must be signed in to change notification settings - Fork 0
/
Vectors.Rmd
503 lines (320 loc) · 12.1 KB
/
Vectors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
# Vectors
```{r Vectors-1, include = FALSE}
source("common.R")
```
## Atomic vectors (Exercises 3.2.5)
**Q1.** How do you create raw and complex scalars? (See `?raw` and `?complex`.)
**A1.** In R, scalars are nothing but vectors of length 1, and can be created using the same constructor.
- Raw vectors
The raw type holds raw bytes, and can be created using `charToRaw()`. For example,
```{r Vectors-2}
x <- "A string"
(y <- charToRaw(x))
typeof(y)
```
An alternative is to use `as.raw()`:
```{r Vectors-3}
as.raw("–") # en-dash
as.raw("—") # em-dash
```
- Complex vectors
Complex vectors are used to represent (surprise!) complex numbers.
Example of a complex scalar:
```{r Vectors-4}
(x <- complex(length.out = 1, real = 1, imaginary = 8))
typeof(x)
```
**Q2.** Test your knowledge of the vector coercion rules by predicting the output of the following uses of `c()`:
```{r Vectors-5, eval=FALSE}
c(1, FALSE)
c("a", 1)
c(TRUE, 1L)
```
**A2.** The vector coercion rules dictate that the data type with smaller size will be converted to data type with bigger size.
```{r Vectors-6}
c(1, FALSE)
c("a", 1)
c(TRUE, 1L)
```
**Q3.** Why is `1 == "1"` true? Why is `-1 < FALSE` true? Why is `"one" < 2` false?
**A3.** The coercion rules for vectors reveal why some of these comparisons return the results that they do.
```{r Vectors-7}
1 == "1"
c(1, "1")
```
```{r Vectors-8}
-1 < FALSE
c(-1, FALSE)
```
```{r Vectors-9}
"one" < 2
c("one", 2)
sort(c("one", 2))
```
**Q4.** Why is the default missing value, `NA`, a logical vector? What's special about logical vectors? (Hint: think about `c(FALSE, NA_character_)`.)
**A4.** The `"logical"` type is the lowest in the coercion hierarchy.
So `NA` defaulting to any other type (e.g. `"numeric"`) would mean that any time there is a missing element in a vector, rest of the elements would be converted to a type higher in hierarchy, which would be problematic for types lower in hierarchy.
```{r Vectors-10}
typeof(NA)
c(FALSE, NA_character_)
```
**Q5.** Precisely what do `is.atomic()`, `is.numeric()`, and `is.vector()` test for?
**A5.** Let's discuss them one-by-one.
- `is.atomic()`
This function checks if the object is a vector of atomic *type* (or `NULL`).
Quoting docs:
> `is.atomic` is true for the atomic types ("logical", "integer", "numeric", "complex", "character" and "raw") and `NULL`.
```{r Vectors-11}
is.atomic(NULL)
is.atomic(list(NULL))
```
- `is.numeric()`
Its documentation says:
> `is.numeric` should only return true if the base type of the class is `double` or `integer` and values can reasonably be regarded as `numeric`
Therefore, this function only checks for `double` and `integer` base types and not other types based on top of these types (`factor`, `Date`, `POSIXt`, or `difftime`).
```{r Vectors-12}
is.numeric(1L)
is.numeric(factor(1L))
```
- `is.vector()`
As per its documentation:
> `is.vector` returns `TRUE` if `x` is a vector of the specified mode having no attributes *other than names*. It returns `FALSE` otherwise.
Thus, the function can be incorrectif the object has attributes other than `names`.
```{r Vectors-13}
x <- c("x" = 1, "y" = 2)
is.vector(x)
attr(x, "m") <- "abcdef"
is.vector(x)
```
A better way to check for a vector:
```{r Vectors-14}
is.null(dim(x))
```
## Attributes (Exercises 3.3.4)
**Q1.** How is `setNames()` implemented? How is `unname()` implemented? Read the source code.
**A1.** Let's have a look at implementations for these functions.
- `setNames()`
```{r Vectors-15}
setNames
```
Given this function signature, we can see why, when no first argument is given, the result is still a named vector.
```{r Vectors-16}
setNames(, c("a", "b"))
setNames(c(1, 2), c("a", "b"))
```
- `unname()`
```{r Vectors-17}
unname
```
`unname()` removes existing names (or dimnames) by setting them to `NULL`.
```{r Vectors-18}
unname(setNames(, c("a", "b")))
```
**Q2.** What does `dim()` return when applied to a 1-dimensional vector? When might you use `NROW()` or `NCOL()`?
**A2.** Dimensions for a 1-dimensional vector are `NULL`. For example,
```{r Vectors-19}
dim(c(1, 2))
```
`NROW()` and `NCOL()` are helpful for getting dimensions for 1D vectors by treating them as if they were matrices or dataframes.
```{r Vectors-20}
# example-1
x <- character(0)
dim(x)
nrow(x)
NROW(x)
ncol(x)
NCOL(x)
# example-2
y <- 1:4
dim(y)
nrow(y)
NROW(y)
ncol(y)
NCOL(y)
```
**Q3.** How would you describe the following three objects? What makes them different from `1:5`?
```{r Vectors-21}
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))
```
**A3.** `x1`, `x2`, and `x3` are one-dimensional **array**s, but with different "orientations", if we were to mentally visualize them.
`x1` has 5 entries in the third dimension, `x2` in the second dimension, while `x1` in the first dimension.
**Q4.** An early draft used this code to illustrate `structure()`:
```{r Vectors-22}
structure(1:5, comment = "my attribute")
```
But when you print that object you don't see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)
**A4.** From `?attributes` (emphasis mine):
> Note that some attributes (namely class, **comment**, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.
```{r Vectors-23}
structure(1:5, x = "my attribute")
structure(1:5, comment = "my attribute")
```
## S3 atomic vectors (Exercises 3.4.5)
**Q1.** What sort of object does `table()` return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?
**A1.** `table()` returns an array of `integer` type and its dimensions scale with the number of variables present.
```{r Vectors-24}
(x <- table(mtcars$am))
(y <- table(mtcars$am, mtcars$cyl))
(z <- table(mtcars$am, mtcars$cyl, mtcars$vs))
# type
purrr::map(list(x, y, z), typeof)
# attributes
purrr::map(list(x, y, z), attributes)
```
**Q2.** What happens to a factor when you modify its levels?
```{r Vectors-25, results = FALSE}
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
```
**A2.** Its levels change but the underlying integer values remain the same.
```{r Vectors-26}
f1 <- factor(letters)
f1
as.integer(f1)
levels(f1) <- rev(levels(f1))
f1
as.integer(f1)
```
**Q3.** What does this code do? How do `f2` and `f3` differ from `f1`?
```{r Vectors-27, results = FALSE}
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
```
**A3.** In this code:
- `f2`: Only the underlying integers are reversed, but levels remain unchanged.
```{r Vectors-28}
f2 <- rev(factor(letters))
f2
as.integer(f2)
```
- `f3`: Both the levels and the underlying integers are reversed.
```{r Vectors-29}
f3 <- factor(letters, levels = rev(letters))
f3
as.integer(f3)
```
## Lists (Exercises 3.5.4)
**Q1.** List all the ways that a list differs from an atomic vector.
**A1.** Here is a table of comparison:
| feature | atomic vector | list (aka generic vector) |
| :----------------------------- | :----------------------------------------------- | :----------------------------------------------------- |
| element type | unique | mixed^[a list can contain a mix of types] |
| recursive? | no | yes^[a list can contain itself] |
| return for out-of-bounds index | `NA` | `NULL` |
| memory address | single memory reference^[`lobstr::ref(c(1, 2))`] | reference per list element^[`lobstr::ref(list(1, 2))`] |
**Q2.** Why do you need to use `unlist()` to convert a list to an atomic vector? Why doesn't `as.vector()` work?
**A2.** A list already *is* a (generic) vector, so `as.vector()` is not going to change anything, and there is no `as.atomic.vector`. Thus, we need to use `unlist()`.
```{r Vectors-30}
x <- list(a = 1, b = 2)
is.vector(x)
is.atomic(x)
# still a list
as.vector(x)
# now a vector
unlist(x)
```
**Q3.** Compare and contrast `c()` and `unlist()` when combining a date and date-time into a single vector.
**A3.** Let's first create a date and datetime object
```{r Vectors-31}
date <- as.Date("1947-08-15")
datetime <- as.POSIXct("1950-01-26 00:01", tz = "UTC")
```
And check their attributes and underlying `double` representation:
```{r Vectors-32}
attributes(date)
attributes(datetime)
as.double(date) # number of days since the Unix epoch 1970-01-01
as.double(datetime) # number of seconds since then
```
- Behavior with `c()`
Since `S3` method for `c()` dispatches on the first argument, the resulting class of the vector is going to be the same as the first argument. Because of this, some attributes will be lost.
```{r Vectors-33}
c(date, datetime)
attributes(c(date, datetime))
c(datetime, date)
attributes(c(datetime, date))
```
- Behavior with `unlist()`
It removes all attributes and we are left only with the underlying double representations of these objects.
```{r Vectors-34}
unlist(list(date, datetime))
unlist(list(datetime, date))
```
## Data frames and tibbles (Exercises 3.6.8)
**Q1.** Can you have a data frame with zero rows? What about zero columns?
**A1.** Data frame with 0 rows is possible. This is basically a list with a vector of length 0.
```{r Vectors-35}
data.frame(x = numeric(0))
```
Data frame with 0 columns is also possible. This will be an empty list.
```{r Vectors-36}
data.frame(row.names = 1)
```
And, finally, data frame with 0 rows *and* columns is also possible:
```{r Vectors-37}
data.frame()
dim(data.frame())
```
Although, it might not be common to *create* such data frames, they can be results of subsetting. For example,
```{r Vectors-38}
BOD[0, ]
BOD[, 0]
BOD[0, 0]
```
**Q2.** What happens if you attempt to set rownames that are not unique?
**A2.** If you attempt to set data frame rownames that are not unique, it will not work.
```{r Vectors-39, error=TRUE}
data.frame(row.names = c(1, 1))
```
**Q3.** If `df` is a data frame, what can you say about `t(df)`, and `t(t(df))`? Perform some experiments, making sure to try different column types.
**A3.** Transposing a data frame:
- transforms it into a matrix
- coerces all its elements to be of the same type
```{r Vectors-40}
# original
(df <- head(iris))
# transpose
t(df)
# transpose of a transpose
t(t(df))
# is it a dataframe?
is.data.frame(df)
is.data.frame(t(df))
is.data.frame(t(t(df)))
# check type
typeof(df)
typeof(t(df))
typeof(t(t(df)))
# check dimensions
dim(df)
dim(t(df))
dim(t(t(df)))
```
**Q4.** What does `as.matrix()` do when applied to a data frame with columns of different types? How does it differ from `data.matrix()`?
**A4.** The return type of `as.matrix()` depends on the data frame column types.
As docs for `as.matrix()` mention:
> The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g. all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give an integer matrix, etc.
Let's experiment:
```{r Vectors-41}
# example with mixed types (coerced to character)
(df <- head(iris))
as.matrix(df)
str(as.matrix(df))
# another example (no such coercion)
BOD
as.matrix(BOD)
```
On the other hand, `data.matrix()` always returns a numeric matrix.
From documentation of `data.matrix()`:
> Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. [...] Character columns are first converted to factors and then to integers.
Let's experiment:
```{r Vectors-42}
data.matrix(df)
str(data.matrix(df))
```
## Session information
```{r Vectors-43}
sessioninfo::session_info(include_base = TRUE)
```