Commit

differences for PR #908
actions-user committed Jan 7, 2025
1 parent 51352b7 commit 1c75ecf
Showing 3 changed files with 40 additions and 54 deletions.
90 changes: 38 additions & 52 deletions 04-data-structures-part1.md
@@ -237,30 +237,38 @@ No matter how
complicated our analyses become, all data in R is interpreted as one of these
basic data types. This strictness has some really important consequences.

A user has added details of another cat. This information is in the file
`data/feline-data_v2.csv`.
A user has provided details of another cat. We can add an additional row to our cats table using `rbind()`.


``` r
file.show("data/feline-data_v2.csv")
additional_cat <- data.frame(coat = "tabby", weight = "2.3 or 2.4", likes_catnip = 1)
additional_cat
```

``` output
coat weight likes_catnip
1 tabby 2.3 or 2.4 1
```

``` r
coat,weight,likes_catnip
calico,2.1,1
black,5.0,0
tabby,3.2,1
tabby,2.3 or 2.4,1
cats2 <- rbind(cats, additional_cat)
cats2
```

``` output
coat weight likes_catnip
1 calico 2.1 1
2 black 5 0
3 tabby 3.2 1
4 tabby 2.3 or 2.4 1
```

Load the new cats data like before, and check what type of data we find in the
`weight` column:
Let's check what type of data we find in the
`weight` column of our new `cats2` object:


``` r
cats <- read.csv(file="data/feline-data_v2.csv")
typeof(cats$weight)
typeof(cats2$weight)
```

``` output
@@ -272,18 +280,18 @@ we did on them before, we run into trouble:


``` r
cats$weight + 2
cats2$weight + 2
```

``` error
Error in cats$weight + 2: non-numeric argument to binary operator
Error in cats2$weight + 2: non-numeric argument to binary operator
```

What happened?
The `cats` data we are working with is something called a *data frame*. Data frames
The `cats` (and `cats2`) data we are working with is something called a *data frame*. Data frames
are one of the most common and versatile types of *data structures* we will work with in R.
A given column in a data frame cannot be composed of different data types.
In this case, R does not read everything in the data frame column `weight` as a *double*, therefore the entire
In this case, R cannot store everything in the data frame column `weight` as a *double* anymore once we add the row for the additional cat (because its weight is `2.3 or 2.4`), therefore the entire
column data type changes to something that is suitable for everything in the column.
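
The coercion described above can be reproduced without any file on disk. The following is a minimal sketch, assuming a hypothetical stand-in data frame `cats_demo` with the same three rows as the lesson's `cats` table; the names `cats_demo`, `extra`, and `combined` are illustrative and not part of the lesson.

```r
# Hypothetical stand-in for the lesson's `cats` data frame (assumed values).
cats_demo <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_catnip = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Before adding the new row, the weight column is stored as doubles.
type_before <- typeof(cats_demo$weight)

# The new row carries a weight that cannot be represented as a number.
extra <- data.frame(
  coat = "tabby",
  weight = "2.3 or 2.4",
  likes_catnip = 1,
  stringsAsFactors = FALSE
)

# rbind() has to find one type that fits every value in the column,
# so the whole weight column becomes character.
combined <- rbind(cats_demo, extra)
type_after <- typeof(combined$weight)

print(type_before)
print(type_after)
```

Only the `weight` column changes type here; `likes_catnip` stays numeric because every value in it, old and new, is still a number.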

When R reads a csv file, it reads it in as a *data frame*. Thus, when we loaded the `cats`
@@ -292,42 +300,22 @@ is written by the `str()` function:


``` r
str(cats)
str(cats2)
```

``` output
'data.frame': 4 obs. of 3 variables:
$ coat : chr "calico" "black" "tabby" "tabby"
$ weight : chr "2.1" "5" "3.2" "2.3 or 2.4"
$ likes_string: int 1 0 1 1
$ likes_catnip: num 1 0 1 1
```

*Data frames* are composed of rows and columns, where each column has the
same number of rows. Different columns in a data frame can be made up of different
data types (this is what makes them so versatile), but everything in a given
column needs to be the same type (e.g., vector, factor, or list).
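
As a rough illustration of the paragraph above, the sketch below builds a small data frame whose columns each hold a different basic type and then inspects the type of every column. The data frame `pets` and its values are hypothetical and chosen only for this example.

```r
# Hypothetical example data: three columns, three different basic types.
pets <- data.frame(
  name = c("Mittens", "Shadow"),
  weight = c(2.1, 5.0),
  indoor = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# typeof() applied to each column reports one type per column.
column_types <- sapply(pets, typeof)
print(column_types)

# Within a single vector, mixing types is not possible:
# c() converts everything to one common type.
mixed <- c(2.1, "five")
print(typeof(mixed))
```

Each column keeps its own type, while the mixed vector ends up as character because that is the only type able to hold both values.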

Let's explore more about different data structures and how they behave.
For now, let's remove that extra line from our cats data and reload it,
while we investigate this behavior further:

feline-data.csv:

```
coat,weight,likes_catnip
calico,2.1,1
black,5.0,0
tabby,3.2,1
```

And back in RStudio:


``` r
cats <- read.csv(file="data/feline-data.csv")
```


Let's explore more about different data structures and how they behave. For now, we will focus on our original data frame `cats` (and we can forget about `cats2` for the rest of this episode).

### Vectors and Type Coercion

@@ -555,8 +543,7 @@ Create a new script in RStudio and copy and paste the following code. Then
move on to the tasks below, which help you to fill in the gaps (\_\_\_\_\_\_).

```
# Read data
cats <- read.csv("data/feline-data_v2.csv")
Using the object `cats2`:
# 1. Print the data
_____
@@ -568,15 +555,15 @@ _____(cats)
# The correct data type is: ____________.
# 4. Correct the 4th weight data point with the mean of the two given values
cats$weight[4] <- 2.35
cats2$weight[4] <- 2.35
# print the data again to see the effect
cats
# 5. Convert the weight to the right data type
cats$weight <- ______________(cats$weight)
cats2$weight <- ______________(cats2$weight)
# Calculate the mean to test yourself
mean(cats$weight)
mean(cats2$weight)
# If you see the correct mean value (and not NA), you did the exercise
# correctly!
@@ -586,7 +573,7 @@ mean(cats$weight)

#### 1\. Print the data

Execute the first statement (`read.csv(...)`). Then print the data to the
Print the data to the
console

::::::::::::::: solution
@@ -601,8 +588,8 @@ Show the content of any variable by typing its name.
Two correct solutions:

```
cats
print(cats)
cats2
print(cats2)
```

:::::::::::::::::::::::::
@@ -611,7 +598,7 @@ print(cats)

The data type of your data is as important as the data itself. Use a
function we saw earlier to print out the data types of all columns of the
`cats` table.
`cats2` `data.frame`.

::::::::::::::: solution

@@ -628,15 +615,14 @@ here.
> ### Solution to Challenge 1.2
>
> ```
> str(cats)
> str(cats2)
> ```
#### 3\. Which data type do we need?
The shown data type is not the right one for this data (weight of
a cat). Which data type do we need?
- Why did the `read.csv()` function not choose the correct data type?
- Fill in the gap in the comment with the correct data type for cat weight!
::::::::::::::: solution
@@ -715,8 +701,8 @@ auto-complete function: Type "`as.`" and then press the TAB key.
> There are two functions that are synonymous for historic reasons:
>
> ```
> cats$weight <- as.double(cats$weight)
> cats$weight <- as.numeric(cats$weight)
> cats2$weight <- as.double(cats2$weight)
> cats2$weight <- as.numeric(cats2$weight)
> ```
::::::::::::::::::::::::::::::::::::::::::::::::::
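
The conversion step in this exercise can be sketched on a plain character vector. This is a minimal sketch, assuming the four weight values shown in the `str(cats2)` output; the variable names `weights`, `converted`, and `fixed` are illustrative only.

```r
# Weight values as they appear once the column has become character.
weights <- c("2.1", "5", "3.2", "2.3 or 2.4")

# Converting directly turns the unparseable entry into NA (with a warning,
# suppressed here to keep the output short).
converted <- suppressWarnings(as.numeric(weights))
print(converted)

# Replace the 4th entry first; assigning a number into a character
# vector stores it as text.
weights[4] <- 2.35
print(typeof(weights))

# Now every entry can be parsed, and the mean is no longer NA.
fixed <- as.numeric(weights)
print(mean(fixed))

# as.double() and as.numeric() give the same result.
same_result <- identical(as.double(weights), as.numeric(weights))
print(same_result)
```

Correcting the value before converting matters: converting first would leave an `NA` in the column, and `mean()` would then return `NA` unless `na.rm = TRUE` is set.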
2 changes: 1 addition & 1 deletion data/feline-data.csv
@@ -1,4 +1,4 @@
"coat","weight","likes_string"
"coat","weight","likes_catnip"
"calico",2.1,1
"black",5,0
"tabby",3.2,1
2 changes: 1 addition & 1 deletion md5sum.txt
@@ -6,7 +6,7 @@
"episodes/01-rstudio-intro.Rmd" "04f6b758558750cef962768d78dd63b0" "site/built/01-rstudio-intro.md" "2024-12-03"
"episodes/02-project-intro.Rmd" "cd60cc3116d4f6be92f03f5cc51bcc3b" "site/built/02-project-intro.md" "2024-12-03"
"episodes/03-seeking-help.Rmd" "d24c310b8f36930e70379458f3c93461" "site/built/03-seeking-help.md" "2024-12-03"
"episodes/04-data-structures-part1.Rmd" "afc6c3ced3677ab088457152f8d84b54" "site/built/04-data-structures-part1.md" "2024-12-03"
"episodes/04-data-structures-part1.Rmd" "5e680e381a7d16228ee1ee2c9ec8a151" "site/built/04-data-structures-part1.md" "2025-01-07"
"episodes/05-data-structures-part2.Rmd" "95c5dd30b8288090ce89ecbf2d3072bd" "site/built/05-data-structures-part2.md" "2024-12-03"
"episodes/06-data-subsetting.Rmd" "5d4ce8731ab37ddea81874d63ae1ce86" "site/built/06-data-subsetting.md" "2024-12-03"
"episodes/07-control-flow.Rmd" "6a8691c8668737e4202f49b52aeb8ac6" "site/built/07-control-flow.md" "2024-12-03"
