Skip to content

Latest commit

 

History

History
89 lines (65 loc) · 2.26 KB

p8105_hw1_hs3393.md

File metadata and controls

89 lines (65 loc) · 2.26 KB

p8105_hw1_hs3393

2022-09-16

load packages

data("penguins", package = "palmerpenguins")
library(ggplot2)
library(tidyverse)

Problem 1

The dataset penguins describes:

  1. Three species of penguins, including adelie, chinstrap and gentoo.
  2. The penguins lives on three islands including Biscoe, dream and Torgersen.
  3. The data also describes the bill length, bill depth, flipper length, body mass, sex of the penguins.
  4. The data includes the year of observation.

The size of the dataset is 344 (rows) * 8 (columns). The mean of flipper length is 200.9152047 mm when NA is removed.

plot and save figure

ggplot(data = penguins, aes(x = bill_length_mm, y = flipper_length_mm, color = species)) + geom_point()

ggsave(filename = "flipper_length vs bill_length.png", width = 5, height = 2.5)

Problem 2

Create a dataframe

df <- tibble(
  list_norm = rnorm(10),
  list_logical = list_norm > 0 ,
  list_char = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  list_factor = factor(c(rep( c("a","b","c"), 3), "c"))
)

for (i in 1:4){
  print((mean(pull(df , i))))
}
## [1] 0.5562774
## [1] 0.7

## Warning in mean.default(pull(df, i)): argument is not numeric or logical:
## returning NA

## [1] NA

## Warning in mean.default(pull(df, i)): argument is not numeric or logical:
## returning NA

## [1] NA

The mean function works on the sample from normal distribution and their logical vector indicating wheter they are greater than zero. However, it cannot work on character vector and factor vector.

converting variable types

list_logical = c(T, F, T, F, F, T)
list_char = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
list_factor = factor(c(rep( c("a","b","c"), 3), "c"))

as.numeric(list_logical)
as.numeric(list_char)
as.numeric(list_factor)

The result indicates that R can get the mean of a data when their data types are logical, because it can convert logical data to numerical data (True = 1 and False = 0). R cannot convert character into numeric data, so it cannot get their mean. Although R can convert factor to numeric data based on their levels, R still cannot get their mean.