---
title: "many-variables"
author: "SasanGN"
date: "December 20, 2018"
output: html_document
editor_options:
  chunk_output_type: console
---
## Import libraries
```{r}
library(ggplot2)
library(dplyr)
```
## Read the tab-separated file with read.csv or read.delim
```{r}
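# Check the working directory, then point it at the project folder
getwd()
setwd("D:/Documents/GitHub/100-days-of-code-challenge/")
list.files()

# read.csv needs sep = '\t' for a tab-separated file
pf <- read.csv('pseudo_facebook.tsv', sep = '\t')
# Alternative: read.delim uses sep = '\t' by default
# inFile2 <- read.delim('pseudo_facebook.tsv')
names(pf)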
```
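The heading above names two ways to read a tab-separated file. As a minimal sketch, the chunk below writes a made-up three-row table to a temporary file (the `toy` data frame and `tmp` path are assumptions for illustration, not part of `pseudo_facebook.tsv`) and reads it back both ways; the two results should be the same data frame.

```{r}
# Toy data (hypothetical values, not taken from pseudo_facebook.tsv)
toy <- data.frame(userid = 1:3,
                  age = c(14, 21, 35),
                  gender = c("male", "female", NA))

# Write it out as a tab-separated file in a temporary location
tmp <- tempfile(fileext = ".tsv")
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back with both functions and compare the results
via_csv <- read.csv(tmp, sep = "\t")
via_delim <- read.delim(tmp)
identical(via_csv, via_delim)

# Clean up the temporary file
unlink(tmp)
```

`read.delim` differs from `read.csv` only in its default separator, so either call works once `sep` is set.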
```{r}
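# Box plot of age by gender, with the mean marked by an x
# (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
ggplot(aes(x = gender, y = age),
       data = subset(pf, !is.na(gender))) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = 'point', shape = 4)

# Median friend count by age, one line per gender
ggplot(aes(age, friend_count),
       data = subset(pf, !is.na(gender))) +
  geom_line(aes(color = gender), stat = 'summary', fun = median)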
```
```{r}
# Mean and median friend count, plus group size, for each age/gender pair
pf.fc_by_age_gender <- pf %>%
  filter(!is.na(gender)) %>%
  group_by(age, gender) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  ungroup() %>%
  arrange(age)
head(pf.fc_by_age_gender, 20)
```
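To make the grouped summary easier to check by hand, here is a small sketch that runs the same pipeline on a made-up six-row data frame (the `toy` values are assumptions for illustration, not taken from `pseudo_facebook.tsv`). The row with a missing gender should be dropped, leaving one summary row per age and gender pair.

```{r}
library(dplyr)

# Toy data (hypothetical values, not taken from pseudo_facebook.tsv)
toy <- data.frame(
  age          = c(20, 20, 20, 20, 21, 21),
  gender       = c("female", "female", "male", NA, "female", "male"),
  friend_count = c(10, 30, 5, 99, 8, 12)
)

# Same steps as the chunk above: drop missing gender, group, summarise
toy_summary <- toy %>%
  filter(!is.na(gender)) %>%
  group_by(age, gender) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  ungroup() %>%
  arrange(age)

toy_summary
```

The two 20-year-old female rows collapse into a single row with `n = 2`, while every other pair keeps `n = 1`.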
## Create a line graph of the median friend count by age for each gender
```{r}
ggplot(aes(age, friend_count_median),
       data = pf.fc_by_age_gender) +
  geom_line(aes(color = gender))
```
## To be continued...