forked from susanli2016/Data-Analysis-with-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
alcoholConsumptionCanada.Rmd
231 lines (170 loc) · 11.2 KB
/
alcoholConsumptionCanada.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
---
output:
html_document:
smart: false
---
Alcohol Consumption in Canada
===================================================================
By Susan Li
March 21, 2017
According to [2012 Health Canada Canadian Alcohol and Drug Use Monitoring Survey (CADUMS)](http://www.hc-sc.gc.ca/hc-ps/drugs-drogues/stat/_2012/summary-sommaire-eng.php), 78.4 percent of Canadians ages 15 or older reported that they drank alcohol in the past 12 months, 91 percent reported lifetime alcohol use. At least 14 percent of those drinkers drank enough to be at risk for immediate injury and harm with at least 20 percent at risk for chronic health effects, such as liver cirrhosis and various forms of cancer.
Now let's look at the hard truth.
```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
```
My first dataset was downloaded from [Government Canada website - Volume of sales of alcoholic beverages in litres of absolute alcohol and per capita 15 years and over, from 1989 to 2013](http://open.canada.ca/data/en/dataset/93f88532-193b-48a9-8ccb-f483557b28e5).The data seems pretty clean, no missing values.
```{r}
drink <- read.csv('01830019-eng.csv', stringsAsFactors = F)
drink <- drink[complete.cases(drink[,-1]),]
drink <- drink[drink$SALES == 'Total per capita sales', ]
drink$Value<-as.numeric(drink$Value)
```
### Do we drink more or less compare with several years ago? And what do we drink?
Alcohol consumption is defined as annual sales of pure alcohol in litres per person aged 15 years and older. So I will only use the data per capita.
```{r}
library(ggplot2)
ggplot(aes(x = Ref_Date, y = Value, group = 1), data = drink[(drink$GEO == 'Canada') & (drink$BEVERAGE == 'Total alcoholic beverages'), ]) +
geom_line(size = 2.5, alpha = 0.7, color = "mediumseagreen") +
geom_point(size = 0.5) + xlab("Year") + ylab("Pure Alcohol by Litre per Capita") +
ggtitle('Alcohol Consumption Per Capita')
```
Canadians from 25 years ago were bigger drinkers than we are today.
Let's have a look what we drink.
```{r}
library(ggthemes)
ggplot(data=drink[(drink$GEO == 'Canada') & (drink$BEVERAGE != 'Total alcoholic beverages'), ], aes(x=Ref_Date, y=Value, group = BEVERAGE, colour = as.factor(BEVERAGE))) + geom_line() + theme_solarized() +
xlab('Year') + ylab('Pure Alcohol by Litre') + ggtitle('Alcohol Consumption per Capita')
```
Seems Canadians drink a lot more wine, a little bit more liquor and less beer. Although it's declining, beer is still the most popular alcoholic drink in Canada.
### How about each province?
```{r}
drink$BEVERAGE <- ordered(drink$BEVERAGE, levels = c('Total alcoholic beverages', 'Beer', 'Spirits', 'Wine'))
ggplot(data=drink[(drink$GEO != 'Canada'), ], aes(x=Ref_Date, y=Value, group = BEVERAGE, colour = as.factor(BEVERAGE))) + geom_line() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab('Year') + ylab('Pure Alcohol by Litre') + ggtitle('Alcohol Consumption per Capita') + facet_wrap(~GEO)
```
From coast to coast, beer is by far the most favorite drink of Canadian. Not all provinces favor the same drink. Yukon leads nation in total alcohol consumption and beer consumption. Quebec is the biggest per capita consumer of wine,
Explore the data in details for the year 2013.
```{r}
ggplot(aes(x = reorder(GEO, Value), y = Value), data = drink[(drink$Ref_Date== 2013), ] ) +
geom_bar(stat = 'identity') +
theme_tufte() +
theme(axis.text = element_text(size = 10, face = 'bold')) +
xlab('') +
ylab('Pure Alcohol by litre per capita') +
coord_flip() +
ggtitle("How Much Canadians Drink in 2013") + theme_solarized() +
facet_wrap(~BEVERAGE)
```
While Yukon leads the nation in total alcohol, beer and spirits consumption, New Brunswick alcohol consumption are the lowest in Canada, and there is no category where New Brunswick consumed more than the national average. And no province drinks more wine than Quebec, but consumption of spirits in Quebec is lower than the other categories and the other provinces.
### Canada & Ontario
Because I live in Toronto, it would be interesting to see the evolution of alcohol consumption in Canada and Ontario.
```{r}
canada <- drink[drink$GEO == 'Canada', ]
ontario <- drink[drink$GEO == 'Ontario', ]
canada_ontario <- rbind(canada, ontario)
ggplot(aes(x = Ref_Date, y = Value, color = GEO), data = canada_ontario) +
geom_line() + theme_solarized() +
xlab("Year") + ylab("Pure Alcohol by Litre per Capita") +
ggtitle('Evolution of Alcohol Consumption in Canada and Ontario') + facet_wrap(~BEVERAGE)
```
After decreasing until around 1997, alcohol consumption in Ontario began to rise. It has been steady since 2010. Spirits is the only category that Ontario has reached national consumption level.
According to [Statistics Canada](http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/health80a-eng.htm), Heavy drinking refers to males who reported having 5 or more drinks, or women who reported having 4 or more drinks, on one occasion, at least once a month in the past year.
### How heavy do we drink?
My second data source comes from [Statistics Canada - Percentage who reported heavy drinking from 2001 to 2014](http://www.statcan.gc.ca/eng/start). I will have to scrape the data from Statistics Canada's web site, clean it up and re-arrange the format.
```{r}
library(XML)
theurl = "http://www.statcan.gc.ca/pub/82-625-x/2015001/article/14183/c-g/desc/desc01-eng.htm"
theurl.table = readHTMLTable(theurl, header=T, which=1,stringsAsFactors=F)
theurl.table <- setNames(theurl.table, c("Year","Male","Female"))
theurl.table$Year <-as.numeric(theurl.table$Year)
theurl.table$Male <- as.numeric(theurl.table$Male)
theurl.table$Female <- as.numeric(theurl.table$Female)
```
```{r}
# melt the data from wide to long format and combine two plots into one.
library(reshape2)
theurl.table1 <- melt(data = theurl.table, id.vars = "Year")
ggplot(data = theurl.table1, aes(x = Year, y = value, colour = variable)) + geom_line() + theme_solarized() + ylab('Percent') +
ggtitle('Percentage who reported heavy drinking by Gender')
```
Males are more likely to replot heavy drinking than females. When we see males report less on heavy drinking during the recentl years, females' report on heavy drinking is on the rise. This is worrisome.
```{r}
url = "http://www.statcan.gc.ca/pub/82-625-x/2015001/article/14183/c-g/desc/desc02-eng.htm"
url.table = readHTMLTable(url, header=T, which=1,stringsAsFactors=F)
url.table <- setNames(url.table, c("Age","Male","Female"))
url.table$Age <-as.factor(url.table$Age)
url.table$Male <- as.numeric(url.table$Male)
url.table$Female <- as.numeric(url.table$Female)
```
```{r}
url.table1 <- melt(data = url.table, id.vars = "Age")
url.table1$Age <- ordered(url.table1$Age, levels=c('Total (12 or older)', '12 to 15', '16 to 17', '18 to 19', '20 to 34', '35 to 44', '45 to 54', '55 to 64', '65 or older'))
ggplot(data = url.table1, aes(x = Age, y = value, fill = variable)) + geom_bar(stat = 'identity', position = position_dodge()) + theme_solarized() + ylab('Percent') +
ggtitle('Percentage who reported heavy drinking by Gender and Age Group')
```
Gender differences in alcohol consumption are found everywhere but closed among 16 to 19 years old. In general, more men binge drink than women, but heavy drinking among young Canadian women should cause alarm.
I would like to drill down into each province, after googling around, I found a small dataset from a [Health Canada report - Rates of Heavy Drinking in 2014 by Province and Gender](http://healthycanadians.gc.ca/publications/department-ministere/state-public-health-alcohol-2015-etat-sante-publique-alcool/alt/state-phac-alcohol-2015-etat-aspc-alcool-eng.pdf). Because it is in a PDF format, I will have to manully create my own.
```{r}
library(xlsx)
alcohol <- read.xlsx('alcohol_2014.xlsx', sheetIndex = 1, header = TRUE, stringsAsFactors = F)
# Again, change the format from wide to long
alcohol <- melt(data = alcohol, id.vars = "GEO")
ggplot(data = alcohol, aes(x = GEO, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = position_dodge()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab('Percent') +
ggtitle('Percentage who reported heavy drinking by Gender and Province in 2014')
```
Northwest Territories, Yukon, and Newfoundland & Labrador had the highest heavy drinking rate in 2014, while British Columbia, Nunavut and Ontario reported lower then overall Canadian rate.
### A Quick Comparison
It's time to have a quick comparison with our south border and United Kingdom and Australia, how much alcohol do they drink?
I downloaded a dataset from [Word Health Organization(WHO) - Recorded alcohol per capita consumption, from 2000 TO 2014](http://apps.who.int/gho/data/node.main.A1026?lang=en). The dataset need quite some cleaning.
```{r echo=FALSE, warning=FALSE, message=FALSE}
data <- read.csv('data.csv', stringsAsFactors = F)
data$X <- NULL
data$X.1 <- NULL
cols.num <- c("X2010","X2009", "X2008", "X2007", "X2006", "X2005", "X2004", "X2003", "X2002", "X2001", "X2000")
data[cols.num] <- sapply(data[cols.num],as.numeric)
data$X2015 <- NULL
data_can <- data[data$Country == 'Canada', ]
data_usa <- data[data$Country == 'United States of America', ]
data_uk <- data[data$Country == 'United Kingdom of Great Britain and Northern Ireland', ]
data_au <- data[data$Country == 'Australia', ]
data_can_usa_uk_au <- rbind(data_can, data_usa, data_uk, data_au)
data_can_usa_uk_au <- melt(data = data_can_usa_uk_au, id.vars = "Country")
```
```{r}
ggplot(data = data_can_usa_uk_au, aes(x = variable, y = value, fill = Country)) +
geom_bar(stat = 'identity', position = position_dodge()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab('Pure Litre per Capita') +
xlab('Year') +
ggtitle('Alcohol Consumption from 2000 to 2014')
```
Here it is. Now I believe that The UK has love affair With alcohol.
### Where is Canada in the global map?
Again, I am using data from the World Health Organisation, the map below colour-codes countries according to the amount of pure alcohol the average citizen aged 15 or over consumes in a year for 2011.
```{r}
alcohol_global <- read.csv('alcohol_global.csv', header = T, stringsAsFactors = F)
```
```{r echo=FALSE, warning=FALSE, message=FALSE, comment=NULL, results='hide'}
library(rworldmap)
# create a map shaped window
mapDevice('x11')
al_df <- joinCountryData2Map(alcohol_global, joinCode="NAME", nameJoinColumn="Country")
```
```{r echo=FALSE, comment=NULL, message=FALSE, warning=FALSE}
mapCountryData(al_df, nameColumnToPlot="Pure_Alcohol_by_Litre_per_Capita", catMethod="fixedWidth")
```
It looks like we drink similar amount of alcohol with the US, Brazil, Argentina, Germany and Poland.
Here is the top 10:
```{r}
library(dplyr)
head(arrange(alcohol_global,desc(Pure_Alcohol_by_Litre_per_Capita)), 10)
```
Resources:
1. [Statistics Canada](http://www.statcan.gc.ca/eng/start)
2. [Health Canada](http://www.hc-sc.gc.ca/index-eng.php)
3. [Government of Canada](http://open.canada.ca/en/open-data)
4. [World Health Organization](http://www.who.int/en/)