-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathseminar04.qmd
372 lines (247 loc) · 11.5 KB
/
seminar04.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
---
title: "Seminar 04"
subtitle: "MA22004"
date: "2024-10-09"
author: "Dr Eric Hall • ehall001@dundee.ac.uk"
format:
revealjs:
chalkboard: true
html-math-method: katex
theme: [default, resources/custom.scss]
css: resources/fonts.css
logo: resources/logo.png
footer: ""
template-partials:
- resources/title-slide.html
transition: slide
background-transition: fade
from: markdown+emoji
lang: en
---
```{r}
#| include: false
knitr::opts_chunk$set(echo = FALSE, comment = "", fig.asp = .5)
library(tidyverse)
library(latex2exp)
library(knitr)
library(kableExtra)
library(janitor)
library(fontawesome)
library(latex2exp)
```
# Announcements {.mySegue .center}
:::{.hidden}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\E}{\mathbf{E}}
\DeclareMathOperator{\Cov}{Cov}
\DeclareMathOperator{\corr}{corr}
\newcommand{\se}{\mathsf{se}}
\DeclareMathOperator{\sd}{sd}
:::
## Attendance
::: {layout="[[-1], [1], [-1]]"}
![](images/seats.png){fig-align="center" fig-alt="Register your attendance using SEAtS"}
:::
## Reminders
- Discuss labs, R, and RStudio at Thu workshop.
- Discuss worksheet 3 at Fri workshop.
- Lab 2 due **Fri 2024-10-11** at **17:00**.
## Outline of today `r fa("compass")`
1. Inference for means
2. Inference for proportions
3. How much water? `r fa("lightbulb")`
# Inference for means {.mySegue .center}
## Point estimate for mean
Consider data $x_1, \dots, x_m$.
(Assuming it exists)
A point estimate for the population mean, $\mu$, is:
$$\hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} x_i \,.$$
## Consider the _population_ {.smaller}
We consider inferences for the population mean assuming that samples are drawn for one of three different populations.
1. Normal population with known $\sigma$
2. Any population with unknown $\sigma$, when $m$ large
3. Normal population with unknown $\sigma$, $m$ may be small
:::{.callout-important}
## What ...
... changes?
... stays the same?
:::
## Confidence intervals for mean
The basic form is awlays:
$$\text{pt est} \pm \underbrace{(\text{critical value}) \cdot (\text{precision of pt est})}_\text{margin of error}$$
:::{.callout-warning}
## What changes?
Margin of error!
:::
## Hypothesis tests for means
Tests on equality of mean $\mu$, i.e., $H_0 : \mu = \mu_0$ against one of the alternatives:
- $H_a : \mu \neq \mu_0$
- $H_a : \mu > \mu_0$
- $H_a : \mu < \mu_0$
:::{.callout-warning}
## What changes?
Test statistic!
:::
:::{.notes}
- Does the available data provides sufficient evidence to reject a null hypothesis $H_0$ which is assumed to be true?
- Note the equality is part of the null hypothesis.
- If the observations disagree with $H_0$, then we reject the null hypothesis.
- If the sample evidence does not strongly contradict $H_0$, then we continue to believe $H_0$.
- $\mu_0$ is the **null value** that separate the null and alternative
:::
## Forms for $100(1-\alpha)\%$ CIs {.smaller}
1. Normal population with known $\sigma$:
$$\overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{m}}$$
2. Any population with unknown $\sigma$, when $m$ large
$$\overline{x} \pm z_{\alpha/2} \frac{s}{\sqrt{m}}$$
3. Normal population with unknown $\sigma$, $m$ may be small
$$\overline{x} \pm t_{\alpha/2, m-1} \frac{s}{\sqrt{m}}$$
## Hypothesis test statistics {.smaller}
1. Normal population with known $\sigma$:
$$Z = \frac{\overline{X} - \mu_0}{\sigma / \sqrt{m}} \sim \mathsf{N}(0,1)$$
2. Any population with unknown $\sigma$, when $m$ large
$$Z = \frac{\overline{X} - \mu_0}{S / \sqrt{m}} \sim \mathsf{N}(0,1)$$
3. Normal population with unknown $\sigma$, $m$ may be small
$$T = \frac{\overline{X} - \mu_0}{S / \sqrt{m}} \sim \mathsf{t}(m-1)$$
:::{.notes}
- Once we have sample data we make point estimate!
:::
# `r fa("dumbbell")` Freshman 15 {.mySegue .center}
> A common belief among the lay public is that body weight increases after entry into [university], and the phrase "freshman 15" has been coined to describe the 15 pounds that students presumably gain over their freshman [i.e. first] year. (from Devore, exm #8.2)
## Example (2/n) {.smaller}
Let $\mu$ denote the true average weight gain of women over the course of their first year at university. The foregoing quote suggests that we should test the hypotheses $H_0: \mu = 15$ versus $H_a : \mu \neq 15$.
Suppose that a random sample of $m$ such individuals is selected and the weight gain of each one is determined, resulting in a sample mean weight gain $\overline{x}$ and a sample standard deviation $s$.
:::{.notes}
- Before data is obtained, the sample mean weight gain is a random variable $X$ and the sample standard deviation is also a random variable $S$!
:::
## Example (3/n)
:::{.notes}
- $$Z = \frac{\overline{X} - \mu}{\sigma/ \sqrt{m}}$$
- Assuming $H_0$ is true: $$Z = \frac{\overline{X} - 15}{\sigma/ \sqrt{m}}$$
- But don't know $\sigma$:
$$Z = \frac{\overline{X} - 15}{S/ \sqrt{m}} \sim \mathsf{N}(0,1)$$
:::
## Example (4/n)
Suppose $\overline{x} = 13.7$ and plugging in values for $s$ and $\sqrt{n}$ yields $z = -2.8$.
Which values of the test statistic are at least as contradictory to $H_0$ as $-2.80$ itself?
:::{.notes}
- E.g., $m = 40$ and $s = 2.936401$
- Possible values: 13.5, 13.0, any value smaller than 13.7 is more contradictory to $H_0$ than 13.7
- 16.3 is just as contradictory to $H_0$ as is 13.7; both fall the same distance from the null value 15
- Just as values of $\overline{x}$ that are at most 13.7 correspond to $z\leq -2.8$, values of $\overline{x}$ that are at least 16.3 correspond to $z \geq 2$
:::
## Example (5/n)
$P( Z \leq -2.80 \quad \text{or}\quad Z \geq 2.80 \quad \text{assuming } H_0 \text{ true})$
:::{.notes}
$\approx 2 \cdot$ (area under the $z$ curve ot the right of 2.80)
$= 2[1 - \Phi(2.80)] = 2[1 - 0.9974] = 0.0052
- If the null hypothesis is in fact true, only about one half of one percent of all samples would result in a test statistic value at least as contradictory to the null hypothesis as is our value. Clearly -2.80 is among the possible test statistic values that are most contradictory to $H_0$.
- The evidence suggest to reject $H_0$.
:::
## Example (6/n) {.smaller}
The article *Freshman 15: Fact or Fiction* (Obesity, 2006: 1438– 1443), for $m = 137$ students, the $\overline{x} = 2.42$ lb with $s = 5.72$ lb.
This gives $$z = (2.42 - 15)/(5.72 /\sqrt{137}) = -25.7\,.$$
The probability of observing a value at least this extreme in either direction is essentially zero!
The data very strongly contradicts the null hypothesis, and there is substantial evidence that true average weight gain is much closer to $0$ than to $15$.
:::{.notes}
- Note this implies some students lost weight!
:::
# Inference for proportions {.mySegue .center}
## (Derivation) point estimate for $p$ {.smaller}
Let $X$ be the count of members with a given property based on a sample of size $m$ from a population where a proportion $p$ share the property.
Then $\widehat{p} = X / m$ is an estimator of $p$.
Assume $m p_0 \geq 10$ and $m (1-p_0) \geq 10$.
### For example
$p\in (0,1)$ and consider iid Bernoulli trials $X_1, \dots, X_m \sim \mathsf{Bernoulli}(p)$.
$\widehat{p} = \frac{1}{m} \sum_{i=1}^m X_i\,, \qquad \mu_{\widehat{p}} = \E X_i = p\,, \qquad \sigma_{\widehat{p}} = \sqrt{\frac{\Var{X_i}}{m}} = \sqrt{\frac{p(1-p)}{m}}$
## $100(1-\alpha)\%$ CI for $p$
$$\widehat{p} \pm z_{\alpha/2} \sqrt{\frac{\widehat{p}(1-\widehat{p})}{m}}$$
:::{.callout-important}
## Compare to CI for population mean...
How is the standard error different from the standard error for estimate of mean?
:::
## Hypothesis test for $p$ at level $\alpha$ {.smaller}
Consider $H_0 : p = p_0$. The test statistic is
$$
Z = \frac{\widehat{p} - p_0}{\sqrt{p_0 (1-p_0) / m}} \,.
$$
For a hypothesis test at level $\alpha$, we use the following procedure:
- If $H_a : p > p_0$, then $P$-value is the area under $\mathsf{N}(0,1)$ to the right of $z$.
- If $H_a : p < p_0$, then $P$-value is the area under $\mathsf{N}(0,1)$ to the left of $z$.
- If $H_a : p \neq p_0$, then $P$-value is twice the area under $\mathsf{N}(0,1)$ to the right of $|z|$.
# How much water? `r fa("lightbulb")` {.mySegue .center}
## The question {.mySegue .center}
How might you estimate the proportion of the earth covered by water?
## Sampling
If we were to take a random sample of points on the globe, then the proportion that touched water would be a reasonable estimate of the overall proportion of water covering the earth.
:::{.callout-warning}
## Remember the candy!
Are random samples always representative? How can we take a random representative sample from globe (sphere)?
:::
## `r fa("lightbulb")` The task
:::: {.columns}
::: {.column width="60%"}
1. The globe will be tossed around.
2. Hit the globe with the tip of your index finger.
3. Shout:
- "water!" if your finger touches water, or
- "land!" if your finger touches land.
4. We will record your observation.
:::
::: {.column width="40%"}
![](images/globe.jpg){height=100% fig.alt="Photo of large inflatable globe (topographic)."}
**Do not flinch ;)**
:::
::::
## The result
Let $p$ be the proportion of earth's surface covered with water.
- Point estimate $\hat{p}$
- Confidence interval for $p$
- Hypothesis test: $p$ is 0.7?
## Challenge:
How could one use random numbers from the computer to pick a random spot on the globe?
Any point is characterized by a latitude (which must be between 90◦ North and 90◦ South) and a longitude (between 180◦ West and 180◦ East).
:::{.importantblock}
Can you pick a random spot on the globe by picking a latitude between −90 to 90 at random, and a longitude between −180 and 180 at random?
:::
# `r fa("dumbbell")` Churchill {.mySegue .center}
## Example: Churchill 1/n
Churchill claims that he will receive half the votes for the House of Commons seat for the constituency of Dundee.
We are skeptical that he is as popular as he says. Suppose 116 out of 263 Dundonians polled claimed that they intended to vote for Churchill.
Can it be concluded at significance level $0.10$ that more than half of all eligible Dundonains will vote for Churchill?
## Example: Churhill 2/n
:::{.tipblock}
What is the parameter of interest? What are the null and alternative hypotheses?
:::
## Example: Churhill 3/n
The parameter of interest is $p$, the proportion of votes for Churchill.
The null hypothesis is $H_0 : p = 0.5$.
The alternative hypothesis is $H_a : p < 0.5$.
:::{.warningblock}
Since $263(0.5) = 131.5 > 10$, we satisfy the assumptions to carry out the hypothesis test using the statistic described earlier.
:::
## Example: Churhill 4/n
Based on the sample, $\widehat{p} = 116 / 263 = 0.4411$.
The test statistic value is
$$
\begin{aligned}
z &= \frac{\widehat{p} - p_0}{\sqrt{p_0 (1-p_0) / m}} \\
&= \frac{0.4411 - 0.5}{\sqrt{0.5 (1-0.5) / 263}}\\
&= −1.91 \,.
\end{aligned}
$$
The $P$-value for this lower-tailed $z$ test is $P = \Phi(−1.91) = 0.028$.
## Example: Churhill 5/n
Since $P < 0.10 = \alpha$, we reject the null hypothesis at the $0.05$ level.
:::{.warningblock}
What does this mean in words? Always place the result of the hypothesis test in the context of the problem!
:::
## Example: Churhill 6/n (conclusion)
The evidence for concluding that the true proportion is different from $p_0 = 0.5$ at the $0.10$ level is compelling.
# Summary
Today we discussed: inferences for population means (and proportions, which are like population means...)
We engaged in an activity about inferences for proportions that highlighted importance of representative sampling.
:::{.callout-tip}
## Today's materials
Slides posted to <https://dundeemath.github.io/MA22004-seminar04>.
:::