Commit

add week 12 R materials
mgyliu committed Apr 8, 2024
1 parent fb886dd commit c836bc8
Showing 7 changed files with 12,447 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,376 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "30fb11e091925d78d9f8e5fea8b8d954",
"grade": false,
"grade_id": "cell-24f9bfcfc0c028b2",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# DSCI 100: Introduction to Data Science\n",
"\n",
"## Tutorial 12 - Bootstrapping and Confidence Intervals"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "1c4488762a1c90482585a8fd664723ea",
"grade": false,
"grade_id": "cell-a30d7f12f24db0c2",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"First, load the necessary libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "9d71259d39d0e6aac1e365d392a08265",
"grade": false,
"grade_id": "cell-862ad148734f02d5",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"### Run this cell before continuing\n",
"library(tidyverse)\n",
"library(repr)\n",
"library(digest)\n",
"library(infer)\n",
"library(gridExtra)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "d1c3cb6df74096826c48d2e7ceccf32e",
"grade": false,
"grade_id": "cell-7128b905ae8e4202",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's revisit our students from last week."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "4847686358d586fd820d755d21b8097d",
"grade": false,
"grade_id": "cell-411f5c32f74e8702",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# run this cell to simulate a finite population\n",
"set.seed(12341)\n",
"students_pop <- tibble(grade = (rnorm(mean = 70, sd = 8, n = 10000)))\n",
"head(students_pop)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "0a9574261feee64562357b584b045381",
"grade": false,
"grade_id": "cell-8dbbe8cbaae55ef9",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"To figure out the class average, you ask the 50 classmates in your groupchat. Draw a single sample of size 50 from the population. Drop the `replicate` column by using `ungroup()` followed by `select()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "7acd83508797cc62ba1957020fed45d8",
"grade": true,
"grade_id": "cell-1ca3281e90b09d1f",
"locked": false,
"points": 0,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"set.seed(12341)\n",
"\n",
"# your code here\n",
"fail() # No Answer - remove if you provide an answer\n",
"\n",
"head(sample)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "21dbbe739216af18fc1d7e04041868ac",
"grade": false,
"grade_id": "cell-0aae1b00b224ee46",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"You don't know anyone else in the class, so the best way to estimate the class average is by bootstrapping. Take 100 bootstrap samples from the sample you just drew. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "33de555a5daedc1d6ab623ce8c914a47",
"grade": true,
"grade_id": "cell-2759d44e9c66a7d4",
"locked": false,
"points": 0,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"set.seed(12341)\n",
"\n",
"# your code here\n",
"fail() # No Answer - remove if you provide an answer\n",
"\n",
"head(bootstrap)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "0fcf96cbb9af4bea364886d739d745cf",
"grade": false,
"grade_id": "cell-cd6602003e295cda",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Calculate the mean for each set of bootstrap samples and visualize the distribution using a histogram with `binwidth = 0.5`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "963dcf1a4501564cdfd6660ab7353cff",
"grade": true,
"grade_id": "cell-1e6adb2a7f93ea0c",
"locked": false,
"points": 0,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# your code here\n",
"fail() # No Answer - remove if you provide an answer\n",
"\n",
"bootstrap_dist"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "78240c51a927234c8342a509863ae3a3",
"grade": false,
"grade_id": "cell-197073db13b4c44c",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Using this bootstrap distribution, calculate 95% and 80% confidence intervals. Which interval has a wider range?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "dabec1e4157690580a87cd7d1dd186fc",
"grade": true,
"grade_id": "cell-418ddd4dae62425a",
"locked": false,
"points": 0,
"schema_version": 3,
"solution": true,
"task": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# your code here\n",
"fail() # No Answer - remove if you provide an answer\n",
"\n",
"bootstrap_95\n",
"bootstrap_80"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "8223c38cc1d43b48891ae0b8e528678f",
"grade": false,
"grade_id": "cell-a8c888ca6e7be087",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"Finally, calculate the mean of the original population. Which of our confidence intervals encapsulate the true population mean?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mean(students_pop$grade)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.1.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
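A minimal sketch of the workflow the notebook above asks for, offered as one possible approach rather than the official solution. It assumes `rep_sample_n()` from `infer` for both the initial sample and the bootstrap resampling, and the percentile method for the confidence intervals; neither choice is confirmed by the notebook. The names `students_sample`, `bootstrap_samples`, and `bootstrap_means` are hypothetical (the notebook itself expects `sample` and `bootstrap`).

```r
library(tidyverse)
library(infer)

# Simulate the finite population, as in the notebook
set.seed(12341)
students_pop <- tibble(grade = rnorm(n = 10000, mean = 70, sd = 8))

# Draw one sample of size 50, then drop the `replicate` column
students_sample <- students_pop |>
  rep_sample_n(size = 50) |>
  ungroup() |>
  select(grade)

# Take 100 bootstrap samples: same size as the sample, drawn WITH replacement
bootstrap_samples <- students_sample |>
  rep_sample_n(size = 50, replace = TRUE, reps = 100)

# One mean per bootstrap replicate
bootstrap_means <- bootstrap_samples |>
  group_by(replicate) |>
  summarize(mean_grade = mean(grade))

# Histogram of the bootstrap distribution of the sample mean
bootstrap_dist <- ggplot(bootstrap_means, aes(x = mean_grade)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Bootstrap sample mean grade", y = "Count")

# Percentile intervals (assumed method): middle 95% and middle 80%
bootstrap_95 <- bootstrap_means |>
  summarize(lower = quantile(mean_grade, 0.025),
            upper = quantile(mean_grade, 0.975))

bootstrap_80 <- bootstrap_means |>
  summarize(lower = quantile(mean_grade, 0.10),
            upper = quantile(mean_grade, 0.90))

# Compare each interval against the true population mean
pop_mean <- mean(students_pop$grade)
covers_95 <- pop_mean >= bootstrap_95$lower & pop_mean <= bootstrap_95$upper
covers_80 <- pop_mean >= bootstrap_80$lower & pop_mean <= bootstrap_80$upper
```

The 95% interval is never narrower than the 80% interval under this method, because it spans a wider range of quantiles of the same bootstrap distribution. Whether either interval contains the population mean depends on the particular sample drawn.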
1 change: 1 addition & 0 deletions materials/R/worksheet_inference2/cleanup.R
@@ -0,0 +1 @@
# clean up data files that students output