Add initial files

jamesmbaazam · Mar 23, 2024 · 4c35148
1 parent eb07d56

Showing 138 changed files with 9,454 additions and 28 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,4 +4,3 @@
 .Ruserdata
 
 /.quarto/
-/_site/
diff --git a/README.md b/README.md
@@ -1 +1,61 @@
-# My Quarto Presentation Template
+# Intro to Arrow Workshop
+
+by Steph Hazlitt & Nic Crane
+
+
+### Workshop Website
+
+This repository contains materials for the **Intro to Arrow** workshop.
+
+### Workshop Overview
+
+This workshop will focus on using the arrow R package---a mature R interface to Apache Arrow---to process larger-than-memory files and multi-file datasets using familiar dplyr syntax. You'll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, and also how to exercise fine control over data types to avoid common large data pipeline problems. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R.
+
+*This course is for you if you:*
+
+-   want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow
+-   want to learn about Parquet and other file formats that are powerful alternatives to CSV files
+-   want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow
+
+### Workshop Prework
+
+Detailed instructions for software requirements and data sources are shown below.
+
+#### Packages
+
+To install the required core packages for the workshop, run the following:
+
+```{r}
+install.packages(c(
+  "arrow", "dplyr", "stringr", "lubridate", "tictoc"
+))
+```
+#### Seattle Checkouts by Title Data
+
+This is the data we will use in the workshop. It's a good-sized, single CSV file---*9GB* on-disk in total, which can be downloaded from an AWS S3 bucket via https:
+
+```{r}
+options(timeout = 1800)
+download.file(
+  url = "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
+  destfile = "./data/seattle-library-checkouts.csv"
+)
+```
+
+#### Tiny Data Option
+
+If you don't have the time or disk space to download the 9GB dataset (while still leaving enough free space for the exercises), you can run the workshop code with the "tiny" version of this data. Although this course focuses on working with larger-than-memory data, you can still learn the concepts and workflows with smaller data; note, however, that you may not see the same performance improvements you would get with larger data.
+
+```{r}
+options(timeout = 1800)
+download.file(
+  url = "https://github.com/posit-conf-2023/arrow/releases/download/v0.1.0/seattle-library-checkouts-tiny.csv",
+  destfile = "./data/seattle-library-checkouts-tiny.csv"
+)
+```
+
+If you want to participate in the coding exercises or follow along, please do your best to arrive at the workshop with the required software and packages installed and the data downloaded onto your laptop.
+
+------------------------------------------------------------------------
+
+![](https://i.creativecommons.org/l/by/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json
@@ -0,0 +1,14 @@
+{
+  "hash": "21ece2c7abde2d9f3883038a17ddad68",
+  "result": {
+    "markdown": "---\ntitle: Intro to Arrow in R\nsubtitle: A short workshop\neditor: source\n---\n\n\n![](images/logo.png){width=\"30%\" fig-align=\"center\"}\n\n### Workshop Overview\n\nThis workshop will focus on using the arrow R package to process larger-than-memory files and multi-file datasets with arrow using familiar dplyr syntax. You'll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, and also how to exercise fine control over data types to avoid common large data pipeline problems. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R.\n\n*This course is for you if you:*\n\n-   want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow\n-   want to learn about Parquet and other file formats that are powerful alternatives to CSV files\n-   want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow\n\n### Workshop Prework\n\nDetailed instructions for software requirements and data sources are show below.\n\n#### Packages\n\nTo install the required core packages for the workshop, run the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(c(\n  \"arrow\", \"dplyr\", \"stringr\", \"lubridate\", \"tictoc\"\n))\n```\n:::\n\n\n\n#### Seattle Checkouts by Title Data\n\nThis is the data we will use in the workshop. It's a good-sized, single CSV file---*9GB* on-disk in total, which can be downloaded from an AWS S3 bucket via https:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(timeout = 1800)\ndownload.file(\n  url = \"https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv\",\n  destfile = \"./data/seattle-library-checkouts.csv\"\n)\n```\n:::\n\n\n#### Tiny Data Option\n\nIf you don't have time or disk space to download the 9Gb dataset (and still have disk space to do the exercises), you can run the code in the workshop with the \"tiny\" version of this data. Although the focus in this course is working with larger-than-memory data, you can still learn about the concepts and workflows with smaller data---although note you may not see the same performance improvements that you would get when working with larger data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(timeout = 1800)\ndownload.file(\n  url = \"https://github.com/posit-conf-2023/arrow/releases/download/v0.1.0/seattle-library-checkouts-tiny.csv\",\n  destfile = \"./data/seattle-library-checkouts-tiny.csv\"\n)\n```\n:::\n\n\nIf you want to participate in the coding exercise or follow along, please try your very best to begin the workshop ready with the required software & packages installed and the data downloaded on to your laptop.\n\n------------------------------------------------------------------------\n\n![](https://i.creativecommons.org/l/by/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/_freeze/site_libs/clipboard/clipboard.min.js b/_freeze/site_libs/clipboard/clipboard.min.js
diff --git a/_freeze/site_libs/revealjs/dist/reset.css b/_freeze/site_libs/revealjs/dist/reset.css
@@ -0,0 +1,30 @@
+/* http://meyerweb.com/eric/tools/css/reset/
+   v4.0 | 20180602
+   License: none (public domain)
+*/
+
+html, body, div, span, applet, object, iframe,
+h1, h2, h3, h4, h5, h6, p, blockquote, pre,
+a, abbr, acronym, address, big, cite, code,
+del, dfn, em, img, ins, kbd, q, s, samp,
+small, strike, strong, sub, sup, tt, var,
+b, u, i, center,
+dl, dt, dd, ol, ul, li,
+fieldset, form, label, legend,
+table, caption, tbody, tfoot, thead, tr, th, td,
+article, aside, canvas, details, embed,
+figure, figcaption, footer, header, hgroup,
+main, menu, nav, output, ruby, section, summary,
+time, mark, audio, video {
+  margin: 0;
+  padding: 0;
+  border: 0;
+  font-size: 100%;
+  font: inherit;
+  vertical-align: baseline;
+}
+/* HTML5 display-role reset for older browsers */
+article, aside, details, figcaption, figure,
+footer, header, hgroup, main, menu, nav, section {
+  display: block;
+}
diff --git a/_freeze/site_libs/revealjs/dist/reveal.css b/_freeze/site_libs/revealjs/dist/reveal.css
diff --git a/_freeze/site_libs/revealjs/dist/reveal.esm.js b/_freeze/site_libs/revealjs/dist/reveal.esm.js
diff --git a/_freeze/site_libs/revealjs/dist/reveal.esm.js.map b/_freeze/site_libs/revealjs/dist/reveal.esm.js.map
diff --git a/_freeze/site_libs/revealjs/dist/reveal.js b/_freeze/site_libs/revealjs/dist/reveal.js
diff --git a/_freeze/site_libs/revealjs/dist/reveal.js.map b/_freeze/site_libs/revealjs/dist/reveal.js.map
diff --git a/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/LICENSE b/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/LICENSE
@@ -0,0 +1,2 @@
+SIL Open Font License (OFL)
+http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL
diff --git a/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.css b/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.css
@@ -0,0 +1,10 @@
+@font-face {
+    font-family: 'League Gothic';
+    src: url('./league-gothic.eot');
+    src: url('./league-gothic.eot?#iefix') format('embedded-opentype'),
+         url('./league-gothic.woff') format('woff'),
+         url('./league-gothic.ttf') format('truetype');
+
+    font-weight: normal;
+    font-style: normal;
+}
diff --git a/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.eot b/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.eot
diff --git a/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.ttf b/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.ttf
diff --git a/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.woff b/_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.woff