generated from jamesmbaazam/QuartoPresentationTemplate
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
eb07d56
commit 4c35148
Showing 138 changed files with 9,454 additions and 28 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,4 +4,3 @@ | |
.Ruserdata | ||
|
||
/.quarto/ | ||
/_site/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,61 @@ | ||
# My Quarto Presentation Template | ||
# Intro to Arrow Workshop | ||
|
||
by Steph Hazlitt & Nic Crane | ||
|
||
|
||
### Workshop Website | ||
|
||
This repository contains materials for the **Intro to Arrow** workshop. | ||
|
||
### Workshop Overview | ||
|
||
This workshop will focus on using the arrow R package, a mature R interface to Apache Arrow, to process larger-than-memory files and multi-file datasets using familiar dplyr syntax. You'll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, and how to exercise fine control over data types to avoid common problems in large data pipelines. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R. | ||
|
||
*This course is for you if you:* | ||
|
||
- want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow | ||
- want to learn about Parquet and other file formats that are powerful alternatives to CSV files | ||
- want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow | ||
|
||
### Workshop Prework | ||
|
||
Detailed instructions for software requirements and data sources are shown below. | ||
|
||
#### Packages | ||
|
||
To install the required core packages for the workshop, run the following: | ||
|
||
```{r} | ||
install.packages(c( | ||
"arrow", "dplyr", "stringr", "lubridate", "tictoc" | ||
)) | ||
``` | ||
#### Seattle Checkouts by Title Data | ||
|
||
This is the data we will use in the workshop. It's a good-sized, single CSV file (*9GB* on disk in total) that can be downloaded from an AWS S3 bucket via HTTPS: | ||
|
||
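```{r} | ||
# Allow up to 30 minutes for the download; the file is about 9GB. | ||
options(timeout = 1800) | ||
# download.file() does not create missing directories, so make sure ./data exists. | ||
dir.create("./data", showWarnings = FALSE) | ||
download.file( | ||
url = "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv", | ||
destfile = "./data/seattle-library-checkouts.csv" | ||
) | ||
``` | ||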
|
||
#### Tiny Data Option | ||
|
||
If you don't have the time or disk space to download the 9GB dataset and still have room left for the exercises, you can run the code in the workshop with the "tiny" version of this data. Although this course focuses on larger-than-memory data, you can still learn the concepts and workflows with smaller data; note, however, that you may not see the same performance improvements that you would get with larger data. | ||
|
||
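```{r} | ||
options(timeout = 1800) | ||
# download.file() does not create missing directories, so make sure ./data exists. | ||
dir.create("./data", showWarnings = FALSE) | ||
download.file( | ||
url = "https://github.com/posit-conf-2023/arrow/releases/download/v0.1.0/seattle-library-checkouts-tiny.csv", | ||
destfile = "./data/seattle-library-checkouts-tiny.csv" | ||
) | ||
``` | ||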
|
||
If you want to participate in the coding exercises or follow along, please do your best to arrive at the workshop with the required software and packages installed and the data downloaded onto your laptop. | ||
|
||
------------------------------------------------------------------------ | ||
|
||
![](https://i.creativecommons.org/l/by/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/). |
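
The workflow the README overview describes (querying files through arrow with dplyr verbs, without loading them fully into memory) might look roughly like the sketch below. It is not taken from the workshop materials. The tiny CSV written to a temporary directory is a hypothetical stand-in for the downloaded file, and the column names `CheckoutYear`, `MaterialType` and `Checkouts` are assumptions about the dataset's schema.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in data: a few rows shaped like the checkouts file.
# The column names are assumptions, not confirmed by the README.
csv_dir <- file.path(tempdir(), "checkouts-csv")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(
  data.frame(
    CheckoutYear = c(2020L, 2020L, 2021L, 2021L),
    MaterialType = c("BOOK", "EBOOK", "BOOK", "BOOK"),
    Checkouts = c(3L, 5L, 2L, 4L)
  ),
  file.path(csv_dir, "checkouts.csv"),
  row.names = FALSE
)

# open_dataset() registers the files without reading them into memory.
checkouts <- open_dataset(csv_dir, format = "csv")

# The dplyr verbs build a query; collect() runs it and returns a tibble.
books_per_year <- checkouts |>
  filter(MaterialType == "BOOK") |>
  group_by(CheckoutYear) |>
  summarise(TotalCheckouts = sum(Checkouts)) |>
  arrange(CheckoutYear) |>
  collect()
```

With the real download, `csv_dir` would be replaced by the path to `./data/seattle-library-checkouts.csv`; only the rows and columns the query needs are materialised when `collect()` is called.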
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"hash": "21ece2c7abde2d9f3883038a17ddad68", | ||
"result": { | ||
"markdown": "---\ntitle: Intro to Arrow in R\nsubtitle: A short workshop\neditor: source\n---\n\n\n![](images/logo.png){width=\"30%\" fig-align=\"center\"}\n\n### Workshop Overview\n\nThis workshop will focus on using the arrow R package to process larger-than-memory files and multi-file datasets with arrow using familiar dplyr syntax. You'll learn to create and use interoperable data file formats like Parquet for efficient data storage and access, and also how to exercise fine control over data types to avoid common large data pipeline problems. This workshop will provide a foundation for using Arrow, giving you access to a powerful suite of tools for performant analysis of larger-than-memory data in R.\n\n*This course is for you if you:*\n\n- want to learn how to work with tabular data that is too large to fit in memory using existing R and tidyverse syntax implemented in Arrow\n- want to learn about Parquet and other file formats that are powerful alternatives to CSV files\n- want to learn how to engineer your tabular data storage for more performant access and analysis with Apache Arrow\n\n### Workshop Prework\n\nDetailed instructions for software requirements and data sources are show below.\n\n#### Packages\n\nTo install the required core packages for the workshop, run the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(c(\n \"arrow\", \"dplyr\", \"stringr\", \"lubridate\", \"tictoc\"\n))\n```\n:::\n\n\n\n#### Seattle Checkouts by Title Data\n\nThis is the data we will use in the workshop. 
It's a good-sized, single CSV file---*9GB* on-disk in total, which can be downloaded from an AWS S3 bucket via https:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(timeout = 1800)\ndownload.file(\n url = \"https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv\",\n destfile = \"./data/seattle-library-checkouts.csv\"\n)\n```\n:::\n\n\n#### Tiny Data Option\n\nIf you don't have time or disk space to download the 9Gb dataset (and still have disk space to do the exercises), you can run the code in the workshop with the \"tiny\" version of this data. Although the focus in this course is working with larger-than-memory data, you can still learn about the concepts and workflows with smaller data---although note you may not see the same performance improvements that you would get when working with larger data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(timeout = 1800)\ndownload.file(\n url = \"https://github.com/posit-conf-2023/arrow/releases/download/v0.1.0/seattle-library-checkouts-tiny.csv\",\n destfile = \"./data/seattle-library-checkouts-tiny.csv\"\n)\n```\n:::\n\n\nIf you want to participate in the coding exercise or follow along, please try your very best to begin the workshop ready with the required software & packages installed and the data downloaded on to your laptop.\n\n------------------------------------------------------------------------\n\n![](https://i.creativecommons.org/l/by/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
/* http://meyerweb.com/eric/tools/css/reset/ | ||
v4.0 | 20180602 | ||
License: none (public domain) | ||
*/ | ||
|
||
html, body, div, span, applet, object, iframe, | ||
h1, h2, h3, h4, h5, h6, p, blockquote, pre, | ||
a, abbr, acronym, address, big, cite, code, | ||
del, dfn, em, img, ins, kbd, q, s, samp, | ||
small, strike, strong, sub, sup, tt, var, | ||
b, u, i, center, | ||
dl, dt, dd, ol, ul, li, | ||
fieldset, form, label, legend, | ||
table, caption, tbody, tfoot, thead, tr, th, td, | ||
article, aside, canvas, details, embed, | ||
figure, figcaption, footer, header, hgroup, | ||
main, menu, nav, output, ruby, section, summary, | ||
time, mark, audio, video { | ||
margin: 0; | ||
padding: 0; | ||
border: 0; | ||
font-size: 100%; | ||
font: inherit; | ||
vertical-align: baseline; | ||
} | ||
/* HTML5 display-role reset for older browsers */ | ||
article, aside, details, figcaption, figure, | ||
footer, header, hgroup, main, menu, nav, section { | ||
display: block; | ||
} |
Large diffs are not rendered by default.
2 changes: 2 additions & 0 deletions
_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/LICENSE
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
SIL Open Font License (OFL) | ||
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL |
10 changes: 10 additions & 0 deletions
_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.css
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
@font-face { | ||
font-family: 'League Gothic'; | ||
src: url('./league-gothic.eot'); | ||
src: url('./league-gothic.eot?#iefix') format('embedded-opentype'), | ||
url('./league-gothic.woff') format('woff'), | ||
url('./league-gothic.ttf') format('truetype'); | ||
|
||
font-weight: normal; | ||
font-style: normal; | ||
} |
Binary file added
BIN
+25.1 KB
_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.eot
Binary file not shown.
Binary file added
BIN
+62.8 KB
_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.ttf
Binary file not shown.
Binary file added
BIN
+30 KB
_freeze/site_libs/revealjs/dist/theme/fonts/league-gothic/league-gothic.woff
Binary file not shown.