From c1f374d0c065ecf206e104e07355c41683c065c3 Mon Sep 17 00:00:00 2001 From: jotas6 Date: Wed, 21 Jun 2023 15:44:27 -0500 Subject: [PATCH 01/22] Removed PLACEHOLDER text --- docs/code_of_conduct.md | 2 +- docs/contribute.md | 2 +- docs/index.md | 10 +++++----- docs/instructors.md | 2 +- docs/reference.md | 2 +- docs/waiver.md | 2 +- mkdocs.yml | 6 +++--- 7 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/code_of_conduct.md b/docs/code_of_conduct.md index 29de375..cac745e 100644 --- a/docs/code_of_conduct.md +++ b/docs/code_of_conduct.md @@ -1,5 +1,5 @@ --- -title: Code of Conduct for Pumas-AI Workshop PLACEHOLDER +title: Code of Conduct for Pumas-AI Data Wrangling Workshop description: Participants and Instructors must follow this at all times. --- diff --git a/docs/contribute.md b/docs/contribute.md index 29f23fc..57807ab 100644 --- a/docs/contribute.md +++ b/docs/contribute.md @@ -5,7 +5,7 @@ title: How to Contribute [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) If you want to contribute to this workshop, -please open a pull request at [`PumasAI-Labs/PLACEHOLDER`](https://github.com/PumasAI-Labs/PLACEHOLDER). +please open a pull request at [`PumasAI-Labs/Data-Wrangling`](https://github.com/PumasAI-Labs/Data-Wrangling). By submitting a pull request, you are in accordance that your contribution will be licensed under [Creative Commons Attribution-ShareAlike 4.0 International](http://creativecommons.org/licenses/by-sa/4.0/). diff --git a/docs/index.md b/docs/index.md index 8d4caa1..453ee43 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,6 +1,6 @@ --- -title: Pumas-AI Workshop PLACEHOLDER -description: CHANGE ME. +title: Pumas-AI Data Wrangling Workshop +description: Data wrangling workshop covering data I/O and the use of DataFramesMeta. 
--- [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) @@ -9,10 +9,10 @@ Short summary about the workshop. !!! success "Prerequisites" - This workshop does PLACEHOLDER and PLACEHOLDER. - We recommend users being familiar with PLACEHOLDER, especially PLACEHOLDER. + We recommend that users be familiar with Julia syntax, especially variables and types. - The formal requirements are the PLACEHOLDER WORKSHOP WITH LINK. + The formal requirements are the [Julia Syntax Workshop](https://pumasai-labs.github.io/Julia-Workshop/) + and its prerequisites. ## Schedule diff --git a/docs/instructors.md b/docs/instructors.md index d39958f..29b0513 100644 --- a/docs/instructors.md +++ b/docs/instructors.md @@ -1,5 +1,5 @@ --- -title: Instructor's Notes for Pumas-AI Workshop PLACEHOLDER +title: Instructor's Notes for Pumas-AI Data Wrangling Workshop --- [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) diff --git a/docs/reference.md b/docs/reference.md index ec839ee..5af3ec2 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -1,5 +1,5 @@ --- -title: Reference Sheets for Pumas-AI Workshop PLACEHOLDER +title: Reference Sheets for Pumas-AI Data Wrangling Workshop --- [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) diff --git a/docs/waiver.md b/docs/waiver.md index 017ff3c..bf94153 100644 --- a/docs/waiver.md +++ b/docs/waiver.md @@ -1,5 +1,5 @@ --- -title: Waiver of Liability for Pumas-AI Workshop PLACEHOLDER +title: Waiver of Liability for Pumas-AI Data Wrangling Workshop --- [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) diff --git a/mkdocs.yml b/mkdocs.yml index dab0a34..e85e76d 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,7 +1,7 @@ # 
yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json -site_name: Pumas-AI Workshop PLACEHOLDER -repo_name: PumasAI-Labs/Workshop-PLACEHOLDER -repo_url: https://github.com/PumasAI-Labs/Workshop-PLACEHOLDER +site_name: Pumas-AI Data Wrangling Workshop +repo_name: PumasAI-Labs/Data-Wrangling +repo_url: https://github.com/PumasAI-Labs/Data-Wrangling copyright: Copyright © 2023 Pumas-AI, Inc. plugins: - search From 27831aa330a6584cedfc0a5905fe29b2ac5569b9 Mon Sep 17 00:00:00 2001 From: jotas6 Date: Wed, 21 Jun 2023 17:02:38 -0500 Subject: [PATCH 02/22] Updated README file --- README.md | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 950f9a3..bb1682a 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,7 @@ -# Pumas-AI Workshop Templates +# Pumas-AI Data Wrangling Workshop [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) -## How to use this template - -1. Click on the green button `Use this template` -1. Edit all the `PLACEHOLDER` in `mkdocs.yml` with respect to `site_name`, `repo_name` and `repo_url`. -1. Edit all the `PLACEHOLDER` in `docs/index.md`, `docs/reference.md` and `docs/instructor.md`. -1. Add appropriate content to `docs/index.md`, `docs/reference.md` and `docs/instructor.md`. 
- ## How to contribute We use [Material for MkDocs](https://github.com/squidfunk/mkdocs-material) @@ -22,9 +15,7 @@ We use [Material for MkDocs](https://github.com/squidfunk/mkdocs-material) ## Authors -- Author 1 - -- Author 2 - -- Author 3 - +- Juan José González Oneto - ## License From 19aa87c8639f0984adad5f87595954a0ef39fbf4 Mon Sep 17 00:00:00 2001 From: Juan Jose Gonzalez Oneto Date: Mon, 26 Jun 2023 22:15:04 +0000 Subject: [PATCH 03/22] Added code examples --- 01-files.jl | 95 +++++++++++++++++++++++++++++++++++++++++++++++++ 02-select.jl | 61 +++++++++++++++++++++++++++++++ 03-transform.jl | 40 +++++++++++++++++++++ 04-grouping.jl | 36 +++++++++++++++++++ 05-chaining.jl | 36 +++++++++++++++++++ 5 files changed, 268 insertions(+) create mode 100644 01-files.jl create mode 100644 02-select.jl create mode 100644 03-transform.jl create mode 100644 04-grouping.jl create mode 100644 05-chaining.jl diff --git a/01-files.jl b/01-files.jl new file mode 100644 index 0000000..ba5c4ce --- /dev/null +++ b/01-files.jl @@ -0,0 +1,95 @@ +# Reading and writing external files + +## CSV: probably the most common type of data file you will find +using CSV +using DataFrames + +# Note: go to the workshop directory before reading the CSV file +df = CSV.read("demographics.csv", DataFrame) # read(, ) + +# As an example, let's change some column names and then save it +renamed_df = rename( + df, + Dict("AGE" => "AGE (years)", "WEIGHT" => "WEIGHT (kg)") +) + +## Tip: you can rename columns programmatically by passing a function +lowercase_df = rename(lowercase, df) # Make all columns be lowercase + +# Now we are ready to save the new file +CSV.write("demographics_new.csv", renamed_df) # write(, ) +# CSV.write("demographics.csv", renamed_df) # Watch out: This would overwrite our original dataset + +CSV.read("demographics_new.csv", DataFrame) # Check our new files + +## Tip: you can read/save data to a folder +CSV.write("data/demographics_new.csv", renamed_df) 
+CSV.read("data/demographics_new.csv", DataFrame) + +## Custom specifications (keyword arguments): +readlines("demographics_eu.csv")[1:3] +readlines("demographics.csv")[1:3] # Standard format + +# - delim: CSV files are separated by commas most of the time, but sometimes other +# characters like ";" or "\t" are used. +CSV.read("demographics_eu.csv", DataFrame; delim = ';') # Works, but the numbers were parsed as strings + +# - decimal: if the file contains floating-point numbers whose decimal separator is not +# "." (e.g., 3.14), you must specify which character is used. If you ever need this, +# it will probably be because decimals are separated by commas (e.g., 3,14) +CSV.read("demographics_eu.csv", DataFrame; delim = ';', decimal = ',') + +# You can also use these keyword arguments to write files +CSV.write("demographics_eu_new.csv", renamed_df; delim = ';', decimal = ',') + +# Many more options: https://csv.juliadata.org/stable/reading.html#CSV.read + +## Excel (.xlsx) +using XLSX + +# Reading files +excel_file = XLSX.readtable("demographics.xlsx", "Sheet1") # readtable(, ) +df_excel = DataFrame(excel_file) # You will most definitely want to convert it to a DataFrame + +## Tip: get all sheets from an Excel file +file = XLSX.readxlsx("demographics.xlsx") # You can see Sheet1 here +XLSX.sheetnames(file) # You can get a vector of sheet names too + +## Tip: you can also use index numbers to refer to sheets +DataFrame(XLSX.readtable("demographics.xlsx", 1)) # We get the first sheet + +# You can also read XLSX files from a folder +DataFrame(XLSX.readtable("data/demographics.xlsx", "Sheet1")) + +# Allow XLSX to infer types (columns will be Any by default) +DataFrame(XLSX.readtable("demographics.xlsx", "Sheet1"; infer_eltypes=true)) + +# Writing files +XLSX.writetable("demographics_new.xlsx", renamed_df) # Same syntax as CSV.write (, ) +XLSX.writetable("data/demographics_new.xlsx", renamed_df) + +## Watch out: if you try to write a file that already exists, you 
will get an error +XLSX.writetable("demographics_new.xlsx", lowercase_df) # Won't overwrite, like CSV would + +## SAS tables +using ReadStatTables + +# Reading files +## .sas7bdat +DataFrame(readstat("iv_bolus_sd.sas7bdat")) +## .xpt +DataFrame(readstat("iv_bolus_sd.xpt")) + +## Note: ReadStatTables supports other file formats: +## https://junyuan-chen.github.io/ReadStatTables.jl/stable/#Supported-File-Formats + +# Writing files +## Currently, ReadStatTables only supports reading files (writing is experimental only) + +############################################################################################## +# Optional: run this to delete all the files created in the examples +begin + root_files = filter(contains("new"), readdir()) + data_files = joinpath.("data", filter(contains("new"), readdir("data"))) + foreach(rm, vcat(root_files, data_files)) +end \ No newline at end of file diff --git a/02-select.jl b/02-select.jl new file mode 100644 index 0000000..f3e3d41 --- /dev/null +++ b/02-select.jl @@ -0,0 +1,61 @@ +# We often want to retrieve only certain parts of a DataFrame +df = CSV.read("demographics.csv", DataFrame) # Load the demographics dataset from before + +# Columns +names(df) # Get all column names + +## Get a single column as a vector +df.AGE # DataFrame.column_name +df.WEIGHT + +df[!, "AGE"] # Indexing, as if it were a matrix +df[!, "WEIGHT"] + +## Get multiple columns +df[!, ["AGE", "WEIGHT"]] # This gets messy quickly + +### @select macro +using DataFramesMeta # You don't need to import DataFrames if you import DataFramesMeta + +@select df :AGE :WEIGHT # We use Symbols instead of Strings +@select(df, :AGE, :WEIGHT) # We can also call it in a similar way to functions + +@select df begin # block syntax, probably the best alternative for multiple columns + :ID + :AGE + :WEIGHT +end + +## Tip: select columns the other way around +@select df $(Not([:AGE, :WEIGHT])) # Get all columns, except the ones we specify + +# Rows +## Indexing +df[1:10, ["AGE", 
"WEIGHT"]] # Get the first 10 rows +df[4:16, All()] # Get rows 4 to 16 for all columns + +## The @subset macro +## Allows selecting rows based on conditional statements +@subset df :AGE .> 60 # Get all subjects that are more than 60 years old + +# You can also have multiple conditions +@subset df begin + :AGE .> 60 + :ISMALE .== 1 # Get males only + :WEIGHT .< 50 # Get subjects that weigh less than 50 kg +end + +## Tip: use @rsubset instead of broadcasting everything (.>, .==, etc.) +@rsubset df begin + :AGE > 60 + :ISMALE == 1 + :WEIGHT < 50 +end + +## You don't always want to use @rsubset: it works row by row, so column-wide functions like mean need @subset +@rsubset df :WEIGHT > mean(:WEIGHT) # Row-wise: mean(:WEIGHT) is each row's own weight, so no rows are kept +@subset df :WEIGHT .> mean(:WEIGHT) # Column-wise: compares each weight with the overall mean + +## Common use case: remove rows that have missing values in one column +df_iv = DataFrame(readstat("iv_bolus_sd.xpt")) +@rsubset df_iv !ismissing(:conc) \ No newline at end of file diff --git a/03-transform.jl b/03-transform.jl new file mode 100644 index 0000000..3fb875b --- /dev/null +++ b/03-transform.jl @@ -0,0 +1,40 @@ +# Apply some transformation to one or more columns in our data +include("02-select.jl") + +# Change the sex encoding (ISMALE) +df +@transform df :SEX = [i == 0 ? "Female" : "Male" for i in :ISMALE] # Create a new column +@transform df :ISMALE = [i == 0 ? "Female" : "Male" for i in :ISMALE] # Modify an existing column + +## Tip: use @rtransform to avoid specifying the entire column at once +@rtransform df :SEX = :ISMALE == 0 ? "Female" : "Male" +@rtransform df :ISMALE = :ISMALE == 0 ? "Female" : "Male" + +# You can also apply multiple transformations at once +@rtransform df begin + :ISMALE = :ISMALE == 0 ? 
"Female" : "Male" + :AGE = Int(round(:AGE, digits=0)) # Round age to an integer + :AGE_months = :AGE * 12 # Calculate age in months +end + +# Notice that our age in months was not computed from the rounded version of the AGE column +## We have to use @astable to be able to use intermediate results +@rtransform df @astable begin + :AGE = Int(round(:AGE, digits=0)) + :AGE_months = :AGE * 12 +end + +# Modify the original DataFrame +@rtransform df :SEX = :ISMALE == 0 ? "Female" : "Male" # Creates a new DataFrame +df # Our original DataFrame remains unchanged + +@rtransform! df :SEX = :ISMALE == 0 ? "Female" : "Male" # Use ! at the end to modify the source +df # Watch out: we lost the original DataFrame (we would have to reread our source file) + +## Tip: this works for all of DataFramesMeta's macros +@rsubset! df :SEX == "Female" +df # Now we only have female subjects + +@select! df :AGE :WEIGHT :SEX +df # Now we lost the rest of the columns + diff --git a/04-grouping.jl b/04-grouping.jl new file mode 100644 index 0000000..1afc1cb --- /dev/null +++ b/04-grouping.jl @@ -0,0 +1,36 @@ +# Sometimes we want to group our data and apply operations according to that grouping +df = CSV.read("demographics.csv", DataFrame) # Load a fresh copy of our dataset + +# The groupby function +groupby(df, :ISMALE) # Group subjects according to sex + +## More complicated example: transform + group +@rtransform! df :WEIGHT_cat = :WEIGHT > 70 ? "Over 70 kg" : "Under 70 kg" +groupby(df, :WEIGHT_cat) + +## Tip: groupby can take multiple columns +groupby(df, [:ISMALE, :WEIGHT_cat]) # Now we get 4 groups + +# Combining +## A common thing to do after grouping data is to combine it back with some operation. 
+ +# Example: mean age for each sex group +grouped_df = groupby(df, :ISMALE) +@combine grouped_df :AGE = mean(:AGE) + +# You can also use DataFrames that have been grouped with multiple columns +combined_df = @combine groupby(df, [:WEIGHT_cat, :ISMALE]) :AGE = mean(:AGE) +@orderby combined_df :ISMALE # Fix awkward ordering with @orderby +@orderby combined_df :ISMALE :WEIGHT_cat # Use multiple columns in @orderby + +## Tip: you can include multiple calculations inside of @combine +@combine grouped_df begin + :AGE = mean(:AGE) + :WEIGHT = mean(:WEIGHT) +end + +# the @by macro: groupby + @combine in one call +@by df :ISMALE begin + :AGE = mean(:AGE) + :WEIGHT = mean(:WEIGHT) +end \ No newline at end of file diff --git a/05-chaining.jl b/05-chaining.jl new file mode 100644 index 0000000..835758f --- /dev/null +++ b/05-chaining.jl @@ -0,0 +1,36 @@ +# Perform all your data wrangling operations in one block with @chain +df = CSV.read("demographics.csv", DataFrame) + +# Get ages for all female subjects +female_ages = @chain df begin + @rsubset :ISMALE == 0 + @select :ID :AGE +end + +# More complicated example +@chain df begin + + @rtransform begin + :SEX = :ISMALE == 0 ? "Female" : "Male" # Create the new sex column + :WEIGHT_cat = :WEIGHT > 70 ? 
"Over 70 kg" : "Under 70 kg" # Create weight categories + end + + @by [:SEX, :WEIGHT_cat] begin # Calculate mean values for each column + :AGE = mean(:AGE) + :SCR = mean(:SCR) + :eGFR = mean(:eGFR) + end + + @orderby :SEX :WEIGHT_cat # Fix ordering + + # Make column names more readable + rename( + Dict( + :SEX => :Sex, + :WEIGHT_cat => :Weight, + :AGE => :Age + ) + ) + +end + From 57aeea1d0fbf1ae4ca7d945501f24eacce23df9b Mon Sep 17 00:00:00 2001 From: Juan Jose Gonzalez Oneto Date: Mon, 26 Jun 2023 22:16:30 +0000 Subject: [PATCH 04/22] Added data source files used in the examples --- data/demographics.csv | 101 +++++++++++++++++++++++++++++++++++++++++ data/demographics.xlsx | Bin 0 -> 10124 bytes demographics.csv | 101 +++++++++++++++++++++++++++++++++++++++++ demographics.xlsx | Bin 0 -> 10124 bytes demographics_eu.csv | 101 +++++++++++++++++++++++++++++++++++++++++ iv_bolus_sd.sas7bdat | Bin 0 -> 28672 bytes iv_bolus_sd.xpt | Bin 0 -> 14240 bytes 7 files changed, 303 insertions(+) create mode 100644 data/demographics.csv create mode 100644 data/demographics.xlsx create mode 100644 demographics.csv create mode 100644 demographics.xlsx create mode 100644 demographics_eu.csv create mode 100644 iv_bolus_sd.sas7bdat create mode 100644 iv_bolus_sd.xpt diff --git a/data/demographics.csv b/data/demographics.csv new file mode 100644 index 0000000..5f23f09 --- /dev/null +++ b/data/demographics.csv @@ -0,0 +1,101 @@ +ID,AGE,WEIGHT,SCR,ISMALE,eGFR +1,34.823,38.212,1.1129,0,42.635 +2,32.765,74.838,0.8846,1,126.0 +3,35.974,37.303,1.1004,1,48.981 +4,38.206,32.969,1.1972,1,38.934 +5,33.559,47.139,1.5924,0,37.198 +6,53.758,50.819,1.6769,0,30.855 +7,25.306,59.304,0.97666,0,82.217 +8,39.897,26.452,1.0817,0,28.899 +9,54.975,26.931,0.90926,0,29.731 +10,40.732,50.878,0.87513,1,80.156 +11,38.603,76.539,0.9541,0,96.028 +12,48.539,113.91,1.0889,0,112.95 +13,28.818,65.829,1.3689,0,63.118 +14,60.933,63.769,0.94223,1,74.322 +15,57.703,37.54,1.3098,1,32.758 +16,32.41,70.069,0.84661,0,105.12 
+17,33.799,49.636,1.0998,1,66.572 +18,48.453,59.107,1.1961,0,53.408 +19,40.826,74.732,1.1407,0,76.702 +20,71.319,56.43,0.8041,0,56.9 +21,42.783,92.953,1.3023,1,96.374 +22,33.136,52.328,1.4894,1,52.148 +23,48.292,57.359,1.1164,0,55.624 +24,33.338,58.429,1.2144,0,60.586 +25,29.682,43.66,1.1875,0,47.883 +26,31.032,27.411,0.95849,1,43.282 +27,51.144,72.894,1.3975,0,54.714 +28,35.21,48.149,0.96081,0,61.995 +29,79.292,39.792,1.1153,1,30.084 +30,36.843,45.248,0.93339,0,59.037 +31,19.187,47.021,0.77409,1,101.93 +32,55.554,60.889,0.98239,1,72.695 +33,41.275,98.419,1.2289,1,109.82 +34,49.346,33.442,0.89148,0,40.147 +35,75.654,60.327,0.99383,1,54.249 +36,49.241,53.166,1.1897,1,56.333 +37,32.226,27.841,1.1696,0,30.288 +38,47.13,65.791,0.91439,1,92.807 +39,44.668,65.805,0.76055,0,97.378 +40,32.888,63.813,1.4133,0,57.096 +41,37.504,61.231,1.0209,1,85.382 +42,42.549,45.632,1.3726,0,38.247 +43,56.144,67.582,0.79364,1,99.178 +44,34.77,57.067,0.98974,0,71.629 +45,31.832,38.174,1.4367,0,33.93 +46,30.435,48.515,1.563,1,47.233 +47,41.047,93.403,1.1259,0,96.911 +48,53.297,37.35,1.22,0,31.337 +49,35.503,43.038,0.97497,1,64.067 +50,41.404,47.253,1.3553,1,47.742 +51,37.669,41.304,1.0758,0,46.381 +52,41.054,45.695,1.1881,1,52.855 +53,43.664,24.765,0.88557,0,31.805 +54,43.823,47.549,1.0754,1,59.065 +55,58.715,20.427,1.2121,1,19.026 +56,47.693,44.151,0.99899,0,48.161 +57,44.787,37.563,1.3257,1,37.469 +58,60.371,31.4,1.1461,1,30.3 +59,25.971,64.709,1.2816,1,79.966 +60,27.241,42.836,0.97056,0,58.752 +61,46.255,40.175,1.0276,1,50.904 +62,26.975,56.213,1.2438,1,70.949 +63,41.258,38.281,1.1347,0,39.329 +64,74.954,52.581,1.1033,1,43.053 +65,28.897,55.729,1.3016,0,56.16 +66,35.299,49.359,1.1729,0,52.014 +67,46.802,63.445,1.3773,0,50.682 +68,43.82,63.29,1.1943,1,70.787 +69,35.988,52.836,1.185,0,54.75 +70,21.981,56.093,0.97028,0,80.548 +71,32.63,71.382,1.0415,0,86.872 +72,50.332,43.704,1.0128,0,45.678 +73,51.046,40.91,0.97496,0,44.066 +74,42.821,35.829,1.2623,0,32.564 +75,50.054,98.706,1.2275,0,85.385 
+76,40.431,42.513,0.90545,0,55.191 +77,71.791,42.516,1.61,1,25.018 +78,39.383,58.177,1.4767,1,55.054 +79,52.219,26.453,1.102,1,29.265 +80,45.822,98.223,0.8463,0,129.04 +81,66.551,38.448,0.93366,1,42.009 +82,44.64,35.093,1.4036,1,33.114 +83,46.244,39.316,1.0477,0,41.534 +84,27.477,58.911,1.1565,1,79.608 +85,34.575,54.387,1.0842,1,73.449 +86,43.471,55.549,1.4283,1,52.141 +87,62.199,49.899,1.612,1,33.448 +88,41.567,42.826,1.3073,0,38.069 +89,41.272,54.868,1.3139,0,48.674 +90,38.507,49.696,0.99456,0,59.871 +91,30.317,44.18,1.06,1,63.496 +92,44.131,36.558,1.395,1,34.894 +93,20.009,54.23,0.96673,0,79.463 +94,69.412,52.667,1.0861,0,40.409 +95,43.751,48.553,1.2586,1,51.569 +96,31.245,58.631,1.0564,1,83.832 +97,45.48,53.108,1.7202,1,40.53 +98,61.124,27.529,1.0137,0,25.286 +99,33.803,22.206,0.87682,1,37.354 +100,40.145,58.733,1.2478,0,55.487 diff --git a/data/demographics.xlsx b/data/demographics.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..2396c0fa260f0a71dd467d7d238bc1d57eefb68f GIT binary patch literal 10124 zcmZ{K1ymf{(k=<^65QRLpaB9Qg9i7&;1YCjClDM4cY+3YcXto&1b3Ig^^tS__n&+6 z?(12rd#zsktEpYPWcOEFK?WKI8wv^v0g5wZMvaW~YkLSB6jU!c6x5rSw=R}!4quF| zjQ_rKu)0`T9%;&g7J2YnuhkJ6o1&Qq$V5{+YrEKs+mVXrjNj9WfJ^Q%v#7m-c&4dKv>Y^u+i?qaJa11Q*NOIee@;R% z^|9v$`LRsbmlh$R069PWm{lS}xr+-ZRDkQmY^U~PaYQO`v%g(RJt_GcslSvqZEvc~xkKrqU?@A!M@sY>vDgBQK25O$r=JenO+b-em=no0*~R_YBCZ>(9Gpz11awW!V%XK7ZcnuS z6l8oP9GaPhTOCs~&f}*%C&)fulig z*Z1HL{)Qe&{>mdf7Rr4X&-*$TYRHC!{`;x^Fq1!u3ll$9J?WioXC#cvEIQ08DMh`Q z!JgA`t_4nAj9;3^86U&?F|-Y{)}p4Z4O_5FKUI=+U*PMX(8DdLg7_bDv#(-LG*#2_DKc;6c%ZI0sDMyCSER2Ej1I4IK6(>1=x_w3^ z>IaHQV$RdN>4O)g-U|KD7Stk3C;0K3ok(IV3?>&fiT!);@~I_sOrr#u1Ni3G1JKkh zG2NB1-XDki7)YV40v_)V5#Z0t{uWNktj}-DFWmv1?1u_UKW5AKKPI`bS13pEh+f67nMl1y@NGqn$ zmZWqGqqh53lir9R6ghm#iY7*8OE%Qr-W4)h%`tY-Z!5QggA#LXM}Y#dXJ}JLfwT8w zgK3l+DiW(@c)u_`%6WI%C#T2N_C39|8kMkc;aS;_jfs*!1etJmzk~6(HL9>?a^BIGDE)BisCYLxqStUb2(_0G<`q4+lba@ls zPhZab7|w6EEcsvU^pH2H8FaRa48Ok+?M~pR#1RxwK2oHijIEP1FE}o~l=czg95h8W 
zd{T%(hN?~3DVg<1<27eaFPTn|Kf&Fvijn}s9DdsCk;2A(BbA7SZ3R>mD7$x57uU-LPdBj+nK9V7l??Hrir8J$cJ zgn#qej%K2X3_%W_dMz9MaXgr_TgrpXMr_`ZihC77v9^30p~rt!=d*2&_D>~6m-)(1 zir!RY!*a0(9HVRVmp*d(WCWAbKr_xaXf02)a!U|@wGfkmS4wOp24p9ZY~`F=#g=9z z7Ca;TG{O%e^a>XJ1xa@1#-@F6L_VE%+>jJ5gUP%O^C@%xR!*XlmTB59D$-lvatUqX z_`}*91E?+a_P)Ov%cu&nZ#NiAEzWXTVO__PXZ{YSMgN*H-fS-}V2OvxYLO=_&1!c^ z?h=s~2?E_UGJ6d1x;k^Xnh;Lbb+;uO);c<1MSL55+3}e+rKO&?aEt92r<(;*uXe=y z(W2ePqe z)3>qtUAI!Bf65)OWB6YS0yc#lUnBe&OnF~*df8kyvUJ~c)2jez?`9zyGur!O6t?}s2VXj-q;MI8yn#cG{X zoG;WT2M^NFb!VF*AF}HaI|xhYfl9&zBtm!JE1OE-m~I>xqQ{XgB;HoXG^|dl6<1f5 zOZHZ(;fkJ~d~<=CHmy~m&BP}8`EG2}FkqW2$EmlH*MDpywpe+bUc!=Xkyp$LS}FF~ z!=n$*TjA7}FIM@eng_2EAEtm0_Z*XYx-ZB=4EMA_r%o7eVYIaX`M(so~&;h#+;!7x%|ZMQiA3QHcF`+wH-V z?$EkjeEQDq=~Q6S>WZ#Ln#ZeuFA}ko-T2UJD5(CIJqG&U`wM4~orM7iWbu2AN>qoe zv)Hkhu9+gQv4`K~XG?!541UX1c&8ms%-(^d*UKYYSokbFOHF-L4K8H)P+LaUw{t_< zHGf-M8Vb)yy&>y!S`db^U?gbzlToye7th2Gr9d(!CWQB`XoN~8@jigPy$Q{{WlbjfAEl5&U{lSG?;AbL5u#yO)T4q@wAW!L*HN zJWe9ogq|gA^S2*+AJ#I?G&>mG>g-EJ7p`~I{7bb>4-FNL3~>Z_y}jji*rGhX_dpgvzNPR$9{_)F6H4ARmT}i7(WW8LB}`aXNag zf#$|U4{lQF^uD5eL?o!EMUO;h7jJDVMdmx*9i1G$T-viFOnUObOC3Q;-TBjD0TNDe=j7kDu!}JW?G?MsXJ?XWWAT;qHF= zn&O2hDJ4Z;4OT;)YV))%l?z*+M%Z-d@acQfgb?^KuMavYSGnFl^YE<+_8o% zsL9zz`*NO81O1&UBD?+T^FwTR`jzq14aITFMd{ofe~G%n;+1q&+dc;^ zW*mP`Mq+h(78)rP#3^=))Vnb`P|tJjCKG_S@1G2r3{z8a+2r3af~qz8!2EdhtcE@( z8#bkbJ0YY>5Ma95^gfw<*T^Z+dgFGsI{eX^*?gbk*~c0gbr+BA95Qlh`aJP5LgYXQ zFTChI0-VY8#pgxk9k%UqBz8u%$rhWUA^iaN`!_{5I=u<`Y!4Yr425;+kJ*_OWgUdq zmcXVfwkML47WZuTF0KcN?UR0^3-h`j)(Bn4Za7WH+axw0qYHLj<`6UdK=if;h0O6` zB$4mRRdi=V%L!{mvUF{aU&-C`gVqr^R zv-M`(8Y)qz-WYAy8~HHt#I`kFbw=vlC(!o0Vr4&b^@G0t;j-b~u^9?4^)wRZ?@vK3 zF_5)`v9*J)imQ#WJ@Bt|9V=_y`Mad1P@d!V1h5tsQkL(IRstA>youz4{0+6zzI7hG zllr3i1NUyhh6+s7H(UhRWpCarn`@kX%;c40Dzag zf%%^s8bgqs@o)2oMFHhI*|7soZgFHz>YQLv&1$3a*O0K@uw$Kkzk-^)q_T26Jvj&Cc{6fW;t%xSV%sP{_LB@08!SDj}oU})WVST zli1PHQY%(Jpb7_59wn92EClKi^y(an+NJ|O^x0UXtW%IKp=q%S-=MM6Y?3k|XrH&+ z(tSG8`}kg(0WRm!L6$A@wHP##!ztgWorgX&-xdh>VgRRL-=^B;jZh%8=jkO>tyOZ( 
z25A^%$E8ZU_%1KE=2(peO4T3x6W?k_y?K>}RpoVVySnqz6X9_er?s1wuLs|HX)f6r z%(qVl%1(r5Db3+z=`;2!iqZ1Qb_q(2w6U^wJh`CMXZ1%w{@2Eb6-k05ZGuss!gep= zRzlA|+ZyGumPhl3WSMlsQZ(eB>#Y~P631x@{%#>Drki;!od4AZq-CJKY zsq^~P>QP_q+t57M{LAWuOjp>e{$ch#XeDk4{HXtE*YESHzdZn&u z!gqrGy`{!bR2tcd)8gaR8hd-pmzW2)%R-%!lLg79_q?H~ zr?728BW5Ilv*&^yPMID=Zxoy zrl&iX=e6^;4YoQHI}ymDx( zh@v%66IlM>(d3u#4}%F)s=HO zjQ(l|cahpb*WJeb($#&>@t5u=N1+o`_taIQrJtR;B1q^*YbPMGpZ7%|y(yA?>*f8! z25XNZ`<=#VtA7Qa$ z-28F1W+O1W19iYT^r~=NC%GvZu)h?kY$VQBh_cd$XL6CrDaJs-k=4S(TSxC`#@_f- zw1)pn+a9X)h62~Ul!VGw2hr6OUj%oO%-E6|tIaZBOo$O_F4$h%o0M5Cd5U;?G@d^RqNh#n& z@KEb^>f78hYE=QCz*F=30OP$LC6vC_n28!bf=- z_>F9u1&mf~scLFGS!I3s&R?_7=MVhUjhPpb5L(I+ts5Wb|CQfYJcW8~DX!7m`A! zHNu?%)IkcL0aeR603PGViC`_=FI{T<5J37ZAf?@PEA31`i{*9HB+k~V)d%dNTqPAf z+3nt0T<*c7SwxwxTZ+VyN?QSgrA}T_`K1hEdR0C~iKz~_H!xb`F21M_U|Bf{X4C{d zXGfFey&y;AWnu~DL9Ro=rylVsK#-+N$Qq%3m>%|C{gG9*oY95=!)_-JUG!Ky={9Ah z!sgm1Z2d-yQt^7JyzQpC(zWRwxzaU5-oJOa_G>fI2>~|)hV^+^cJ)7)GQf4hmG&*Fh zfW~#=TKPVc)rSbH;#(HX^gMZJMB`c5W`)BC>AcBlnorU7XTcv1Rl0`_)Ar%h$V-UF zr*YF@Kgr3f0EgU$l9wEhpeT!Fm z@sJ&^9=ONQ$27QcSmMe6=QxjQpj=A4Hm%No>9#IS20L8d!d{>4!(g2Bv#5tJ}rV#bfRwkfYuHv8#wR=aKK3SIx~{$NEG*dilf5a zsvQaU)jH`H#c$15OXHkN@x^@ccbbZCag(-ut;)wagRFJSZ{AUk@`_L8G?Wh|wB&sU zvB-^h=ItL0z-Av;&QcsyL=I7a&!O5elPjO7w+e=cBR#uo(k z1dZ+|UM7~<4M+d#ILAQ_97AbUaUt$3mQ_ue+)d1h;KZPEST`s$oS~BAH&Og5$go@^zyYd(Z*r4C$)6vJv7il6! zcEgz-rCm)BoYWLMzBK2Z3OG& zS&DSp4#H)!b!+~*nZq?(PedwF(C)4{$9Cw?)t3QONeCVKgs^I^ECe@5uU!=1#~Oi1s5he0I0|V^bUleK*VHo)jHt8qp<6Je)83zQf6$ zx9r?__bad+@3pjDmol^gPo{swV0YyJIb!dOph>Itx+9?rY#E znY}E4=F#s-fPPU2FD1V!6Kz6_juAc>>fkl{LT)xRZjY<=_}<}6HZ+f70w(X13N$=MKY z?%3oxd)`!Q{o?D8uWJtbQ=4+am@$FO;Zc(X<5G>ua-%lpfRMDh@w4yc7Ks};TM*lf zFF4XAZp@=6%0(F96KBSc{EEWRa1>{mV6i>5h3&T|08~UN-L^upTmDiC;ih8Z*R_|? zE6Oj=WHk>OJi@O%-M5)3>T0-T-jQQYacnd>G+uWAG*p>h6OCQW#2LBdHw=Py2S=y! 
ziYK}2c-H3*Y6k>>T(vOvjhVkjXZ7mOO1VM7SjN^isA`{@(^@j?+h_JKFO5k`lfvIO z9Hlu$;sJ!GI&}IeFGotibA!RN;I7+PlAiE7iFd<+Re!qF^QxRQmc*M(CE|O}y zjW;YrninEK|I=v%-N|TmTNB|I9q5HggD#q7ccjs3VZbu{N%7*R2HW=Um@!`&TMsaK znCW}&n8@WycFoA0e|Ky)In-Qtp=pSNkE7ZTp8*%pHHSQ~9TZqtUweNlvlTwD`Pt8d zHv8*%H)!&jL%9*T0-4R#sMDf<{UgPJOU~N*{A-~$telM+h2wq@fZI=CJWy+RF91`Q zJ7+CULQHwQYIdS@uU@C5VAq4Co5$MS!icfCT{si1wx~X~obhUkZ`4zE{4B%(dFJ}O zEpwxW{1=~&qA+h2#BD-j0X2>RkjD7?`n3! zu~b*LjE$5(6IGmbItjV5alUXa&xOKVI~P;kEY2 z^xdF0C(YrecGYRoj+Fla}aLz)-5lSBYEbWDa3Cf8<%wi zZx?5#x5j&zr=G0VIFz=oA1T}3;&7?pD`qd}EKOkc9H{cIseXDvE6G_owWwx6YF1G- zUV?oznLe0CSAH#buB;|b@Z&b{0y6u7M9p-zWQ6Tx-=hS1$>m@Jbui*6Hbt?9CdJ?M zut?hDa{cOBTv~Oe1QY#iP(M5AH~};mkBSeWx;Mr77>UdkMp|qW3Mm#f6hKZil%W@w|Ev1boJ)SGC3OzO@*gCoM73i+vKi1cey1l~ zFO*Zq#`Bta<@EdCLc#zq*(30yU*G{%;r56rRV`Bo_q(W%s;Q+HQYO2pbWsH0R z6c3u=kXXnNVRue#aP>M8r|0soBOEu~Oh_%FSXna0NizGDLxM;v>xP5P;ApCv)o=s1 zs&*DnEu<4gX3ZQ!^{$BbB*_ivwfx}Na{q8s+2*n!NW21oU39B!F5bU6>AJeSVLPo# zuMOa<avST5qzgC(wQN%xQo&n=H;OilK+(4C+& z3E>|iobwN6R)rIlBho-0CD$ut9x^8?MAg{$m~XwCN!oNc(5@xt4bK9AitAVc^T(!O=2ZH6gJ!vi0c5g8}9xv(t5RE#r}(NKD`(T z`Q`byh5pAXKYv@r`?poxv#NbL#8Y*8GwL`i;m0hu0;%IGmRLORT>~p%8z;KL<8$QG z{haPy38&u$&VeV(8Ke~sL+6%j2z3s2E}`=i`!xxBDgwA9=G2RpkRxL4WDniQR0Hbs zf5;5Hrxc$&`&fZ7%KH2ki&pxU;PZtMoH;?Av1`N{FLWWSGD&fOO}TT%e$|fF+{D|K zH9C&GU|{3eYgf5L=EWNH2H|XB^el5E7CKb5VU9qm=xcyDv<}b0vh~GvHJ>e|Oz@cG z>VdE>!K@$8b3>+aZ^Tc*^+3NhS)&`LjviE%BKd#!Em+JRWU60Gf%?Kj{yru2avsS3 zi@x1UUUjfDvo`&!ePJ9S4}M{Rdrs~dI)8Cb`vy^C+SX`|4HX{p&uJ2UpknM2@kAL)uFj6v0Ez+i-dd-?LtD;Ye9+BN33pHJmAcRq5m*m&a|-*nRw0^{z}Q2F;AycVjLRPjVZw0AH^4 z5c#Ygg2%Enqc~8&uBq~-ZXbnEaa&+M-=TU_oxt{DLiRCw1H6LFtJm0n&FH_MdwQAb ze_a3N6x5#r{_H;ggKm0h`TW_7{uBLYTl62a9>QNn|IaY}6Z~gy=^wDhix>Z1vH$Hf z{R#b(&V)BeAV^`G!RN!Wkj`!Ce&i|W7U{r^z1 ze@gi?x&0$$j^ZyV|DRO%r<_0I{~tMzl>f>Ge**uEl7E00)PJwQpE08#1N-6@P*6xO PKkt`xbM-O~3hMs=r(*|? 
literal 0 HcmV?d00001 diff --git a/demographics.csv b/demographics.csv new file mode 100644 index 0000000..5f23f09 --- /dev/null +++ b/demographics.csv @@ -0,0 +1,101 @@ +ID,AGE,WEIGHT,SCR,ISMALE,eGFR +1,34.823,38.212,1.1129,0,42.635 +2,32.765,74.838,0.8846,1,126.0 +3,35.974,37.303,1.1004,1,48.981 +4,38.206,32.969,1.1972,1,38.934 +5,33.559,47.139,1.5924,0,37.198 +6,53.758,50.819,1.6769,0,30.855 +7,25.306,59.304,0.97666,0,82.217 +8,39.897,26.452,1.0817,0,28.899 +9,54.975,26.931,0.90926,0,29.731 +10,40.732,50.878,0.87513,1,80.156 +11,38.603,76.539,0.9541,0,96.028 +12,48.539,113.91,1.0889,0,112.95 +13,28.818,65.829,1.3689,0,63.118 +14,60.933,63.769,0.94223,1,74.322 +15,57.703,37.54,1.3098,1,32.758 +16,32.41,70.069,0.84661,0,105.12 +17,33.799,49.636,1.0998,1,66.572 +18,48.453,59.107,1.1961,0,53.408 +19,40.826,74.732,1.1407,0,76.702 +20,71.319,56.43,0.8041,0,56.9 +21,42.783,92.953,1.3023,1,96.374 +22,33.136,52.328,1.4894,1,52.148 +23,48.292,57.359,1.1164,0,55.624 +24,33.338,58.429,1.2144,0,60.586 +25,29.682,43.66,1.1875,0,47.883 +26,31.032,27.411,0.95849,1,43.282 +27,51.144,72.894,1.3975,0,54.714 +28,35.21,48.149,0.96081,0,61.995 +29,79.292,39.792,1.1153,1,30.084 +30,36.843,45.248,0.93339,0,59.037 +31,19.187,47.021,0.77409,1,101.93 +32,55.554,60.889,0.98239,1,72.695 +33,41.275,98.419,1.2289,1,109.82 +34,49.346,33.442,0.89148,0,40.147 +35,75.654,60.327,0.99383,1,54.249 +36,49.241,53.166,1.1897,1,56.333 +37,32.226,27.841,1.1696,0,30.288 +38,47.13,65.791,0.91439,1,92.807 +39,44.668,65.805,0.76055,0,97.378 +40,32.888,63.813,1.4133,0,57.096 +41,37.504,61.231,1.0209,1,85.382 +42,42.549,45.632,1.3726,0,38.247 +43,56.144,67.582,0.79364,1,99.178 +44,34.77,57.067,0.98974,0,71.629 +45,31.832,38.174,1.4367,0,33.93 +46,30.435,48.515,1.563,1,47.233 +47,41.047,93.403,1.1259,0,96.911 +48,53.297,37.35,1.22,0,31.337 +49,35.503,43.038,0.97497,1,64.067 +50,41.404,47.253,1.3553,1,47.742 +51,37.669,41.304,1.0758,0,46.381 +52,41.054,45.695,1.1881,1,52.855 +53,43.664,24.765,0.88557,0,31.805 
+54,43.823,47.549,1.0754,1,59.065 +55,58.715,20.427,1.2121,1,19.026 +56,47.693,44.151,0.99899,0,48.161 +57,44.787,37.563,1.3257,1,37.469 +58,60.371,31.4,1.1461,1,30.3 +59,25.971,64.709,1.2816,1,79.966 +60,27.241,42.836,0.97056,0,58.752 +61,46.255,40.175,1.0276,1,50.904 +62,26.975,56.213,1.2438,1,70.949 +63,41.258,38.281,1.1347,0,39.329 +64,74.954,52.581,1.1033,1,43.053 +65,28.897,55.729,1.3016,0,56.16 +66,35.299,49.359,1.1729,0,52.014 +67,46.802,63.445,1.3773,0,50.682 +68,43.82,63.29,1.1943,1,70.787 +69,35.988,52.836,1.185,0,54.75 +70,21.981,56.093,0.97028,0,80.548 +71,32.63,71.382,1.0415,0,86.872 +72,50.332,43.704,1.0128,0,45.678 +73,51.046,40.91,0.97496,0,44.066 +74,42.821,35.829,1.2623,0,32.564 +75,50.054,98.706,1.2275,0,85.385 +76,40.431,42.513,0.90545,0,55.191 +77,71.791,42.516,1.61,1,25.018 +78,39.383,58.177,1.4767,1,55.054 +79,52.219,26.453,1.102,1,29.265 +80,45.822,98.223,0.8463,0,129.04 +81,66.551,38.448,0.93366,1,42.009 +82,44.64,35.093,1.4036,1,33.114 +83,46.244,39.316,1.0477,0,41.534 +84,27.477,58.911,1.1565,1,79.608 +85,34.575,54.387,1.0842,1,73.449 +86,43.471,55.549,1.4283,1,52.141 +87,62.199,49.899,1.612,1,33.448 +88,41.567,42.826,1.3073,0,38.069 +89,41.272,54.868,1.3139,0,48.674 +90,38.507,49.696,0.99456,0,59.871 +91,30.317,44.18,1.06,1,63.496 +92,44.131,36.558,1.395,1,34.894 +93,20.009,54.23,0.96673,0,79.463 +94,69.412,52.667,1.0861,0,40.409 +95,43.751,48.553,1.2586,1,51.569 +96,31.245,58.631,1.0564,1,83.832 +97,45.48,53.108,1.7202,1,40.53 +98,61.124,27.529,1.0137,0,25.286 +99,33.803,22.206,0.87682,1,37.354 +100,40.145,58.733,1.2478,0,55.487 diff --git a/demographics.xlsx b/demographics.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..2396c0fa260f0a71dd467d7d238bc1d57eefb68f GIT binary patch literal 10124 zcmZ{K1ymf{(k=<^65QRLpaB9Qg9i7&;1YCjClDM4cY+3YcXto&1b3Ig^^tS__n&+6 z?(12rd#zsktEpYPWcOEFK?WKI8wv^v0g5wZMvaW~YkLSB6jU!c6x5rSw=R}!4quF| 
zjQ_rKu)0`T9%;&g7J2YnuhkJ6o1&Qq$V5{+YrEKs+mVXrjNj9WfJ^Q%v#7m-c&4dKv>Y^u+i?qaJa11Q*NOIee@;R% z^|9v$`LRsbmlh$R069PWm{lS}xr+-ZRDkQmY^U~PaYQO`v%g(RJt_GcslSvqZEvc~xkKrqU?@A!M@sY>vDgBQK25O$r=JenO+b-em=no0*~R_YBCZ>(9Gpz11awW!V%XK7ZcnuS z6l8oP9GaPhTOCs~&f}*%C&)fulig z*Z1HL{)Qe&{>mdf7Rr4X&-*$TYRHC!{`;x^Fq1!u3ll$9J?WioXC#cvEIQ08DMh`Q z!JgA`t_4nAj9;3^86U&?F|-Y{)}p4Z4O_5FKUI=+U*PMX(8DdLg7_bDv#(-LG*#2_DKc;6c%ZI0sDMyCSER2Ej1I4IK6(>1=x_w3^ z>IaHQV$RdN>4O)g-U|KD7Stk3C;0K3ok(IV3?>&fiT!);@~I_sOrr#u1Ni3G1JKkh zG2NB1-XDki7)YV40v_)V5#Z0t{uWNktj}-DFWmv1?1u_UKW5AKKPI`bS13pEh+f67nMl1y@NGqn$ zmZWqGqqh53lir9R6ghm#iY7*8OE%Qr-W4)h%`tY-Z!5QggA#LXM}Y#dXJ}JLfwT8w zgK3l+DiW(@c)u_`%6WI%C#T2N_C39|8kMkc;aS;_jfs*!1etJmzk~6(HL9>?a^BIGDE)BisCYLxqStUb2(_0G<`q4+lba@ls zPhZab7|w6EEcsvU^pH2H8FaRa48Ok+?M~pR#1RxwK2oHijIEP1FE}o~l=czg95h8W zd{T%(hN?~3DVg<1<27eaFPTn|Kf&Fvijn}s9DdsCk;2A(BbA7SZ3R>mD7$x57uU-LPdBj+nK9V7l??Hrir8J$cJ zgn#qej%K2X3_%W_dMz9MaXgr_TgrpXMr_`ZihC77v9^30p~rt!=d*2&_D>~6m-)(1 zir!RY!*a0(9HVRVmp*d(WCWAbKr_xaXf02)a!U|@wGfkmS4wOp24p9ZY~`F=#g=9z z7Ca;TG{O%e^a>XJ1xa@1#-@F6L_VE%+>jJ5gUP%O^C@%xR!*XlmTB59D$-lvatUqX z_`}*91E?+a_P)Ov%cu&nZ#NiAEzWXTVO__PXZ{YSMgN*H-fS-}V2OvxYLO=_&1!c^ z?h=s~2?E_UGJ6d1x;k^Xnh;Lbb+;uO);c<1MSL55+3}e+rKO&?aEt92r<(;*uXe=y z(W2ePqe z)3>qtUAI!Bf65)OWB6YS0yc#lUnBe&OnF~*df8kyvUJ~c)2jez?`9zyGur!O6t?}s2VXj-q;MI8yn#cG{X zoG;WT2M^NFb!VF*AF}HaI|xhYfl9&zBtm!JE1OE-m~I>xqQ{XgB;HoXG^|dl6<1f5 zOZHZ(;fkJ~d~<=CHmy~m&BP}8`EG2}FkqW2$EmlH*MDpywpe+bUc!=Xkyp$LS}FF~ z!=n$*TjA7}FIM@eng_2EAEtm0_Z*XYx-ZB=4EMA_r%o7eVYIaX`M(so~&;h#+;!7x%|ZMQiA3QHcF`+wH-V z?$EkjeEQDq=~Q6S>WZ#Ln#ZeuFA}ko-T2UJD5(CIJqG&U`wM4~orM7iWbu2AN>qoe zv)Hkhu9+gQv4`K~XG?!541UX1c&8ms%-(^d*UKYYSokbFOHF-L4K8H)P+LaUw{t_< zHGf-M8Vb)yy&>y!S`db^U?gbzlToye7th2Gr9d(!CWQB`XoN~8@jigPy$Q{{WlbjfAEl5&U{lSG?;AbL5u#yO)T4q@wAW!L*HN zJWe9ogq|gA^S2*+AJ#I?G&>mG>g-EJ7p`~I{7bb>4-FNL3~>Z_y}jji*rGhX_dpgvzNPR$9{_)F6H4ARmT}i7(WW8LB}`aXNag zf#$|U4{lQF^uD5eL?o!EMUO;h7jJDVMdmx*9i1G$T-viFOnUObOC3Q;-TBjD0TNDe=j7kDu!}JW?G?MsXJ?XWWAT;qHF= 
zn&O2hDJ4Z;4OT;)YV))%l?z*+M%Z-d@acQfgb?^KuMavYSGnFl^YE<+_8o% zsL9zz`*NO81O1&UBD?+T^FwTR`jzq14aITFMd{ofe~G%n;+1q&+dc;^ zW*mP`Mq+h(78)rP#3^=))Vnb`P|tJjCKG_S@1G2r3{z8a+2r3af~qz8!2EdhtcE@( z8#bkbJ0YY>5Ma95^gfw<*T^Z+dgFGsI{eX^*?gbk*~c0gbr+BA95Qlh`aJP5LgYXQ zFTChI0-VY8#pgxk9k%UqBz8u%$rhWUA^iaN`!_{5I=u<`Y!4Yr425;+kJ*_OWgUdq zmcXVfwkML47WZuTF0KcN?UR0^3-h`j)(Bn4Za7WH+axw0qYHLj<`6UdK=if;h0O6` zB$4mRRdi=V%L!{mvUF{aU&-C`gVqr^R zv-M`(8Y)qz-WYAy8~HHt#I`kFbw=vlC(!o0Vr4&b^@G0t;j-b~u^9?4^)wRZ?@vK3 zF_5)`v9*J)imQ#WJ@Bt|9V=_y`Mad1P@d!V1h5tsQkL(IRstA>youz4{0+6zzI7hG zllr3i1NUyhh6+s7H(UhRWpCarn`@kX%;c40Dzag zf%%^s8bgqs@o)2oMFHhI*|7soZgFHz>YQLv&1$3a*O0K@uw$Kkzk-^)q_T26Jvj&Cc{6fW;t%xSV%sP{_LB@08!SDj}oU})WVST zli1PHQY%(Jpb7_59wn92EClKi^y(an+NJ|O^x0UXtW%IKp=q%S-=MM6Y?3k|XrH&+ z(tSG8`}kg(0WRm!L6$A@wHP##!ztgWorgX&-xdh>VgRRL-=^B;jZh%8=jkO>tyOZ( z25A^%$E8ZU_%1KE=2(peO4T3x6W?k_y?K>}RpoVVySnqz6X9_er?s1wuLs|HX)f6r z%(qVl%1(r5Db3+z=`;2!iqZ1Qb_q(2w6U^wJh`CMXZ1%w{@2Eb6-k05ZGuss!gep= zRzlA|+ZyGumPhl3WSMlsQZ(eB>#Y~P631x@{%#>Drki;!od4AZq-CJKY zsq^~P>QP_q+t57M{LAWuOjp>e{$ch#XeDk4{HXtE*YESHzdZn&u z!gqrGy`{!bR2tcd)8gaR8hd-pmzW2)%R-%!lLg79_q?H~ zr?728BW5Ilv*&^yPMID=Zxoy zrl&iX=e6^;4YoQHI}ymDx( zh@v%66IlM>(d3u#4}%F)s=HO zjQ(l|cahpb*WJeb($#&>@t5u=N1+o`_taIQrJtR;B1q^*YbPMGpZ7%|y(yA?>*f8! z25XNZ`<=#VtA7Qa$ z-28F1W+O1W19iYT^r~=NC%GvZu)h?kY$VQBh_cd$XL6CrDaJs-k=4S(TSxC`#@_f- zw1)pn+a9X)h62~Ul!VGw2hr6OUj%oO%-E6|tIaZBOo$O_F4$h%o0M5Cd5U;?G@d^RqNh#n& z@KEb^>f78hYE=QCz*F=30OP$LC6vC_n28!bf=- z_>F9u1&mf~scLFGS!I3s&R?_7=MVhUjhPpb5L(I+ts5Wb|CQfYJcW8~DX!7m`A! 
zHNu?%)IkcL0aeR603PGViC`_=FI{T<5J37ZAf?@PEA31`i{*9HB+k~V)d%dNTqPAf z+3nt0T<*c7SwxwxTZ+VyN?QSgrA}T_`K1hEdR0C~iKz~_H!xb`F21M_U|Bf{X4C{d zXGfFey&y;AWnu~DL9Ro=rylVsK#-+N$Qq%3m>%|C{gG9*oY95=!)_-JUG!Ky={9Ah z!sgm1Z2d-yQt^7JyzQpC(zWRwxzaU5-oJOa_G>fI2>~|)hV^+^cJ)7)GQf4hmG&*Fh zfW~#=TKPVc)rSbH;#(HX^gMZJMB`c5W`)BC>AcBlnorU7XTcv1Rl0`_)Ar%h$V-UF zr*YF@Kgr3f0EgU$l9wEhpeT!Fm z@sJ&^9=ONQ$27QcSmMe6=QxjQpj=A4Hm%No>9#IS20L8d!d{>4!(g2Bv#5tJ}rV#bfRwkfYuHv8#wR=aKK3SIx~{$NEG*dilf5a zsvQaU)jH`H#c$15OXHkN@x^@ccbbZCag(-ut;)wagRFJSZ{AUk@`_L8G?Wh|wB&sU zvB-^h=ItL0z-Av;&QcsyL=I7a&!O5elPjO7w+e=cBR#uo(k z1dZ+|UM7~<4M+d#ILAQ_97AbUaUt$3mQ_ue+)d1h;KZPEST`s$oS~BAH&Og5$go@^zyYd(Z*r4C$)6vJv7il6! zcEgz-rCm)BoYWLMzBK2Z3OG& zS&DSp4#H)!b!+~*nZq?(PedwF(C)4{$9Cw?)t3QONeCVKgs^I^ECe@5uU!=1#~Oi1s5he0I0|V^bUleK*VHo)jHt8qp<6Je)83zQf6$ zx9r?__bad+@3pjDmol^gPo{swV0YyJIb!dOph>Itx+9?rY#E znY}E4=F#s-fPPU2FD1V!6Kz6_juAc>>fkl{LT)xRZjY<=_}<}6HZ+f70w(X13N$=MKY z?%3oxd)`!Q{o?D8uWJtbQ=4+am@$FO;Zc(X<5G>ua-%lpfRMDh@w4yc7Ks};TM*lf zFF4XAZp@=6%0(F96KBSc{EEWRa1>{mV6i>5h3&T|08~UN-L^upTmDiC;ih8Z*R_|? zE6Oj=WHk>OJi@O%-M5)3>T0-T-jQQYacnd>G+uWAG*p>h6OCQW#2LBdHw=Py2S=y! ziYK}2c-H3*Y6k>>T(vOvjhVkjXZ7mOO1VM7SjN^isA`{@(^@j?+h_JKFO5k`lfvIO z9Hlu$;sJ!GI&}IeFGotibA!RN;I7+PlAiE7iFd<+Re!qF^QxRQmc*M(CE|O}y zjW;YrninEK|I=v%-N|TmTNB|I9q5HggD#q7ccjs3VZbu{N%7*R2HW=Um@!`&TMsaK znCW}&n8@WycFoA0e|Ky)In-Qtp=pSNkE7ZTp8*%pHHSQ~9TZqtUweNlvlTwD`Pt8d zHv8*%H)!&jL%9*T0-4R#sMDf<{UgPJOU~N*{A-~$telM+h2wq@fZI=CJWy+RF91`Q zJ7+CULQHwQYIdS@uU@C5VAq4Co5$MS!icfCT{si1wx~X~obhUkZ`4zE{4B%(dFJ}O zEpwxW{1=~&qA+h2#BD-j0X2>RkjD7?`n3! 
zu~b*LjE$5(6IGmbItjV5alUXa&xOKVI~P;kEY2 z^xdF0C(YrecGYRoj+Fla}aLz)-5lSBYEbWDa3Cf8<%wi zZx?5#x5j&zr=G0VIFz=oA1T}3;&7?pD`qd}EKOkc9H{cIseXDvE6G_owWwx6YF1G- zUV?oznLe0CSAH#buB;|b@Z&b{0y6u7M9p-zWQ6Tx-=hS1$>m@Jbui*6Hbt?9CdJ?M zut?hDa{cOBTv~Oe1QY#iP(M5AH~};mkBSeWx;Mr77>UdkMp|qW3Mm#f6hKZil%W@w|Ev1boJ)SGC3OzO@*gCoM73i+vKi1cey1l~ zFO*Zq#`Bta<@EdCLc#zq*(30yU*G{%;r56rRV`Bo_q(W%s;Q+HQYO2pbWsH0R z6c3u=kXXnNVRue#aP>M8r|0soBOEu~Oh_%FSXna0NizGDLxM;v>xP5P;ApCv)o=s1 zs&*DnEu<4gX3ZQ!^{$BbB*_ivwfx}Na{q8s+2*n!NW21oU39B!F5bU6>AJeSVLPo# zuMOa<avST5qzgC(wQN%xQo&n=H;OilK+(4C+& z3E>|iobwN6R)rIlBho-0CD$ut9x^8?MAg{$m~XwCN!oNc(5@xt4bK9AitAVc^T(!O=2ZH6gJ!vi0c5g8}9xv(t5RE#r}(NKD`(T z`Q`byh5pAXKYv@r`?poxv#NbL#8Y*8GwL`i;m0hu0;%IGmRLORT>~p%8z;KL<8$QG z{haPy38&u$&VeV(8Ke~sL+6%j2z3s2E}`=i`!xxBDgwA9=G2RpkRxL4WDniQR0Hbs zf5;5Hrxc$&`&fZ7%KH2ki&pxU;PZtMoH;?Av1`N{FLWWSGD&fOO}TT%e$|fF+{D|K zH9C&GU|{3eYgf5L=EWNH2H|XB^el5E7CKb5VU9qm=xcyDv<}b0vh~GvHJ>e|Oz@cG z>VdE>!K@$8b3>+aZ^Tc*^+3NhS)&`LjviE%BKd#!Em+JRWU60Gf%?Kj{yru2avsS3 zi@x1UUUjfDvo`&!ePJ9S4}M{Rdrs~dI)8Cb`vy^C+SX`|4HX{p&uJ2UpknM2@kAL)uFj6v0Ez+i-dd-?LtD;Ye9+BN33pHJmAcRq5m*m&a|-*nRw0^{z}Q2F;AycVjLRPjVZw0AH^4 z5c#Ygg2%Enqc~8&uBq~-ZXbnEaa&+M-=TU_oxt{DLiRCw1H6LFtJm0n&FH_MdwQAb ze_a3N6x5#r{_H;ggKm0h`TW_7{uBLYTl62a9>QNn|IaY}6Z~gy=^wDhix>Z1vH$Hf z{R#b(&V)BeAV^`G!RN!Wkj`!Ce&i|W7U{r^z1 ze@gi?x&0$$j^ZyV|DRO%r<_0I{~tMzl>f>Ge**uEl7E00)PJwQpE08#1N-6@P*6xO PKkt`xbM-O~3hMs=r(*|? 
literal 0 HcmV?d00001 diff --git a/demographics_eu.csv b/demographics_eu.csv new file mode 100644 index 0000000..050e338 --- /dev/null +++ b/demographics_eu.csv @@ -0,0 +1,101 @@ +ID;AGE;WEIGHT;SCR;ISMALE;eGFR +1;34,823;38,212;1,1129;0;42,635 +2;32,765;74,838;0,8846;1;126,0 +3;35,974;37,303;1,1004;1;48,981 +4;38,206;32,969;1,1972;1;38,934 +5;33,559;47,139;1,5924;0;37,198 +6;53,758;50,819;1,6769;0;30,855 +7;25,306;59,304;0,97666;0;82,217 +8;39,897;26,452;1,0817;0;28,899 +9;54,975;26,931;0,90926;0;29,731 +10;40,732;50,878;0,87513;1;80,156 +11;38,603;76,539;0,9541;0;96,028 +12;48,539;113,91;1,0889;0;112,95 +13;28,818;65,829;1,3689;0;63,118 +14;60,933;63,769;0,94223;1;74,322 +15;57,703;37,54;1,3098;1;32,758 +16;32,41;70,069;0,84661;0;105,12 +17;33,799;49,636;1,0998;1;66,572 +18;48,453;59,107;1,1961;0;53,408 +19;40,826;74,732;1,1407;0;76,702 +20;71,319;56,43;0,8041;0;56,9 +21;42,783;92,953;1,3023;1;96,374 +22;33,136;52,328;1,4894;1;52,148 +23;48,292;57,359;1,1164;0;55,624 +24;33,338;58,429;1,2144;0;60,586 +25;29,682;43,66;1,1875;0;47,883 +26;31,032;27,411;0,95849;1;43,282 +27;51,144;72,894;1,3975;0;54,714 +28;35,21;48,149;0,96081;0;61,995 +29;79,292;39,792;1,1153;1;30,084 +30;36,843;45,248;0,93339;0;59,037 +31;19,187;47,021;0,77409;1;101,93 +32;55,554;60,889;0,98239;1;72,695 +33;41,275;98,419;1,2289;1;109,82 +34;49,346;33,442;0,89148;0;40,147 +35;75,654;60,327;0,99383;1;54,249 +36;49,241;53,166;1,1897;1;56,333 +37;32,226;27,841;1,1696;0;30,288 +38;47,13;65,791;0,91439;1;92,807 +39;44,668;65,805;0,76055;0;97,378 +40;32,888;63,813;1,4133;0;57,096 +41;37,504;61,231;1,0209;1;85,382 +42;42,549;45,632;1,3726;0;38,247 +43;56,144;67,582;0,79364;1;99,178 +44;34,77;57,067;0,98974;0;71,629 +45;31,832;38,174;1,4367;0;33,93 +46;30,435;48,515;1,563;1;47,233 +47;41,047;93,403;1,1259;0;96,911 +48;53,297;37,35;1,22;0;31,337 +49;35,503;43,038;0,97497;1;64,067 +50;41,404;47,253;1,3553;1;47,742 +51;37,669;41,304;1,0758;0;46,381 +52;41,054;45,695;1,1881;1;52,855 
+53;43,664;24,765;0,88557;0;31,805 +54;43,823;47,549;1,0754;1;59,065 +55;58,715;20,427;1,2121;1;19,026 +56;47,693;44,151;0,99899;0;48,161 +57;44,787;37,563;1,3257;1;37,469 +58;60,371;31,4;1,1461;1;30,3 +59;25,971;64,709;1,2816;1;79,966 +60;27,241;42,836;0,97056;0;58,752 +61;46,255;40,175;1,0276;1;50,904 +62;26,975;56,213;1,2438;1;70,949 +63;41,258;38,281;1,1347;0;39,329 +64;74,954;52,581;1,1033;1;43,053 +65;28,897;55,729;1,3016;0;56,16 +66;35,299;49,359;1,1729;0;52,014 +67;46,802;63,445;1,3773;0;50,682 +68;43,82;63,29;1,1943;1;70,787 +69;35,988;52,836;1,185;0;54,75 +70;21,981;56,093;0,97028;0;80,548 +71;32,63;71,382;1,0415;0;86,872 +72;50,332;43,704;1,0128;0;45,678 +73;51,046;40,91;0,97496;0;44,066 +74;42,821;35,829;1,2623;0;32,564 +75;50,054;98,706;1,2275;0;85,385 +76;40,431;42,513;0,90545;0;55,191 +77;71,791;42,516;1,61;1;25,018 +78;39,383;58,177;1,4767;1;55,054 +79;52,219;26,453;1,102;1;29,265 +80;45,822;98,223;0,8463;0;129,04 +81;66,551;38,448;0,93366;1;42,009 +82;44,64;35,093;1,4036;1;33,114 +83;46,244;39,316;1,0477;0;41,534 +84;27,477;58,911;1,1565;1;79,608 +85;34,575;54,387;1,0842;1;73,449 +86;43,471;55,549;1,4283;1;52,141 +87;62,199;49,899;1,612;1;33,448 +88;41,567;42,826;1,3073;0;38,069 +89;41,272;54,868;1,3139;0;48,674 +90;38,507;49,696;0,99456;0;59,871 +91;30,317;44,18;1,06;1;63,496 +92;44,131;36,558;1,395;1;34,894 +93;20,009;54,23;0,96673;0;79,463 +94;69,412;52,667;1,0861;0;40,409 +95;43,751;48,553;1,2586;1;51,569 +96;31,245;58,631;1,0564;1;83,832 +97;45,48;53,108;1,7202;1;40,53 +98;61,124;27,529;1,0137;0;25,286 +99;33,803;22,206;0,87682;1;37,354 +100;40,145;58,733;1,2478;0;55,487 diff --git a/iv_bolus_sd.sas7bdat b/iv_bolus_sd.sas7bdat new file mode 100644 index 0000000000000000000000000000000000000000..6734842549cdc75ec054440ac1e56bc44941ca97 GIT binary patch literal 28672 zcmeI3Xn0ynVz27uG+3XT5-#X zTd!e(vm&DEFzT+VfU1D1fU1D1fU1D1fU1D1fU1D1fU1D1fU1D1fU1D1fU1D1fU1D1 zfU1D1fU1D1fU1D1fU1D1z<-AV&6s9lM~iF>y>)H$dau)^diM2dW|346SZ`+dz)AJP 
z>eTPAK8aZ78#avi&OuFylIj8Lr7cg?Ke09db@}~7eH2m86ZLhPq3)^*s0yeGs0yeG zs0yeG{MRV(@>x|)`QLU}J^ufu-*dy^RnzMjU?uK178lRe+viy|i1ikxpI3*H*R{kEkwA;<435BW+z7!|j5*^lmOE)z?5Ze_lwz z+>KQXzAubbfm0YJJaFdJ$sw~Q2h0p-LPH|Lr;1-r2??9}SS;0?CBo9SQY>|AOrPsHqo^ph%rCzO-4)f?01Ul6qnNg@XP8k4cdmQq8*djXk{m!QyZHs%k1CQp#BFIkN*BA6BpE{w z7^&3bk&*Q6WnSfZ<4v;fNa}>YeR94e7ii^?)C-SKZ%as?>!KKwfxelR`SYS36iF7m zoc8Q;-eRdpYQTCAyRvZ@?Gd(93y-&5ymd`mq^Ktc_o_xZ^OtFlFeVRQBwiZq8kj=H zv@-O7kxD%t8Oh?R!^0c(bt3zYq)s^L(JHX{`HqUDUg!|!btd%@t4K0%=-HfKPPL)) z2+z)fU)0&`)J=^PV`|_`PntF=Cj-%icTn!JcRk| zJ-qJ5Zj!Wbut!GHYgT344<)poNQxdM*xgLbr1J<#z2LWY{^eCxj-;MQG7$CBVV*~C zI**WK;kU$YYkWJ?V;zzjXbSCm-Mc8wxF+T!B(<>AX2^G4w6sS^aca1;~Ht-gi1*2g~$h8 zqq4ixZX(IR#JEp~+1_tWdW0kk{`a!ZjO<2_bx3Mp-MJB)$NWlO#758<0mvvg*aj@`z-5wT7fl$nv{#Y@8FF9Z2d0 z|L^1a?>I)U(vf7Korg{LNp`eHNV4FWz>b`~p3WmAHE^eJ(g&L~m870XYN2$@G46Rn zu_DRA#BrI={B7wx!k9c1|5F!k9z*AmOM~;sNZ#Havv~0*u7)E`WCxNuA);^N4P7VD zZX&4{wme>95#cUful2-apz$!SFvy#pk1!?+txhHsel?_-Vmmc3cl;BbWzYjsd5o!r zk3&n{23yi=ZzMTboW0JT<>@s8k~}P6_l@c0v6no%#?S*sDxx|ypG@4t8p+u!Q{rsG z>1!TIozP-gQskx%E>IKGgz`x0g)wo52e*$&*?@QL&}qv`nwW3q7jrdx8bCB0^- zY1NC?bpm^<&z9rQ+!aZ^ zQ2LwW>~U?BNd`PMV;glXmygugcRV`_k4`*@&nuPBe050;yy&&()<2dqitQBduO^+! 
z_c(rCQBMxqEx+}(;Cfb(6z_pYFLd44HiJ!5N#&8$fNkrvkkP}f z6-h1ZU}mz9Bjk5t=sU*bU~fV@UN}^$s5}qnKi&UF`dr#g>B?<@M@BNPtkdZsvGllv zq)ynETA33g&~75B7n)xBs>eldsfQ*!LXv^cEH94^&`A6BkSq*#@#*L}O8!Ix^~9JO zXg06?%g=(WNRN=z!Uv9(x$$dgkC5bGV91l?#h;Xr9wEuY;Fy3**F&;M($mnc26$v7 zW8Z9YKKCw1_6SLxu%Tq-mFZSp6iK}hTHJrphQGY6NHXxvQU9;^+@j|rJUa{N4o|jZ za`b$Jqy}Q&eehuHPx3o4bQ4J}JOod4oxVdSUtQqY zrE9za9vR6dX5NYK9j3DbNu6+YtnI+Bg5DvOM^Z1eI;tJy{t@jak_^n-bv$WkF+J8H z$-=aWo!_`-MbAe_iuYH$25(rcYpU2zEkvBQar>9seNuUh$w8aor*9k#rdJn8@=!23 z%;|1`{B9XNl18iscw{6GR_qxww;w&rAgL2R9yNLM<9DRim>8Fk)C(CeiqoUEd5|h0 zDgOSH)3R6Z1JX!QPfQl9+pp*pP$s_ zImoH}L$~U%e3f1|rZ^uRd71k~JiP+!W>BR89vMlidjZ$JI3Tsggi1*21ak}5F&V$p z9wDh0wzkOj>uE=42a*izJvgJ~+-tO(NV0ICJJ&YvL`%|lBsFko{l9-mIM7&;)PmdK zb-P9mx=DJ3F*!Kva3L)+RDQRN9wEs?W`tSK&FeGCm{K-0z#}8MyN&s@tQFE2ZNeiY zb%J-w$a(We&~73r&PVOf&l+)$9u<*f!1m5$izClyH<4uFa`uw~>($az(t6sdfr{m~ zu082wPI`nfwea_ZM@7*C=sZG_gGMvg%r5kx^9V^E<~{$%-OBIfPuS5-Y36T$M@Di* zWSi?Y1ElM~MBg#kpV+>3|Bq)9pSw@#Heh%Xorg&31&58nesgEgBOsCttmC)6u=7$T zSs42K+k4<`PdbM&H8ATfZhi+Bb45}M9x;`D+ApF##+V!|`7!gOf7$1g*@Ywzo>P3T zI+mr8q;w~Xc8HGEeEf6?KBo{r8NM`s8L1P(eBRjDswJ;T>V;0N z7Wfr8^i(7n__k$%?^tW;Dz4}3V*Zssz5VqOC&ic=nAqu`gNKy-MUfPrQ{==6QP%+(0nPg$o$60FDvvQa zC^Bo)^->x=)*;EmN0FiNSwEj4J(9+u26$v7=L9=6KXX+edxWG;NVsw8oc$hptV2?K zPSNB;*Tft;(H0%5t2H= z@A{P42g9U&6W2VFdcpF`xk=W=bRHqez=|><@b`E+kC0?x>gv<%%|`UP4oM9h3tgZ5 zV-Fj}9%*6ii{s0R*3t73#^hk9_rhnkBk4Rsl84pv@^AL}kj^7%&D8*pjO5z5Cj0Ik zq(?9$bwc8F|B{DGY2}gB3v)`2=LYSbmLgpTCb9!bo$xYm`Hd?@^a>D3z0l&~IFH}MX^)U(;97k8{G6XUV*Vq z9BkFz56y2R`~6DRCLTuE+Pwe#^J8S2q!AC#QqNdvB<)kDMEg9It|JruL{ca8zu>(r z%$N2UN%6g=7gN~>9?B#GrQT*1C!2s9RM5RRL82RRL82RRL82 SRRL82RRL82Re}Fh1^x%4QJ47u literal 0 HcmV?d00001 diff --git a/iv_bolus_sd.xpt b/iv_bolus_sd.xpt new file mode 100644 index 0000000000000000000000000000000000000000..e2a40e73774d810238cfa4fd962618c9d817502c GIT binary patch literal 14240 zcmd6uXI-3u2FJ%Y)Ax+h(|<&CKCj~i$g_RPZkXjqwXqQZ`o<8OZT^5wm(!e&C~ySyPm4< z9!w?#p}?RpE-WZuYFOZKVVg3M4?|(oxe4OmIS89ObK|&h6wW={MEj?49Ov!k?)~ie 
z?D;w`H&2dJ$ZYN-QChmEhp&gH?;m}9y}YEKpJn_n+wZOkUvF?wF#mK6UhaB5ykP`9 zyxb+{;Jm=_fUwYru$Rk&w66W{#;xlzpN?Z7ia`I@;}?R0gTh~Qy=2(L7yNV{^msZE zOcXs5b%IsczsNEFw8v+R!c3S*z{^almpd%f(sMy#@|pS8hHtcK0$x!R8?()GrE(+(Q4zA z>zh4`&~iK{@7kM9NcH>24ELaXT7kn5CyxC9Mth5)I@r1^ZK4c5h` z^1TyB+G88{%-B`OAc>;`Y5cZ-*E2}$$p8LCeq4=_JB4%?>Vp;tiX#b$mbUiG=y*b_ z6O+VI_LNaOrtEiuY>`b&5=U|8Zuy))y?{Y-csyfgI?rjj+sYudm>e=OW8K!)G6t!~ zq-E7C{gtF2oTjU6n#L7qhwj0&ONkCO9&`v9;{m z;Q5ryP@#8dfuJ~&kZ5J|+n*eqw-?S3jXWlaqwGHK?cmdCHZe&Y=?2DRv}MqqGC*=d zr6~6HH@BmHZItgFX));%vdeo8qOA_d)MHXvd(XGO=2iJj;z(CsHp|J=he3iPr`pUN zz0P|wNN}W%J`^*zxQ0UFD-o(xD1e|il8|U|pMw>P&1s{AN#bb9+@+s>wBt+27TLrk zakS#jjFHJTX$+E6bZscrurL3((Io~6+3q`^gqM9zTOB~?yCN0HYE;hqrL!N%XA(z= zzlGg7Rae3w!I78xJ>|4yD+URUERwEn+&#I1LZ%7zK??-Mk%UBl%bV9nYpH}YL?e$$ z;wU}k(4H}|x8+IV=-rGenLsNgOThd2Wd56WUD1ByqIJe4lB^njHB|#3{Nq6pdD$b}y#At>s`s zi}xOWcl91Vv5ZVTCi^<=2%CNGjyy>mrOaeWUhUFQ$;VKA}qH^GHIXOoM00fk@gD0h7eh+B=Tl1TI`9-#c-XSv_yy z!(iH+$C-#zbZsbkddn*V_AX~+YBA|zc0MBVY#oEt3#8_OO-e+QEp0Agf;fs8GTo&# z;hKCVapZ8O*GkJm+FXK6?8v|U*e7!366N>OtUt{)pPPBIO0_yS6qxXe8(U;p;B~cL)ct*d(mA{S|@q0n^rpS>ut-n zFf#QxbI_^0%95L+86wRjj&j_BR~$DmdMA$Djx3MNxRxT{I}hu~_FO;hq~@O}WVuiu zv_L?0qLG)7yycS^d~@%?ytk~pfGG9&MD zyn#V-LZzr;uF-&TW(o$W#pK+I;0?c))7B=wBBaNp*Oo6EvRv%sGr31Q92~b7%-Vvwh+EA0D>9;q$Oc$!8Kr^9uTW(R9d~L4qTfhBL*-jMh@fR-ryuGh7Q77VrO3~rBc7z+7Y2|w- z&!f<@eKt2n(N-MvuE&|qf8N(}eG$zOCc%-WtnZ{T8_RO!dnb;jrMr%F)H3QM&m)H; z(T30ilobcP3ROOTuj}qeO7hCGKz%@u-reg2k~q2;JIOt(f-;wYB#s`QnmuZz$tL+s z#NqKgaUpV?YcAu8h@)u7j~5nZP~I6x?|Pj1#*sJpb#WKudnae8hOZrNb~=MW!uO;1 z=G1kZ)s`?wSVz9cE@`fRRYW0GLVeHzL2)D@c{R$4DsDRnXFOj=JSM>ruXJwzG;Z1q zd6GD~Gx>yZ!(AnV%=a(%{0#1V^^uqNGDs~ZC);Zh3_ps#%fPik zok)+#Ve?1DjPPG5pGh43;Jf03HWbPrVI9pnb)h37Y7Bz}M??H4rT4tGl0vG5`k(~@ zu0u4Q5)wVw`G$}35arpyW0E*hO?`J!=?w&~;HDoP%XVXXZr-& zc;ep4y>7P7IsB@l=r4P4ZBQq1dp=I_mtS;>)Y6}>ywMk z7$i6vH2&V@*E-uNB=-FhjwB>+9%OdC?>isMXA(!sqL3g@owGbi99<2s`fZMqc2CB= z6Gu&Z|9WTQ@m5Br7TcWm`YUDQ|3$mkL8cy)PJzCi36n0#XA(!p6-_PQsS+3@&ejBm 
zq4+*!whR&+4QQ~r8mx1pkd{Jy&;k+Q*4TCE>72(u@e-1^{ri5y=(oqh8Qn?Z$l6@J z-A??kSJF&y#M@Zb&0MS=ET4(EuCf%o+B7=6+I$^@)Cwe8@hB?3Rw;h3lUxy5N25#f z*7chr`unG}PH=>jQy0c9t5nMOPS#PjTSCpRW#XAE>7BgSxwKgE{u3rpGOdIwfffj; zPBiiolDFQzBRr<90M6)65=XXi!;hZk#rF`X6K4`fR-c*18hkg%XCh9hlvjDD&l;EX z3xm`OBwCx5P@j^J#vt{W9P8T<`qm%BPi;w^#F08b&u0JB$MTuP(Fa9e*gPmSFi2QO z-U+ui#v&bs>?_mi8!HBUhSRG7F&%Nq*fqNmT!8hP39^FsmEl{mZhI*-WNaLC3O-vx>>KkfV-s`aVJC~$R#NkZyTLq_VK5V0m5@c!x61`v3;N2Wd zyVn7!$K)74yAZn*qW|(idPT&M@?gNNcto2^IFmR^`hH$xc~5xod3W%Yjqig zv=Qoq76_0Ab{zCXo+OSedhJ`c z?dTK+368erEsHBOVf0Sc5poFG9&Pp=C9}U!AGAPF97#ytZa`J$Y9rbc7L&wL|M>_1 zqPsp5vPCvANgVaPXj+uD%9ufNLZ!S#!OcYf2HN`)WNHNx<=9U+p{pFq$kb!fCv|3| zUo-8=g)@mG%dK@EegC*nK9k(*R_A12Y#WfnAYmPiDat&%y~C42+6whS3k1cHgygLc z-rL`MJ*`em5=VVMd~ivdpoeUcBTN!UmS3k<9==3dn?Q0xrM!B}==PsoM=&zA0*NwC zUYt=k@eza6V-ndGethLM6L}IGX*$#gs((=LWstCr^3QKE9@V>?L4uN5fw2%(u3qy@S~a^+5~uQ5;D~-u7W|w(bD!S&Yf&j>6}E6i`EZc3_e?vU0s| zJ?Am)?hPa-RLYz8@ZNOn&3dR;G(&*Y3M5*g+cjiZvW7wGG3nFe@$^k!SILv$2q_z~ rcZ^TG%OGJLExeq2ve&P)??<>!@;vf;qvdt~^Q*gOO8>tMxzGOrmC{v{ literal 0 HcmV?d00001 From c300a7a12e8d24c5a805b33804b52ff4b7de8285 Mon Sep 17 00:00:00 2001 From: Juan Jose Gonzalez Oneto Date: Mon, 26 Jun 2023 22:20:18 +0000 Subject: [PATCH 05/22] Updated `.gitignore` file Ignore all files created when running the code examples --- .gitignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.gitignore b/.gitignore index 3c1ee6a..6859a07 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,6 @@ # Python env env/ .venv/ + +# Files created when running the code +*_new* From fd8a67f3f3bf0911bf4aa296dc33f5cb1c3ded53 Mon Sep 17 00:00:00 2001 From: jotas6 Date: Tue, 27 Jun 2023 11:38:54 -0500 Subject: [PATCH 06/22] Added content to the homepage --- docs/index.md | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/docs/index.md b/docs/index.md index 453ee43..5a57e94 100644 --- a/docs/index.md +++ b/docs/index.md @@ -5,7 +5,25 @@ description: Data wrangling workshop 
covering data I/O and the use of DataFrames [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) -Short summary about the workshop. +This workshop is an **introduction to data wrangling in Julia** with a focus on data I/O and `DataFramesMeta`. We will cover the following topics: + +- **Reading and writing data**: + - CSV files + - Excel (.xlsx) files + - SAS (.sas7bdat and .xpt) files + +- **Select**: + - Selecting specific columns and rows using the `@select` and `@subset` macros + +- **Transform**: + - Applying transformations to one or more columns using the `@transform` macro + +- **Grouping and combining**: + - Grouping data using the `groupby` function + - Combining groups and summarizing data using the `@combine` and `@by` macros + +- **Chaining**: + - Performing all data wrangling operations in a single block using the `@chain` macro !!! success "Prerequisites" @@ -27,9 +45,7 @@ please send an email to . ## Authors -- Author 1 - -- Author 2 - -- Author 3 - +- Juan José González Oneto - ## License From 73a98fe595ee77b019faa579114ebf1b2544cc6f Mon Sep 17 00:00:00 2001 From: jotas6 Date: Tue, 27 Jun 2023 16:27:08 -0500 Subject: [PATCH 07/22] Added schedule --- docs/index.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/docs/index.md b/docs/index.md index 5a57e94..9b83a0b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -34,9 +34,15 @@ This workshop is an **introduction to data wrangling in Julia** with a focus on ## Schedule -| Time (HH:MM) | Activity | Description | -| ------------ | -------- | ---------------------------------------- | -| 00:00 | Setup | Download files required for the workshop | +| Time (HH:MM) | Activity | Description | +|--------------|--------------------------|---------------------------------------------| +| 00:00 | Setup | Download files required for the workshop | +| 00:25 | Reading and writing data | Showcase `01-files.jl` | +|
00:40 | Select | Showcase `02-select.jl` | +| 00:50 | Transform | Showcase `03-transform.jl` | +| 01:00 | Grouping and combining | Showcase `04-grouping.jl` | +| 01:05 | Chaining | Showcase `05-chaining.jl` | +| 01:15 | Closing remarks | See if there are any questions and feedback | ## Get in touch From c69110e29f7c79d2adf11bd0180d402bc2d3a5be Mon Sep 17 00:00:00 2001 From: jotas6 Date: Tue, 27 Jun 2023 16:27:28 -0500 Subject: [PATCH 08/22] Small corrections to code examples --- 01-files.jl | 14 ++++++++------ 04-grouping.jl | 7 +++++-- 05-chaining.jl | 5 +++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/01-files.jl b/01-files.jl index ba5c4ce..38bd954 100644 --- a/01-files.jl +++ b/01-files.jl @@ -7,7 +7,8 @@ using DataFrames # Note: go to the workshop directory before reading the CSV file df = CSV.read("demographics.csv", DataFrame) # read(, ) -# As an example, let's change some column names and then save it +# Writing files +## As an example, let's change some column names and then save it renamed_df = rename( df, Dict("AGE" => "AGE (years)", "WEIGHT" => "WEIGHT (kg)") @@ -20,7 +21,7 @@ lowercase_df = rename(lowercase, df) # Make all columns be lowercase CSV.write("demographics_new.csv", renamed_df) # write(, ) # CSV.write("demographics.csv", renamed_df) # Watch out: This would overwrite our original dataset -CSV.read("demographics_new.csv", DataFrame) # Check our new files +# Check our new files using VS Code ## Tip: you can read/save data to a folder CSV.write("data/demographics_new.csv", renamed_df) @@ -41,8 +42,9 @@ CSV.read("demographics_eu.csv", DataFrame; delim = ';', decimal = ',') # You can also use these keyword arguments to write files CSV.write("demographics_eu_new.csv", renamed_df; delim = ';', decimal = ',') +readlines("demographics_eu_new.csv")[1:3] -# Many more options: https://csv.juliadata.org/stable/reading.html#CSV.read +# There are many more options: https://csv.juliadata.org/stable/reading.html#CSV.read ## Excel (.xlsx) 
using XLSX @@ -66,12 +68,12 @@ DataFrame(XLSX.readtable("demographics.xlsx", "Sheet1"; infer_eltypes=true)) # Writing files XLSX.writetable("demographics_new.xlsx", renamed_df) # Same syntax as CSV.write (, ) -XLSX.writetable("data/demographics_new.xlsx", renamed_df) +XLSX.writetable("data/demographics_new.xlsx", renamed_df) # Save to a folder ## Watch out: if you try to write a file that already exists, you will get an error XLSX.writetable("demographics_new.xlsx", lowercase_df) # Won't overwrite, like CSV would -## SAS tables +## SAS files using ReadStatTables # Reading files @@ -92,4 +94,4 @@ begin root_files = filter(contains("new"), readdir()) data_files = joinpath.("data", filter(contains("new"), readdir("data"))) foreach(rm, vcat(root_files, data_files)) -end \ No newline at end of file +end diff --git a/04-grouping.jl b/04-grouping.jl index 1afc1cb..a0f4b83 100644 --- a/04-grouping.jl +++ b/04-grouping.jl @@ -11,12 +11,13 @@ groupby(df, :WEIGHT_cat) ## Tip: groupby can take multiple columns groupby(df, [:ISMALE, :WEIGHT_cat]) # Now we get 4 groups -# Combining +# Combining (@combine) ## A common thing to do after grouping data is to combine it back with some operation. 
# Example: mean age for each sex group grouped_df = groupby(df, :ISMALE) @combine grouped_df :AGE = mean(:AGE) +mean((@rsubset df :ISMALE == 0).AGE) # Check the results # You can also use DataFrames that have been grouped with multiple columns combined_df = @combine groupby(df, [:WEIGHT_cat, :ISMALE]) :AGE = mean(:AGE) @@ -27,10 +28,12 @@ combined_df = @combine groupby(df, [:WEIGHT_cat, :ISMALE]) :AGE = mean(:AGE) @combine grouped_df begin :AGE = mean(:AGE) :WEIGHT = mean(:WEIGHT) + :n = length(:AGE) # Calculate the number of subjects for each group end # the @by macro: groupby + @combine in one call @by df :ISMALE begin :AGE = mean(:AGE) :WEIGHT = mean(:WEIGHT) -end \ No newline at end of file + :n = length(:AGE) +end diff --git a/05-chaining.jl b/05-chaining.jl index 835758f..34f80a6 100644 --- a/05-chaining.jl +++ b/05-chaining.jl @@ -2,9 +2,9 @@ df = CSV.read("demographics.csv", DataFrame) # Get ages for all female subjects -female_ages = @chain df begin +@chain df begin @rsubset :ISMALE == 0 - @select :ID :AGE + @select :ID :AGE # We didn't have to pass df as an argument end # More complicated example @@ -19,6 +19,7 @@ end :AGE = mean(:AGE) :SCR = mean(:SCR) :eGFR = mean(:eGFR) + :n = length(:AGE) end @orderby :SEX :WEIGHT_cat # Fix ordering From d8fc7445bd4b4733c39a13532dd09c33027db19d Mon Sep 17 00:00:00 2001 From: jotas6 Date: Wed, 28 Jun 2023 15:38:42 -0500 Subject: [PATCH 09/22] Added reference sheet --- docs/reference.md | 83 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 75 insertions(+), 8 deletions(-) diff --git a/docs/reference.md b/docs/reference.md index 5af3ec2..aca3b60 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -6,23 +6,90 @@ title: Reference Sheets for Pumas-AI Data Wrangling Workshop ## Key Points -This can be either a markdown table or a list. +- Before reading or writing data, make sure that you are in the correct directory. 
You can check the present working directory with the `pwd` function, +and you can navigate to another directory using `cd`. +- You can enter the `shell>` mode in the REPL by typing `;`, which enables you to execute system shell commands (e.g., `pwd`, `cd`, `mkdir`, etc.). +- There are various data formats, but CSV is one of the most commonly used formats. +- To read a CSV file, you can use the `CSV.read` function, and to create or write a CSV file, you can use the `CSV.write` function. +- The `CSV.read` function takes two arguments: a file path and a sink. In most cases, you will use `DataFrame` as the sink. +- The `rename` function allows you to change the column names of a `DataFrame`. +- In some cases, CSV files may not use commas for separation. If that is the case, you can use the `delim` keyword argument to specify +the character used in your file. +- In some regions, commas are used to separate decimals instead of dots (e.g., `3,14` instead of `3.14`). In such cases, columns containing +floating-point numbers will be interpreted as `String`s. To avoid this, you can use the `decimal` keyword argument. +- The `XLSX` package enables reading and writing of Excel files (.xlsx). To read a file, you can use the `XLSX.readtable` function, and to write a file, +you can use `XLSX.writetable`. +- When using `XLSX.readtable`, you need to specify the sheet you want to read from since Excel files can have multiple sheets. If you are unsure +about the sheets in the Excel file, you can use `XLSX.readxlsx` and `XLSX.sheetnames` to obtain a `Vector` containing all the sheet names. +- SAS files (.sas7bdat and .xpt) can be read using the `readstat` function provided by the `ReadStatTables` package. +- Currently, `ReadStatTables` only supports reading files. Write support is experimental and not fully developed. +- You can read and write files from different locations by providing the full or relative path instead of just the file name.
For more information on +specifying robust and complex file paths, refer to the [Filesystem](https://docs.julialang.org/en/v1/base/file/#Filesystem) section in the Julia documentation. +- To obtain a `Vector` with all the column names of a `DataFrame`, you can use the `names` function. This is particularly useful when examining +`DataFrame`s with a large number of columns. +- `DataFramesMeta` re-exports `DataFrames`, allowing you to import only `DataFramesMeta` and still have access to the functions from `DataFrames`. +- You can select a group of columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta`. +- Instead of specifying which columns you want to select, you can specify the columns that you **don't** want to select using the `Not` operator, +which needs to be called with `$()` (e.g. `@select $(Not(column_name))`). +- You can select the rows in a `DataFrame` that satisfy a condition using the `@[r]subset` macro. +- The row version of a `DataFramesMeta` macro can be accessed by adding an `r` before the macro name (e.g., `@rsubset`, `@rtransform`, etc.). +These versions are useful as they eliminate the need to broadcast all operations inside the call, but there are cases where it is not possible to do so. +- To remove rows that have `missing` values in a column, you can use `@rsubset !ismissing(:column_name)`. +- The `@[r]transform` macro allows you to create a new column or modify an existing one. +- The `@astable` macro enables access to intermediate calculations within a `DataFramesMeta` macro call and allows operations on multiple columns simultaneously. +- By appending `!` at the end of a macro name (e.g., `@[r]transform!` or `@select!`), you can modify the original `DataFrame` instead of creating a new one. +- The `groupby` function is used to group data in a `DataFrame` based on specific columns. When used together with `@combine`, it enables +applying operations on grouped data and generating new aggregated results.
+- The `@by` macro provides a concise alternative to using `groupby` and `@combine`. It allows grouping data and applying operations in a single call. +- Including `length(:column)` in a `@combine` or `@by` call will return the number of rows in each grouped `DataFrame` as part of the aggregated results. +The column name used does not affect the results. +- You can perform all your data wrangling operations in a single block using `@chain`. This block can include both `DataFramesMeta` macros and functions +such as `rename`. Additionally, `@chain` passes the `DataFrame` as an argument to every function and macro call. For example, inside a `@chain` block, +you can write `@groupby ` instead of `@groupby `. ## Summary of Basic Commands -| Action | Command | Observations | -| ----------- | ------------- | --------------------- | -| placeholder | `placeholder` | this is a placeholder | +| Action | Command | Observations | +|-------------------------------------------|--------------------------------------|-------------------------------------------------| +| Get the current working directory | `pwd()` | Equivalent to running `pwd` in the shell | +| Change the current working directory | `cd()` | Equivalent to running `cd ` in the shell | +| Enter the `shell>` mode in the Julia REPL | Type `;` in the REPL | | +| Read a CSV file | `CSV.read(, )` | The sink argument will be a `DataFrame` most of the time | +| Write a CSV file | `CSV.write(, )` | | +| Change the column names | `rename(, )` or `rename(, )` | Using the function version can be useful to apply the same type of change to all the columns in the `DataFrame` | +| Read an Excel file | `DataFrame(XLSX.readtable(, ))` | | +| Write an Excel file | `XLSX.writetable(, )` | | +| Inspect the sheet names of an Excel file | `XLSX.readxlsx()` and `XLSX.sheetnames()` (optional) | The result of `XLSX.readxlsx` will print a table containing the sheet names. 
You can optionally then run `XLSX.sheetnames` on the result of `readxlsx` to get a `Vector` with all the sheet names | +| Read a SAS file (.sas7bdat and .xpt) | `DataFrame(readstat())` | | +| Get the column names of a `DataFrame` | `names()` | | +| Get the values from a `DataFrame`'s column | `DataFrame.column_name` or `DataFrame[!, column_name]` | The dot syntax is more readable and easier to type, but the indexing syntax could be more intuitive for some users | +| Select one or more columns from a `DataFrame` | `@select column1 column2 ...` | Can also be done through indexing, but the `@select` macro is more convenient and expressive | +| Use the row version of a `DataFramesMeta` macro | `@r` (e.g. `@rsubset`, `@rtransform`, etc.) | | +| Filter rows in a `DataFrame` using a boolean expression | `@[r]subset ` | | +| Determine whether a variable is of `Type` `Missing` | `ismissing()` | Can be used with `@[r]subset` to remove missing values from a `DataFrame` | +| Create or modify a column | `@[r]transform ` | The expression is written in the assignment form (e.g. `:column_name = `). If you want to create a new column, then the assignment should be for a column name that doesn't exist in the `DataFrame`. If you use an existing column name, `@[r]transform` will modify that column. | +| Access intermediate calculations and manipulate multiple columns at the same time | Include `@astable` inside a macro call | Should be included before the expressions corresponding to the macro call (e.g.
`@[r]transform @astable `) | +| Use the in-place (mutating) version of a macro | Add `!` at the end (e.g. `@[r]transform!`) | This will apply the changes to the original `DataFrame`, instead of creating a new one | +| Group data in a `DataFrame` according to one or more columns | `groupby(, )` | If you want to use more than one column, `` should be a `Vector` of column names | +| Apply operations on a grouped `DataFrame` to create aggregated results | `@combine ` | | +| Group a `DataFrame` and apply operations to create aggregated results | `@by ` | It is equivalent to `groupby(, )` and then `@combine ` | +| Perform all data wrangling operations in a single block | `@chain ` | It is not necessary to pass the `DataFrame` as an argument to the macros and functions used inside of the `@chain` block | ## Glossary -`term1` +CSV files -: Definition of the term one above. +: CSV stands for **C**omma-**S**eparated **V**alues. It is a popular file format that uses lines to represent rows (observations) +and commas (`,`) to separate values (although other characters such as `;` can also be used). -`term2` +`DataFrame` -: Definition of the term two above. +: `DataFrame`s are a versatile and widely used data structure that represents tabular data. You can use them in Julia through the `DataFrames` package. + +`DataFramesMeta` + +: A powerful package in Julia that extends the functionality of `DataFrames`, enabling advanced data manipulation and transformation. +It provides a concise and expressive syntax for defining data transformations through the use of macros.
## Get in touch From 38ffa1b27777a8a6988e7ea41bdd0367d0e1db88 Mon Sep 17 00:00:00 2001 From: jotas6 Date: Thu, 29 Jun 2023 09:44:43 -0500 Subject: [PATCH 10/22] Added instructor notes --- docs/instructors.md | 59 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 58 insertions(+), 1 deletion(-) diff --git a/docs/instructors.md b/docs/instructors.md index 29b0513..bb1c98a 100644 --- a/docs/instructors.md +++ b/docs/instructors.md @@ -4,7 +4,64 @@ title: Instructor's Notes for Pumas-AI Data Wrangling Workshop [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/) -PLACEHOLDER. +Start with `01-files.jl`, which covers file handling in Julia. Begin by emphasizing the significance of working in the correct +directory before reading or writing data and how omitting this consideration could lead to errors. Show how to use the `pwd` function to verify the present +working directory and how to use `cd` to navigate to another directory if needed. If there are participants who know how to use shell commands, you can mention +how to enter the `shell>` mode in the REPL by typing `;`. Next, focus on the CSV format. Make sure to highlight the importance of this format and provide an +in-depth explanation of how to read and write CSV files to the present working directory and to a different data folder. One of the examples provided involves +using the `rename` function, so make sure to go over how it can be used to change column names in a `DataFrame`. + +Next, go over the use of the `XLSX` package to read Excel files. Start by explaining how to read an Excel file using `XLSX.readtable`, emphasizing that it is +required to provide the sheet name as an argument and that most of the time, you will want to convert the output from `XLSX.readtable` to a `DataFrame`. 
+There may be questions about what to do if the user doesn't know the sheet names, which you can address by showing how to use `XLSX.readxlsx` and
+`XLSX.sheetnames` to obtain a list of sheet names in an Excel file. You might also find it useful to demonstrate how to open an Excel file inside of
+VS Code. Once you have covered how to read files, show how to write files. Make sure to mention that `XLSX` will not override an existing file like `CSV`
+would. Instead, you will get an error if you try to create a file that already exists.
+
+The last topic for `01-files.jl` is SAS files (.sasb7dat and .xpt), which can be read using the `readstat` function from the `ReadStatTables` package.
+However, note that the current version of `ReadStatTables` only supports reading files, and write support is still experimental.
+
+Next, go over the contents of `02-select.jl`. First, discuss the `names` function, which allows us to obtain a `Vector` containing all the column
+names of a `DataFrame`, which could be useful when working with `DataFrames` that have a large number of columns. After that, show the different alternatives
+that there are to retrieve the contents of a single column (dot syntax such as `DataFrame.column_name` and indexing). Participants might be curious about the
+difference between these two methods. If that is the case, you can explain that the dot syntax is simpler and more convenient to type, but that indexing is more
+flexible and powerful. Additionally, some users could find the indexing syntax more intuitive, even if it is more verbose.
+
+Afterward, showcase how to select specific columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta`. This will be the first
+time in the workshop in which attendees will use `DataFramesMeta`, so you can take this opportunity to provide a brief overview of the package and its
+importance. Make sure to mention that `DataFramesMeta` imports the contents of `DataFrames`, so it's not necessary to import `DataFrames` if `DataFramesMeta`
+has already been imported. Lastly, demonstrate the use of the `Not` operator as a means to specify the columns that we **don't** want to select, which might
+be useful in cases where there is a large number of columns and we want to select most of them.
+
+Finally, cover the `@[r]subset` macro, which enables us to filter rows in a `DataFrame` based on specific conditions. Go over the differences between `@subset`
+and `@rsubset` in detail, as this concept will be used in the scripts that follow. Finish this part of the lesson by going over the common use case of removing
+rows with `missing` observations in a specific column.
+
+The next script in the workshop is `03-transform.jl`, which focuses on using the `@[r]transform` macro to create a new column in a `DataFrame` or modify an
+existing one. Once again, it is important to explain the difference between the column and row versions of the macro (`@transform` and `@rtransform`,
+respectively) and demonstrate how the latter provides a more convenient way of specifying column transformations whenever possible.
+
+After that, introduce the `@astable` macro, which enables accessing intermediate calculations within a `DataFramesMeta` macro call. This macro allows performing
+operations on multiple columns simultaneously, making it easier to apply complex transformations and computations that would otherwise be challenging to write
+and comprehend.
+
+Lastly, cover the mutating version of the macros, which allow direct modification of the original `DataFrame`. Make sure to explain that these macros can be
+accessed by appending an exclamation mark (`!`) at the end of the macro call, such as `@[r]transform!` or `select!`. This feature is particularly handy when
+there is a need to update or transform data in-place, eliminating the requirement for creating additional copies of the `DataFrame`.
+
+Move on to the `04-grouping.jl` script. Begin by showing the `groupby` function, which allows grouping data based on specific columns. Next, show the common
+pattern of using `groupby` together with `@combine` to apply operations on grouped data and generate aggregated results. Make sure to go over the examples and
+cover the cases where one or more columns are used to group data. One of the examples includes the use of the `@orderby` macro, so take this opportunity to
+provide a detailed explanation of how it works.
+
+Once participants are comfortable with using `groupby` and `@combine`, you can introduce the `@by` macro, which provides a concise alternative to using
+`groupby` and `@combine` by streamlining the process of grouping data and applying operations in a single call. Use the example provided in the script to show a
+direct comparison between the methods and mention how using `@by` simplifies the code and enhances readability.
+
+The last script of the workshop is `05-chaining.jl`. This script provides two examples of how to use the `@chain` macro to perform all data wrangling operations
+in a single block. Go over the examples and highlight how it can be more convenient than applying all the data wrangling operations separately. Some important
+points to mention here are that it is not necessary to pass the `DataFrame` as an argument inside the `@chain` block, and that it is not restricted to including
+`DataFramesMeta` macros (it can also include functions from `DataFrames` such as `rename`).
 ## Get in touch

From 70452cfebb069059b47cbbe4c14f2c02e7b94a0a Mon Sep 17 00:00:00 2001
From: jotas6
Date: Thu, 29 Jun 2023 11:34:49 -0500
Subject: [PATCH 11/22] Updated workshop description

---
 docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 9b83a0b..291239a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,6 +1,6 @@
 ---
 title: Pumas-AI Data Wrangling Workshop
-description: Data wrangling workshop covering data I/O and the use of DataFramesMeta.
+description: Template for data wrangling workshop covering data I/O and the use of DataFramesMeta.
 ---

 [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/)

From dd925741ad4c72326cfb662338b4f5f4fb647181 Mon Sep 17 00:00:00 2001
From: jotas6
Date: Fri, 30 Jun 2023 20:48:14 -0500
Subject: [PATCH 12/22] Typo fix

---
 docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 291239a..c20daee 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -30,7 +30,7 @@ This workshop is an **introduction to data wrangling in Julia** with a focus on
     We recommend users being familiar with Julia syntax, especially variables and types.

     The formal requirements are the [Julia Syntax Workshop](https://pumasai-labs.github.io/Julia-Workshop/)
-    and it's prerequisites.
+    and its pre-requisites.
 ## Schedule

From c00eac2b72e81c510c245ecbf558e184441d5c35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9=20Gonz=C3=A1lez=20Oneto?= <80299581+jotas6@users.noreply.github.com>
Date: Tue, 11 Jul 2023 13:37:15 -0500
Subject: [PATCH 13/22] Apply suggestions from code review

Co-authored-by: Jose Storopoli
---
 01-files.jl | 8 +++++---
 02-select.jl | 4 ++--
 03-transform.jl | 2 +-
 04-grouping.jl | 4 ++--
 docs/index.md | 6 +++---
 docs/instructors.md | 18 +++++++++---------
 docs/reference.md | 24 ++++++++++++------------
 7 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/01-files.jl b/01-files.jl
index 38bd954..00e46c9 100644
--- a/01-files.jl
+++ b/01-files.jl
@@ -5,6 +5,8 @@ using CSV
 using DataFrames

 # Note: go to the workshop directory before reading the CSV file
+# by right-clicking on the desired directory and selecting
+# `Julia: Change to this directory
 df = CSV.read("demographics.csv", DataFrame) # read(, )

 # Writing files
@@ -32,11 +34,11 @@ readlines("demographics_eu.csv")[1:3]
 readlines("demographics.csv")[1:3] # Standard format

 # - delim: CSV files are separated by commas most of the time, but sometimes other
-# characters like ";" or "\t" are used.
+# characters like ';' or '\t' are used.
 CSV.read("demographics_eu.csv", DataFrame; delim = ';') # Works, but the numbers were parsed as strings

 # - decimal: if the file contains Floats and they are separated by something different than
-# "." (e.g 3.14), you must specify which character is used. If you ever need to use this,
+# '.' (e.g 3.14), you must specify which character is used. If you ever need to use this,
 # it will probably be because decimals are separated by commas (e.g 3,14)
 CSV.read("demographics_eu.csv", DataFrame; delim = ';', decimal = ',')

@@ -64,7 +66,7 @@ DataFrame(XLSX.readtable("demographics.xlsx", 1)) # We get the first sheet
 DataFrame(XLSX.readtable("data/demographics.xlsx", "Sheet1"))

 # Allow XLSX to infer types (columns will be Any by default)
-DataFrame(XLSX.readtable("demographics.xlsx", "Sheet1"; infer_eltypes=true))
+DataFrame(XLSX.readtable("demographics.xlsx", "Sheet1"; infer_eltypes=true)) # You will most definitely want to infer the columns types

 # Writing files
 XLSX.writetable("demographics_new.xlsx", renamed_df) # Same syntax as CSV.write (, )
diff --git a/02-select.jl b/02-select.jl
index f3e3d41..ac75bed 100644
--- a/02-select.jl
+++ b/02-select.jl
@@ -45,14 +45,14 @@ df[4:16, All()] # Get rows 4 to 16 for all columns
     :WEIGHT .< 50 # Get subjects that weigh less than 50 kg
 end

-## Tip: use rsubset instead of broadcasting everything (.>, .==, etc.)
+## Tip: use @rsubset instead of broadcasting everything (.>, .==, etc.)
 @rsubset df begin
     :AGE > 60
     :ISMALE == 1
     :WEIGHT < 50
 end

-## You don't always want to use rsubset
+## You don't always want to use @rsubset
 @rsubset df :WEIGHT > mean(:WEIGHT)
 @subset df :WEIGHT .> mean(:WEIGHT)

diff --git a/03-transform.jl b/03-transform.jl
index 3fb875b..b4e607d 100644
--- a/03-transform.jl
+++ b/03-transform.jl
@@ -31,7 +31,7 @@ df # Our original DataFrame remains unchanged
 @rtransform! df :SEX = :ISMALE == 0 ? "Female" : "Male" # Use ! at the end to modify the source
 df # Watch out: we lost the original DataFrame (we would have to reread our source file)

-## Tip: this works for all of DataFramesMeta's macros
+## Tip: this works for all of DataFramesMeta.jl's macros
 @rsubset! df :SEX == "Female"
 df # Now we only have female subjects

diff --git a/04-grouping.jl b/04-grouping.jl
index a0f4b83..afe0da5 100644
--- a/04-grouping.jl
+++ b/04-grouping.jl
@@ -4,11 +4,11 @@ df = CSV.read("demographics.csv", DataFrame) # Load a fresh copy of our dataset
 # The groupby function
 groupby(df, :ISMALE) # Group subjects according to sex

-## More complicated example: transform + group
+## More complicated example: @transform + groupby
 @rtransform! df :WEIGHT_cat = :WEIGHT > 70 ? "Over 70 kg" : "Under 70 kg"
 groupby(df, :WEIGHT_cat)

-## Tip: groupby can take multiple columns
+## Tip: groupby can take multiple columns as grouping keys
 groupby(df, [:ISMALE, :WEIGHT_cat]) # Now we get 4 groups

 # Combining (@combine)
diff --git a/docs/index.md b/docs/index.md
index c20daee..cf6f4e0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,12 +5,12 @@ description: Template for data wrangling workshop covering data I/O and the use

 [![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](http://creativecommons.org/licenses/by-sa/4.0/)

-This workshop is an **introduction to data wrangling in Julia** with a focus on data I/O and `DataFramesMeta`. We will cover the following topics:
+This workshop is an **introduction to data wrangling in Julia** with a focus on data I/O and `DataFramesMeta.jl`. We will cover the following topics:

 - **Reading and writing data**:
     - CSV files
-    - Excel (.xlsx) files
-    - SAS (.sasb7dat and .xpt)
+    - Excel (`.xlsx`) files
+    - SAS (`.sasb7dat` and `.xpt`)

 - **Select**:
     - Selecting specific columns and rows using `@select` and `@subset` macros
diff --git a/docs/instructors.md b/docs/instructors.md
index bb1c98a..292fe46 100644
--- a/docs/instructors.md
+++ b/docs/instructors.md
@@ -11,15 +11,15 @@ how to enter the `shell>` mode in the REPL by typing `;`. Next, focus on the CSV
 in-depth explanation of how to read and write CSV files to the present working directory and to a different data folder. One of the examples provided involves
 using the `rename` function, so make sure to go over how it can be used to change column names in a `DataFrame`.

-Next, go over the use of the `XLSX` package to read Excel files. Start by explaining how to read an Excel file using `XLSX.readtable`, emphasizing that it is
+Next, go over the use of the `XLSX.jl` package to read Excel files. Start by explaining how to read an Excel file using `XLSX.readtable`, emphasizing that it is
 required to provide the sheet name as an argument and that most of the time, you will want to convert the output from `XLSX.readtable` to a `DataFrame`.
 There may be questions about what to do if the user doesn't know the sheet names, which you can address by showing how to use `XLSX.readxlsx` and
 `XLSX.sheetnames` to obtain a list of sheet names in an Excel file. You might also find it useful to demonstrate how to open an Excel file inside of
-VS Code. Once you have covered how to read files, show how to write files. Make sure to mention that `XLSX` will not override an existing file like `CSV`
+VS Code. Once you have covered how to read files, show how to write files. Make sure to mention that `XLSX.jl` will not override an existing file like `CSV.jl`
 would. Instead, you will get an error if you try to create a file that already exists.

-The last topic for `01-files.jl` is SAS files (.sasb7dat and .xpt), which can be read using the `readstat` function from the `ReadStatTables` package.
-However, note that the current version of `ReadStatTables` only supports reading files, and write support is still experimental.
+The last topic for `01-files.jl` is SAS files (`.sasb7dat` and `.xpt`), which can be read using the `readstat` function from the `ReadStatTables.jl` package.
+However, note that the current version of `ReadStatTables.jl` only supports reading files, and write support is still experimental.

 Next, go over the contents of `02-select.jl`. First, discuss the `names` function, which allows us to obtain a `Vector` containing all the column
 names of a `DataFrame`, which could be useful when working with `DataFrames` that have a large number of columns. After that, show the different alternatives
@@ -27,9 +27,9 @@ that there are to retrieve the contents of a single column (dot syntax such as `
 difference between these two methods. If that is the case, you can explain that the dot syntax is simpler and more convenient to type, but that indexing is more
 flexible and powerful. Additionally, some users could find the indexing syntax more intuitive, even if it is more verbose.

-Afterward, showcase how to select specific columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta`. This will be the first
-time in the workshop in which attendees will use `DataFramesMeta`, so you can take this opportunity to provide a brief overview of the package and its
-importance. Make sure to mention that `DataFramesMeta` imports the contents of `DataFrames`, so it's not necessary to import `DataFrames` if `DataFramesMeta`
+Afterward, showcase how to select specific columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta.jl`. This will be the first
+time in the workshop in which attendees will use `DataFramesMeta.jl`, so you can take this opportunity to provide a brief overview of the package and its
+importance. Make sure to mention that `DataFramesMeta.jl` imports the contents of `DataFrames.jl`, so it's not necessary to import `DataFrames.jl` if `DataFramesMeta.jl`
 has already been imported. Lastly, demonstrate the use of the `Not` operator as a means to specify the columns that we **don't** want to select, which might
 be useful in cases where there is a large number of columns and we want to select most of them.

@@ -41,7 +41,7 @@ The next script in the workshop is `03-transform.jl`, which focuses on using the
 existing one. Once again, it is important to explain the difference between the column and row versions of the macro (`@transform` and `@rtransform`,
 respectively) and demonstrate how the latter provides a more convenient way of specifying column transformations whenever possible.

-After that, introduce the `@astable` macro, which enables accessing intermediate calculations within a `DataFramesMeta` macro call. This macro allows performing
+After that, introduce the `@astable` macro, which enables accessing intermediate calculations within a `DataFramesMeta.jl` macro call. This macro allows performing
 operations on multiple columns simultaneously, making it easier to apply complex transformations and computations that would otherwise be challenging to write
 and comprehend.

@@ -61,7 +61,7 @@ direct comparison between the methods and mention how using `@by` simplifies the
 The last script of the workshop is `05-chaining.jl`. This script provides two examples of how to use the `@chain` macro to perform all data wrangling operations
 in a single block. Go over the examples and highlight how it can be more convenient than applying all the data wrangling operations separately. Some important
 points to mention here are that it is not necessary to pass the `DataFrame` as an argument inside the `@chain` block, and that it is not restricted to including
-`DataFramesMeta` macros (it can also include functions from `DataFrames` such as `rename`).
+`DataFramesMeta.jl` macros (it can also include functions from `DataFrames.jl` such as `rename`).

 ## Get in touch

diff --git a/docs/reference.md b/docs/reference.md
index aca3b60..ce7357b 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -17,33 +17,33 @@ and you can navigate to another directory using `cd`.
 the character used in your file.
 - In some regions, commas are used to separate decimals instead of dots (e.g., `3,14` instead of `3.14`). In such cases, columns containing `Float`s will be interpreted as `String`s. To avoid this, you can use the `decimal` keyword argument.
-- The `XLSX` package enables reading and writing of Excel files (.xlsx). To read a file, you can use the `XLSX.readtable` function, and to write a file,
+- The `XLSX.jl` package enables reading and writing of Excel files (`.xlsx`). To read a file, you can use the `XLSX.readtable` function, and to write a file,
 you can use `XLSX.writetable`.
 - When using `XLSX.readtable`, you need to specify the sheet you want to read from since Excel files can have multiple sheets. If you are unsure
 about the sheets in the Excel file, you can use `XLSX.readxlsx` and `XLSX.sheetnames` to obtain a `Vector` containing all the sheet names.
-- SAS files (.sasb7dat and .xpt) can be read using the `readstat` function provided by the `ReadStatTables` package.
-- Currently, `ReadStatTables` only supports reading files. Write support is experimental and not fully developed.
+- SAS files (`.sasb7dat` and `.xpt`) can be read using the `readstat` function provided by the `ReadStatTables.jl` package.
+- Currently, `ReadStatTables.jl` only supports reading files. Write support is experimental and not fully developed.
 - You can read and write files from different locations by providing the full or relative path instead of just the file name. For more information on specifying robust and complex file paths, refer to the [Filesystem](https://docs.julialang.org/en/v1/base/file/#Filesystem) section in the Julia documentation.
 - To obtain a `Vector` with all the column names of a `DataFrame`, you can use the `names` function. This is particularly useful when examining
 `DataFrames` with a large number of columns.
-- `DataFramesMeta` imports `DataFrames`, allowing you to import only `DataFramesMeta` and still have access to the functions from `DataFrames`.
-- You can select a group of columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta`.
+- `DataFramesMeta.jl` imports `DataFrames.jl`, allowing you to import only `DataFramesMeta.jl` and still have access to the functions from `DataFrames.jl`.
+- You can select a group of columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta.jl`.
 - Instead of specifying which columns you want to select, you can specify the columns that you **don't** want to select using the `Not` operator,
 which need to be called with `$()` (e.g. `@select $(Not(column_name))`).
 - You can select the rows in a `DataFrame` that satisfy a condition using the `@[r]subset` macro.
-- The row version of a `DataFramesMeta` macro can be accessed by adding an `r` before the macro name (e.g., `@rsubset`, `@rtransform`, etc.).
+- The row version of a `DataFramesMeta.jl` macro can be accessed by adding an `r` before the macro name (e.g., `@rsubset`, `@rtransform`, etc.).
 These versions are useful as they eliminate the need to broadcast all operations inside the call, but there are cases where it is not possible to do so.
 - To remove rows that have `missing` values in a column, you can use `@rsubset !ismissing(:column_name)`.
 - The `@[r]transform` macro allows you to create a new column or modify an existing one.
-- The `@astable` macro enables access to intermediate calculations within a `DataFramesMeta` macro call and allows operations on multiple columns simultaneously.
+- The `@astable` macro enables access to intermediate calculations within a `DataFramesMeta.jl` macro call and allows operations on multiple columns simultaneously.
 - By appending `!` at the end of a macro call (e.g., `@[r]transform!` or `select!`), you can modify the original `DataFrame` instead of creating a new one.
 - The `groupby` function is used to group data in a `DataFrame` based on specific columns. When used together with `@combine`, it enables applying operations on grouped data and generating new aggregated results.
 - The `@by` macro provides a concise alternative to using `groupby` and `@combine`. It allows grouping data and applying operations in a single call.
 - Including `length(:column)` in a `@combine` or `@by` call will return the number of rows in each grouped `DataFrame` as part of the aggregated results. The column name used does not affect the results.
-- You can perform all your data wrangling operations in a single block using `@chain`. This block can include both `DataFramesMeta` macros and functions
+- You can perform all your data wrangling operations in a single block using `@chain`. This block can include both `DataFramesMeta.jl` macros and functions
 such as `rename`. Additionally, `@chain` passes the `DataFrame` as an argument to every function and macro call. For example, inside a `@chain` block,
 you can write `@groupby ` instead of `@groupby `.

@@ -64,7 +64,7 @@ you can write `@groupby ` instead of `@groupby `.
 | Get the column names of a `DataFrame` | `names()` | | |
 | Get the values from a `DataFrame`'s column | `DataFrame.column_name` or `DataFrame[!, column_name]` | The dot syntax is more readable and easier to type, but the indexing syntax could be more intuitive for some users |
 | Select one or more columns from a `DataFrame` | `@select column1 column2 ...` | Can also be done through indexing, but the `@select` macro is more convenient and expressive |
-| Use the row version of a `DataFramesMeta` macro | `@r` (e.g `@rsubset`, `@rtransform`, etc.) | |
+| Use the row version of a `DataFramesMeta.jl` macro | `@r` (e.g `@rsubset`, `@rtransform`, etc.) | |
 | Filter rows in a `DataFrame` using a boolean expression | `@[r]subset ` | |
 | Determine whether a variable is of `Type` `Missing` | `ismissing()` | Can be used with `@[r]subset` to remove missing values from a `DataFrame` |
 | Create or modify a column | `@[r]transform ` | The expression is written in the assignment form (e.g. `:column_name = `). If you want to create a new column, then the assignment should be for a column name that doesn't exist in the `DataFrame`. If you use an existing column name, `@[r]transform` will modify that column. |
@@ -84,11 +84,11 @@ and commas (`,`) to separate values (although other characters such as `;` can a

 `DataFrame`

-: `DataFrame`s are a versatile and widely used data structure that represents tabular data. You can use them in Julia through the `DataFrames` package.
+: `DataFrame`s are a versatile and widely used data structure that represents tabular data. You can use them in Julia through the `DataFrames.jl` package.

-`DataFramesMeta`
+`DataFramesMeta.jl`

-: A powerful package in Julia that extends the functionality of `DataFrames`, enabling advanced data manipulation and transformation.
+: A powerful package in Julia that extends the functionality of `DataFrames.jl`, enabling advanced data manipulation and transformation.
 It provides a concise and expressive syntax for defining data transformations through the use of macros.

 ## Get in touch

From e1ffaaff761f8f4b408df159cb4571035afcf26a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9=20Gonz=C3=A1lez=20Oneto?= <80299581+jotas6@users.noreply.github.com>
Date: Tue, 11 Jul 2023 13:40:31 -0500
Subject: [PATCH 14/22] Update docs/reference.md

Co-authored-by: Jose Storopoli
---
 docs/reference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference.md b/docs/reference.md
index ce7357b..65ba744 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -26,7 +26,7 @@ about the sheets in the Excel file, you can use `XLSX.readxlsx` and `XLSX.sheetn
 - You can read and write files from different locations by providing the full or relative path instead of just the file name. For more information on specifying robust and complex file paths, refer to the [Filesystem](https://docs.julialang.org/en/v1/base/file/#Filesystem) section in the Julia documentation.
 - To obtain a `Vector` with all the column names of a `DataFrame`, you can use the `names` function. This is particularly useful when examining
-`DataFrames` with a large number of columns.
+`DataFrame`s with a large number of columns.
 - `DataFramesMeta.jl` imports `DataFrames.jl`, allowing you to import only `DataFramesMeta.jl` and still have access to the functions from `DataFrames.jl`.
 - You can select a group of columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta.jl`.
 - Instead of specifying which columns you want to select, you can specify the columns that you **don't** want to select using the `Not` operator,

From 2653dc0aad12db340555d5537c8a04cb12888e5f Mon Sep 17 00:00:00 2001
From: jotas6
Date: Tue, 11 Jul 2023 14:45:44 -0500
Subject: [PATCH 15/22] Renamed `02-select.jl` to `02-select_subset.jl`

---
 02-select.jl => 02-select_subset.jl | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename 02-select.jl => 02-select_subset.jl (100%)

diff --git a/02-select.jl b/02-select_subset.jl
similarity index 100%
rename from 02-select.jl
rename to 02-select_subset.jl

From b9a3718fc42b266d5539c24f1bf0d0637407605e Mon Sep 17 00:00:00 2001
From: jotas6
Date: Tue, 11 Jul 2023 15:19:50 -0500
Subject: [PATCH 16/22] Added note about right click + change directory

---
 docs/instructors.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/instructors.md b/docs/instructors.md
index 292fe46..c286553 100644
--- a/docs/instructors.md
+++ b/docs/instructors.md
@@ -6,10 +6,12 @@ title: Instructor's Notes for Pumas-AI Data Wrangling Workshop

 Start with `01-files.jl`, which covers file handling in Julia. Begin by emphasizing the significance of working in the correct
 directory before reading or writing data and how omitting this consideration could lead to errors. Show how to use the `pwd` function to verify the present
-working directory and how to use `cd` to navigate to another directory if needed. If there are participants who know how to use shell commands, you can mention
-how to enter the `shell>` mode in the REPL by typing `;`. Next, focus on the CSV format. Make sure to highlight the importance of this format and provide an
-in-depth explanation of how to read and write CSV files to the present working directory and to a different data folder. One of the examples provided involves
-using the `rename` function, so make sure to go over how it can be used to change column names in a `DataFrame`.
+working directory and how to use `cd` to navigate to another directory if needed. Some users might find it more convenient to right click on the file and use
+the `Julia: Change to This Directory` option, which will automatically move the Julia REPL to the directory containing the selected file. If there are
+participants who know how to use shell commands, you can mention how to enter the `shell>` mode in the REPL by typing `;`. Next, focus on the CSV format. Make
+sure to highlight the importance of this format and provide an in-depth explanation of how to read and write CSV files to the present working directory and
+to a different data folder. One of the examples provided involves using the `rename` function, so make sure to go over how it can be used to change column names
+in a `DataFrame`.

 Next, go over the use of the `XLSX.jl` package to read Excel files. Start by explaining how to read an Excel file using `XLSX.readtable`, emphasizing that it is
 required to provide the sheet name as an argument and that most of the time, you will want to convert the output from `XLSX.readtable` to a `DataFrame`.

From 6104e0827aeb761f1fbf70b8ae565388d8a70780 Mon Sep 17 00:00:00 2001
From: jotas6
Date: Tue, 11 Jul 2023 15:24:44 -0500
Subject: [PATCH 17/22] Added note about Office Viewer

---
 docs/instructors.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/instructors.md b/docs/instructors.md
index c286553..6bcb19c 100644
--- a/docs/instructors.md
+++ b/docs/instructors.md
@@ -17,8 +17,9 @@ Next, go over the use of the `XLSX.jl` package to read Excel files. Start by exp
 required to provide the sheet name as an argument and that most of the time, you will want to convert the output from `XLSX.readtable` to a `DataFrame`.
 There may be questions about what to do if the user doesn't know the sheet names, which you can address by showing how to use `XLSX.readxlsx` and
 `XLSX.sheetnames` to obtain a list of sheet names in an Excel file. You might also find it useful to demonstrate how to open an Excel file inside of
-VS Code. Once you have covered how to read files, show how to write files. Make sure to mention that `XLSX.jl` will not override an existing file like `CSV.jl`
-would. Instead, you will get an error if you try to create a file that already exists.
+VS Code (using the Office Viewer extension, which is installed by default in JuliaHub). Once you have covered how to read files, show how to write files. Make
+sure to mention that `XLSX.jl` will not override an existing file like `CSV.jl` would. Instead, you will get an error if you try to create a file that
+already exists.

 The last topic for `01-files.jl` is SAS files (`.sasb7dat` and `.xpt`), which can be read using the `readstat` function from the `ReadStatTables.jl` package.
 However, note that the current version of `ReadStatTables.jl` only supports reading files, and write support is still experimental.

From 2e4952d5d5383843aaf6dec194f970eba4f5b0a2 Mon Sep 17 00:00:00 2001
From: jotas6
Date: Tue, 11 Jul 2023 15:27:17 -0500
Subject: [PATCH 18/22] Changed "Combining" title to "Summarizing"

---
 04-grouping.jl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/04-grouping.jl b/04-grouping.jl
index afe0da5..c6c64b4 100644
--- a/04-grouping.jl
+++ b/04-grouping.jl
@@ -11,7 +11,7 @@ groupby(df, :WEIGHT_cat)
 ## Tip: groupby can take multiple columns as grouping keys
 groupby(df, [:ISMALE, :WEIGHT_cat]) # Now we get 4 groups

-# Combining (@combine)
+# Summarizing (@combine)
 ## A common thing to do after grouping data is to combine it back with some operation.

 # Example: mean age for each sex group

From b3d6311a693e3f90788cdc54ee1ea11cf7614a62 Mon Sep 17 00:00:00 2001
From: jotas6
Date: Tue, 11 Jul 2023 16:16:41 -0500
Subject: [PATCH 19/22] Added new entries to the glossary

---
 docs/reference.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/reference.md b/docs/reference.md
index 65ba744..f2d89e5 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -82,10 +82,30 @@ CSV files
 : CSV stands for **C**omma-**S**eparated **V**alues. It is a popular file format that uses lines to represent rows (observations)
 and commas (`,`) to separate values (although other characters such as `;` can also be used).

+Sink (from `CSV.read`)
+
+: It is the second positional argument from `CSV.read` and is used to specify where to store or materialize the parsed data from the CSV file.
+Most of the time you will want to set use a `DataFrame` (`CSV.read(, DataFrame)`)
+
+Excel
+
+: Excel is a widely used spreadsheet program developed by Microsoft. Excel files typically have the `.xls` and `.xlsx` extensions, but the `.xlsx` extension
+should be preferred.
+
+SAS data files
+
+: Data format used and created by the SAS statistical software. They come in two common extensions: `.sas7bdat` and `.xpt`. These files can be read in Julia
These files can be read in Julia +using the `ReadStatTables.jl` package. + `DataFrame` : `DataFrame`s are a versatile and widely used data structure that represents tabular data. You can use them in Julia through the `DataFrames.jl` package. +`DataFrames.jl` + +: Julia package that allows working with `DataFrames` in Julia. It has a similar design and functionality to other well-known packages such as +[`pandas`](https://pandas.pydata.org/) from Python or [`dplyr`](https://dplyr.tidyverse.org/) from R. + `DataFramesMeta.jl` : A powerful package in Julia that extends the functionality of `DataFrames.jl`, enabling advanced data manipulation and transformation. From 618dff11076cb724762f1f07a60e5a997f5c4c6d Mon Sep 17 00:00:00 2001 From: Juan Jose Gonzalez Oneto Date: Tue, 11 Jul 2023 21:37:35 +0000 Subject: [PATCH 20/22] Included explanation about using `!` and `:` when indexing a `DataFrame` --- 02-select_subset.jl | 3 +++ docs/instructors.md | 4 +++- docs/reference.md | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/02-select_subset.jl b/02-select_subset.jl index ac75bed..7cca938 100644 --- a/02-select_subset.jl +++ b/02-select_subset.jl @@ -11,6 +11,9 @@ df.WEIGHT df[!, "AGE"] # Indexing, as if it was a matrix df[!, "WEIGHT"] +## Tip: get a copy of the column (instead of the actual column) +df[:, "AGE"] # If you modify this, you won't be modifying the original DataFrame + ## Get multiple columns df[!, ["AGE", "WEIGHT"]] # This gets messy quickly diff --git a/docs/instructors.md b/docs/instructors.md index 6bcb19c..5a53a40 100644 --- a/docs/instructors.md +++ b/docs/instructors.md @@ -28,7 +28,9 @@ Next, go over the contents of `02-select.jl`. First, discuss the `names` functio names of a `DataFrame`, which could be useful when working with `DataFrames` that have a large number of columns. 
After that, show the different alternatives that there are to retrieve the contents of a single column (dot syntax such as `DataFrame.column_name` and indexing). Participants might be curious about the difference between these two methods. If that is the case, you can explain that the dot syntax is simpler and more convenient to type, but that indexing is more -flexible and powerful. Additionally, some users could find the indexing syntax more intuitive, even if it is more verbose. +flexible and powerful. Additionally, some users could find the indexing syntax more intuitive, even if it is more verbose. When +going over indexing, make sure to explain the difference between using `!` and using `:` to retrieve all rows from a column (`!` +returns the column, while `:` returns a copy of it). Afterward, showcase how to select specific columns from a `DataFrame` using the `@select` macro provided by `DataFramesMeta.jl`. This will be the first time in the workshop in which attendees will use `DataFramesMeta.jl`, so you can take this opportunity to provide a brief overview of the package and its diff --git a/docs/reference.md b/docs/reference.md index f2d89e5..88bea32 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -62,7 +62,7 @@ you can write `@groupby ` instead of `@groupby `. | Inspect the sheet names of an Excel file | `XLSX.readxlsx()` and `XLSX.sheetnames()` (optional) | The result of `XLSX.readxlsx` will print a table containing the sheet names. 
You can optionally then run `XLSX.sheetnames` on the result of `readxlsx` to get a `Vector` with all the sheet names | | Read a SAS file (.sasb7dat and .xpt) | `DataFrame(readstat())` | | | | Get the column names of a `DataFrame` | `names()` | | | -| Get the values from a `DataFrame`'s column | `DataFrame.column_name` or `DataFrame[!, column_name]` | The dot syntax is more readable and easier to type, but the indexing syntax could be more intuitive for some users | +| Get the values from a `DataFrame`'s column | `DataFrame.column_name`, `DataFrame[!, column_name]` or `DataFrame[:, column_name]` | The dot syntax is more readable and easier to type, but the indexing syntax could be more intuitive for some users. Using `:` when indexing returns a copy of the column, while using `!` returns the original column from the `DataFrame` (you could use the result of indexing with `!` to modify the source `DataFrame`) | | Select one or more columns from a `DataFrame` | `@select column1 column2 ...` | Can also be done through indexing, but the `@select` macro is more convenient and expressive | | Use the row version of a `DataFramesMeta.jl` macro | `@r` (e.g `@rsubset`, `@rtransform`, etc.) | | | Filter rows in a `DataFrame` using a boolean expression | `@[r]subset ` | | From ac976b833f1221bcca3ea49c1e637b5e9e7013f4 Mon Sep 17 00:00:00 2001 From: jotas6 Date: Fri, 14 Jul 2023 15:23:28 -0500 Subject: [PATCH 21/22] Added note about `GroupedDataFrame`s --- docs/instructors.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/instructors.md b/docs/instructors.md index 5a53a40..d51a8c3 100644 --- a/docs/instructors.md +++ b/docs/instructors.md @@ -54,10 +54,12 @@ Lastly, cover the mutating version of the macros, which allow direct modificatio accessed by appending an exclamation mark (`!`) at the end of the macro call, such as `@[r]transform!` or `select!`. 
This feature is particularly handy when there is a need to update or transform data in-place, eliminating the requirement for creating additional copies of the `DataFrame`. -Move on to the `04-grouping.jl` script. Begin by showing the `groupby` function, which allows grouping data based on specific columns. Next, show the common -pattern of using `groupby` together with `@combine` to apply operations on grouped data and generate aggregated results. Make sure to go over the examples and -cover the cases where one or more columns are used to group data. One of the examples includes the use of the `@orderby` macro, so take this opportunity to -provide a detailed explanation of how it works. +Move on to the `04-grouping.jl` script. Begin by showing the `groupby` function, which allows grouping data based on specific columns. If users are curious +about the return values of `groupby`, you can mention that it returns a `GroupedDataFrame`, which can be inspected through indexing and manipulated with +`transform` and `select` (you can find more details about it in [`groupby`'s documentation](https://dataframes.juliadata.org/stable/lib/functions/#DataFrames.groupby)). Next, +show the common pattern of using `groupby` with `@combine` to apply operations on grouped data and generate aggregated results. Make sure to go over +the examples and cover the cases where one or more columns are used to group data. One of the examples includes the use of the `@orderby` macro, so take this +opportunity to provide a detailed explanation of how it works. Once participants are comfortable with using `groupby` and `@combine`, you can introduce the `@by` macro, which provides a concise alternative to using `groupby` and `@combine` by streamlining the process of grouping data and applying operations in a single call. 
Use the example provided in the script to show a From 8f0c384a1e87a8c8bd2a5db9b8a87aafea702cff Mon Sep 17 00:00:00 2001 From: jotas6 Date: Fri, 14 Jul 2023 16:22:11 -0500 Subject: [PATCH 22/22] Update mentions to `02-select_subset.jl` script mend --- docs/index.md | 2 +- docs/instructors.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/index.md b/docs/index.md index cf6f4e0..f8bf0b6 100644 --- a/docs/index.md +++ b/docs/index.md @@ -38,7 +38,7 @@ This workshop is an **introduction to data wrangling in Julia** with a focus on |--------------|--------------------------|---------------------------------------------| | 00:00 | Setup | Download files required for the workshop | | 00:25 | Reading and writing data | Showcase `01-files.jl` | -| 00:40 | Select | Showcase `02-select.jl` | +| 00:40 | Select | Showcase `02-select_subset.jl` | | 00:50 | Transform | Showcase `03-transform.jl` | | 01:00 | Grouping and combining | Showcase `04-grouping.jl` | | 01:05 | Chaining | Showcase `05-chaining.jl` | diff --git a/docs/instructors.md b/docs/instructors.md index d51a8c3..7958994 100644 --- a/docs/instructors.md +++ b/docs/instructors.md @@ -24,7 +24,7 @@ already exists. The last topic for `01-files.jl` is SAS files (`.sasb7dat` and `.xpt`), which can be read using the `readstat` function from the `ReadStatTables.jl` package. However, note that the current version of `ReadStatTables.jl` only supports reading files, and write support is still experimental. -Next, go over the contents of `02-select.jl`. First, discuss the `names` function, which allows us to obtain a `Vector` containing all the column +Next, go over the contents of `02-select_subset.jl`. First, discuss the `names` function, which allows us to obtain a `Vector` containing all the column names of a `DataFrame`, which could be useful when working with `DataFrames` that have a large number of columns. 
After that, show the different alternatives that there are to retrieve the contents of a single column (dot syntax such as `DataFrame.column_name` and indexing). Participants might be curious about the difference between these two methods. If that is the case, you can explain that the dot syntax is simpler and more convenient to type, but that indexing is more