Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 20 changed files with 115 additions and 8 deletions.
8 changes: 5 additions & 3 deletions
_freeze/chapters/01_getting-started/execute-results/html.json
Large diffs are not rendered by default.
21 changes: 21 additions & 0 deletions
_freeze/chapters/01_getting-started/execute-results/tex.json
Large diffs are not rendered by default.
21 changes: 21 additions & 0 deletions
_freeze/chapters/02_data-structures/execute-results/tex.json
Large diffs are not rendered by default.
21 changes: 21 additions & 0 deletions
_freeze/chapters/03_exploring-data/execute-results/tex.json
Large diffs are not rendered by default.
Binary file not shown. (11 binary files)
21 changes: 21 additions & 0 deletions
_freeze/chapters/04_organizing_code/execute-results/tex.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
{ | ||
"hash": "9520320be7ba4cfd7fe8166742e591b8", | ||
"result": { | ||
"engine": "knitr", | ||
"markdown": "# Organizing Code\n\n:::{.callout-note}\n##### Learning Goals\n\nAfter completing this chapter, learners should be able to:\n\n* Create code that only runs when a condition is satisfied\n* Create custom functions in order to organize and reuse code\n:::\n\nBy now, you've learned all of the basic skills necessary to explore a data set\nin R. The focus of this chapter is how to organize your code so that it's\nconcise, clear, and easy to automate. This will help you and your collaborators\navoid tedious, redundant work, reproduce results efficiently, and run code in\nspecialized environments for scientific computing, such as high-performance\ncomputing clusters.\n\n\nConditional Expressions\n-----------------------\n\nSometimes you'll need code to do different things, depending on a condition.\n**If-expressions** provide a way to write conditional code.\n\nFor example, suppose we want to greet one person differently from the others:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = \"Nick\"\nif (name == \"Nick\") {\n # If name is Nick:\n message(\"We went down the TRUE branch\")\n msg = \"Hi Nick, nice to see you again!\"\n} else {\n # Anything else:\n msg = \"Nice to meet you!\"\n}\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWe went down the TRUE branch\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nIndent code inside of the if-expression by 2 or 4 spaces. 
Indentation makes\nyour code easier to read.\n\nThe condition in an if-expression has to be a scalar:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = c(\"Nick\", \"Susan\")\nif (name == \"Nick\") {\n msg = \"Hi Nick!\"\n} else {\n msg = \"Nice to meet you!\"\n}\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in if (name == \"Nick\") {: the condition has length > 1\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nYou can chain together if-expressions:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = \"Susan\"\nif (name == \"Nick\") {\n msg = \"Hi Nick, nice to see you again!\"\n} else if (name == \"Peter\") {\n msg = \"Go away Peter, I'm busy!\"\n} else {\n msg = \"Nice to meet you!\"\n}\nmsg\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Nice to meet you!\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nIf-expressions return the value of the last expression in the evaluated block:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = \"Tom\"\nmsg = if (name == \"Nick\") {\n \"Hi Nick, nice to see you again!\"\n} else {\n \"Nice to meet you!\"\n}\nmsg\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Nice to meet you!\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nCurly braces `{ }` are optional for single-line expressions:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = \"Nick\"\nmsg = if (name == \"Nick\") \"Hi Nick, nice to see you again!\" else\n \"Nice to meet you!\"\nmsg\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Hi Nick, nice to see you again!\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nBut you have to be careful if you don't use them:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# NO GOOD:\nmsg = if (name == \"Nick\")\n \"Hi Nick, nice to see you again!\"\nelse\n \"Nice to meet you!\"\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError: <text>:4:1: unexpected 'else'\n3: \"Hi Nick, nice to see you again!\"\n4: else\n ^\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nThe `else` block is optional:\n\n\n\n\n\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nmsg = \"Hi\"\nname = \"Tom\"\nif (name == \"Nick\")\n msg = \"Hi Nick, nice to see you again!\"\nmsg\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Hi\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nWhen there's no `else` block, the value of the `else` block is `NULL`:\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname = \"Tom\"\nmsg = if (name == \"Nick\")\n \"Hi Nick, nice to see you again!\"\nmsg\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nNULL\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n\nFunctions\n---------\n\nThe main way to interact with R is by calling functions, which was first\nexplained way back in @sec-calling-functions. Since then, you've learned how to\nuse many of R's built-in functions. This section explains how you can write\nyour own functions.\n\nTo start, let's briefly review what functions are, and some of the jargon\nassociated with them. It's useful to think of functions as factories: raw\nmaterials (inputs) go in, products (outputs) come out. We can also represent\nthis visually:\n\n![](/images/functions.png)\n\nProgrammers use several specific terms to describe the parts and usage of\nfunctions:\n\n* **Parameters** are placeholder variables for inputs.\n + **Arguments** are the actual values assigned to the parameters in a call.\n* The **return value** is the output.\n* The **body** is the code inside.\n* **Calling** a function means using a function to compute something.\n\nAlmost every command in R is a function, even the arithmetic operators and the\nparentheses! You can view the body of a function by typing its name without\ntrailing parentheses (in contrast to how you call functions). The body of a\nfunction is usually surrounded by curly braces `{}`, although they're optional\nif the body only contains one line of code. 
Indenting code inside of curly\nbraces by 2-4 spaces also helps make it visually distinct from other code.\n\nFor example, let's look at the body of the `append` function, which appends a\nvalue to the end of a list or vector:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nappend\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nfunction (x, values, after = length(x)) \n{\n lengx <- length(x)\n if (!after) \n c(values, x)\n else if (after >= lengx) \n c(x, values)\n else c(x[1L:after], values, x[(after + 1L):lengx])\n}\n<bytecode: 0x63d987cde588>\n<environment: namespace:base>\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nDon't worry if you can't understand everything the `append` function's code\ndoes yet. It will make more sense later on, after you've written a few\nfunctions of your own.\n\nMany of R's built-in functions are not entirely written in R code. You can spot\nthese by calls to the special `.Primitive` or `.Internal` functions in their\ncode.\n\nFor instance, the `sum` function is not written in R code:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nfunction (..., na.rm = FALSE) .Primitive(\"sum\")\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nThe `function` keyword creates a new function. Here's the syntax:\n\n```\nfunction(parameter1, parameter2, ...) {\n # Your code goes here\n\n # The result goes here\n}\n```\n\nA function can have any number of parameters, and will automatically return the\nvalue of the last line of its body.\n\nA function is a value, and like any other value, if you want to reuse it, you\nneed to assign it to a variable. Choosing descriptive variable names is a good\nhabit. For functions, that means choosing a name that describes what the\nfunction does. It often makes sense to use verbs in function names.\n\nLet's write a function that gets the largest values in a vector. 
The inputs or\narguments to the function will be the vector in question and also the number of\nvalues to get. Let's call these `vec` and `n`, respectively. The result will be\na vector of the `n` largest elements. Here's one way to write the function:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_largest = function(vec, n) {\n sorted = sort(vec, decreasing = TRUE)\n head(sorted, n)\n}\n```\n:::\n\n\n\n\n\n\n\n\nThe name of the function, `get_largest`, describes what the function does and\nincludes a verb. If this function will be used frequently, a shorter name, such\nas `largest`, might be preferable (compare to the `head` function).\n\n\nAny time you write a function, the first thing you should do afterwards is test\nthat it actually works. Let's try the `get_largest` function on a few test\ncases:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx = c(1, 10, 20, -3)\nget_largest(x, 2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 20 10\n```\n\n\n:::\n\n```{.r .cell-code}\nget_largest(x, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 20 10 1\n```\n\n\n:::\n\n```{.r .cell-code}\ny = c(-1, -2, -3)\nget_largest(y, 2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] -1 -2\n```\n\n\n:::\n\n```{.r .cell-code}\nz = c(\"d\", \"a\", \"t\", \"a\", \"l\", \"a\", \"b\")\nget_largest(z, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"t\" \"l\" \"d\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nNotice that the parameters `vec` and `n` inside the function do not exist as\nvariables outside of the function:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in eval(expr, envir, enclos): object 'vec' not found\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nIn general, R keeps parameters and variables you define inside of a function\nseparate from variables you define outside of a function. 
You can read more\nabout the specific rules for how R searches for variables in DataLab's\n[Intermediate R reader][intermediate-r].\n\n[intermediate-r]: https://ucdavisdatalab.github.io/workshop_intermediate_r/\n\nAs a function for quickly summarizing data, `get_largest` would be more\nconvenient if the parameter `n` for the number of values to return was optional\n(again, compare to the `head` function). You can make the parameter `n`\noptional by setting a **default argument**: an argument assigned to the\nparameter if no argument is assigned in the call to the function. You can use\n`=` to assign default arguments to parameters when you define a function with\nthe `function` keyword. Here's a new definition of the function with the\ndefault `n = 5`:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_largest = function(vec, n = 5) {\n sorted = sort(vec, decreasing = TRUE)\n head(sorted, n)\n}\n```\n:::\n\n\n\n\n\n\n\n\nAfter making this change, it's a good idea to test the function again:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_largest(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 20 10 1 -3\n```\n\n\n:::\n\n```{.r .cell-code}\nget_largest(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] -1 -2 -3\n```\n\n\n:::\n\n```{.r .cell-code}\nget_largest(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"t\" \"l\" \"d\" \"b\" \"a\"\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n\n### Returning Values\n\nWe've already seen that a function will automatically return the value of its\nlast line.\n\nThe `return` keyword causes a function to return a result immediately, without\nrunning any subsequent code in its body. It only makes sense to use `return`\nfrom inside of an if-expression. If your function doesn't have any\nif-expressions, you don't need to use `return`.\n\nFor example, suppose you want the `get_largest` function to immediately return\n`NULL` if the argument for `vec` is a list. 
Here's the code, along with some\ntest cases:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_largest = function(vec, n = 5) {\n if (is.list(vec))\n return(NULL)\n\n sorted = sort(vec, decreasing = TRUE)\n head(sorted, n)\n}\n\nget_largest(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 20 10 1 -3\n```\n\n\n:::\n\n```{.r .cell-code}\nget_largest(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"t\" \"l\" \"d\" \"b\" \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nget_largest(list(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nNULL\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nAlternatively, you could make the function raise an error by calling the `stop`\nfunction. Whether it makes more sense to return `NULL` or raise an error\ndepends on how you plan to use the `get_largest` function.\n\nNotice that the last line of the `get_largest` function still doesn't use the\n`return` keyword. It's idiomatic to only use `return` when strictly necessary.\n\nA function returns one R object, but sometimes computations have multiple\nresults. In that case, return the results in a vector, list, or other data\nstructure.\n\nFor example, let's make a function that computes the mean and median for a\nvector. We'll return the results in a named list, although we could also use a\nnamed vector:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncompute_mean_med = function(x) {\n m1 = mean(x)\n m2 = median(x)\n list(mean = m1, median = m2)\n}\ncompute_mean_med(c(1, 2, 3, 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n$mean\n[1] 1.75\n\n$median\n[1] 1.5\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nThe names make the result easier to understand for the caller of the function,\nalthough they certainly aren't required here.\n\n\n### Planning Your Functions\n\nBefore you write a function, it's useful to go through several steps:\n\n1. Write down what you want to do, in detail. It can also help to\n draw a picture of what needs to happen.\n\n2. 
Check whether there's already a built-in function. Search online and in the\n R documentation.\n\n3. Write the code to handle a simple case first. For data science\n problems, use a small dataset at this step.\n\nLet's apply this in one final example: a function that detects leap years. A\nyear is a leap year if either of these conditions is true:\n\n* It is divisible by 4 and not by 100\n* It is divisible by 400\n\nThat means the years 2004 and 2000 are leap years, but the year 2200 is not.\nHere's the code and a few test cases:\n\n\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# If year is divisible by 4 and not 100 -> leap\n# If year is divisible by 400 -> leap\nyear = 2004\nis_leap = function(year) {\n if (year %% 4 == 0 && year %% 100 != 0) {\n leap = TRUE\n } else if (year %% 400 == 0) {\n leap = TRUE\n } else {\n leap = FALSE\n }\n leap\n}\nis_leap(400)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nis_leap(1997)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] FALSE\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\nFunctions are the building blocks for solving larger problems. Take a\ndivide-and-conquer approach, breaking large problems into smaller steps. Use a\nshort function for each step. This approach makes it easier to:\n\n* Test that each step works correctly.\n* Modify, reuse, or repurpose a step.\n\n\nExercises\n---------\n\n_These exercises are meant to challenge you, so they're quite difficult\ncompared to the previous ones. Don't get disheartened, and if you're able to\ncomplete them, excellent work!_\n\n\n### Exercise\n\nCreate a function `compute_day` which uses the [Doomsday algorithm][doomsday]\nto compute the day of the week for any given date in the 1900s. The function's\nparameters should be `year`, `month`, and `day`. 
The function's return value\nshould be a day of week, as a string (for example, `\"Saturday\"`).\n\n_Hint: the modulo operator is `%%` in R._\n\n[doomsday]: https://en.wikipedia.org/wiki/Doomsday_rule\n", | ||
"supporting": [ | ||
"04_organizing_code_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": { | ||
"knitr": [ | ||
"{\"type\":\"list\",\"attributes\":{},\"value\":[]}" | ||
] | ||
}, | ||
"preserve": null, | ||
"postProcess": false | ||
} | ||
} |