-
Notifications
You must be signed in to change notification settings - Fork 6
/
faq.qmd
225 lines (162 loc) · 11.7 KB
/
faq.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
---
title: "{{< fa circle-question >}} Frequently asked questions"
---
## How do I succeed in this class?
* Complete readings before the material is covered in class, and then review again afterwards.
* Participate actively in class. If you don't understand something, I can guarantee no one else does either. I have a Ph.D., and I've been doing this for more than 10 years. It's hard for me to remember what it's like to be you and what you don't know. Say something! I want you to learn this stuff, and I love to explain more carefully.
* Come to office hours. Again, I like explaining things.
* Try the Labs again without the help of your classmates.
* Read the examples at the end of the \[ISLR\] chapters. Try the exercises.
* Do not procrastinate --- don’t let a module go by with unanswered questions as it will just make the following module’s material even more difficult to follow.
* Do the [Worksheets](https://ubc-stat.github.io/stat-406-worksheets).
## Git and Github
### Homework/Labs workflow
**Rstudio version** (uses the Git tab. Usually near Environment/History in the upper right)
1. Make sure you are on `main`. Pull in remote changes. Click <i class="fas fa-arrow-down" style="color:blue"></i>.
1. Create a new branch by clicking the think that looks kinda like <i class="fas fa-code-branch" style="color:purple"></i>.
1. Work on your documents and save frequently.
1. Stage your changes by clicking the check boxes.
1. Commit your changes by clicking **Commit**.
1. Repeat 3-5 as necessary.
1. Push to Github <i class="fas fa-arrow-up" style="color:green"></i>
1. When done, go to Github and open a PR.
1. Use the dropdown menu to go back to `main` and avoid future headaches.
**Command line version**
1. (Optional, but useful. Pull in any remote changes.) `git pull`
1. Create a new branch `git branch -b <name-of-branch>`
1. Work on your documents and save frequently.
1. Stage your changes `git add <name-of-document1>` repeat for each changed document. `git add .` stages all changed documents.
1. Commit your changes `git commit -m "some message that is meaningful"`
1. Repeat 3-5 as necessary.
1. Push to Github `git push`. It may suggest a longer form of this command, obey.
1. When done, go to Github and open a PR.
1. Switch back to `main` to avoid future headaches. `git checkout main`.
### Asking for a HW regrade.
::: {.callout-tip}
## To be eligible
1. You must have received >3 points of deductions to be eligible.
1. And they must have been for "content", not penalties.
1. If you fix the errors, you can raise your grade to 7/10.
1. You must make revisions and re-request review within 1 week of your initial
review.
:::
1. Go to the your local branch for this HW. If you don't remember the right
name, you can check the PRs in your repo on GitHub by clicking "Pull Requests"
tab. It might be closed.
1. Make any changes you need to make to the files, commit and push. Make sure
to rerender the `.pdf` if needed.
1. Go to GitHub.com and find the original PR for this assignment. There should
now be additional commits since the previous Review.
1. Add a comment to the TA describing the changes you've made. Be concise and
clear.
1. Under "Reviewers" on the upper right of the screen, you should see a 🔁
button. Once you click that, the TA will be notified to review your changes.
### Fixing common problems
#### `master/main`
"master" has some pretty painful connotations. So as part of an effort to remove racist names from code, the default branch is now "main" on new versions of GitHub. But old versions (like the UBC version) still have "master". Below, I'll use "main", but if you see "master" on what you're doing, that's the one to use.
#### Start from main
Branches should be created from the `main` branch, not the one you used for the last assignment.
```bash
git checkout main
```
This switches to `main`. Then pull and start the new assignment following the workflow above. (In Rstudio, use the dropdown menu.)
#### You forgot to work on a new branch
Ugh, you did some labs before realizing you forgot to create a new branch. Don't stress. There are some things below to try. But if you're confused ASK. We've had practice with this, and soon you will too!
_(1) If you started from `main` and haven't made any commits (but you SAVED!!):_
```bash
git branch -b <new-branch-name>
```
This keeps everything you have and puts you on a new branch. No problem. Commit and proceed as usual.
_(2) If you are on `main` and made some commits:_
```bash
git branch <new-branch-name>
git log
```
The first line makes a new branch with all the stuff you've done. Then we look at the log. Locate the most recent commit before you started working. It's a long string like
`ac2a8365ce0fa220c11e658c98212020fa2ba7d1`. Then,
```bash
git reset ac2a8 --hard
```
This rolls `main` back to that commit. You don't need the whole string, just the first few characters. Finally
```bash
git checkout <new-branch-name>
```
and continue working.
_(3) If you started work on `<some-old-branch>` for work you already submitted:_
This one is harder, and I would suggest getting in touch with the TAs. Here's the procedure.
```bash
git commit -am "uhoh, I need to be on a different branch"
git branch <new-branch-name>
```
Commit your work with a dumb message, then create a new branch. It's got all your stuff.
```bash
git log
```
Locate the most recent commit before you started working. It's a long string like `ac2a8365ce0fa220c11e658c98212020fa2ba7d1`. Then,
```bash
git rebase --onto main ac2a8 <new-branch-name>
git checkout <new-branch-name>
```
This makes the new branch look like `main` but without the differences from `main` that are on `ac2a8` and WITH all the work you did after `ac2a8`. It's pretty cool. And should work. Finally, we switch to our new branch.
### How can I get better at R?
I get this question a lot. The answer is almost never "go read the book _How to learn R fast_" or "watch the video on _FreeRadvice.com_". To learn programming, the only thing to do is to program. Do your tutorialls. Redo your tutorials. Run through the code in the textbook. Ask yourself why we used one function instead of another. Ask questions. Play little coding games. If you find yourself wondering how some bit of code works, run through it step by step. Print out the results and see what it's doing. If you take on these kinds of tasks regularly, you will improve rapidly.
Coding is an _active_ activity just like learning Spanish. You have to practice constantly. For the same reasons that it is difficult/impossible to learn Spanish just from reading a textbook, it is difficult/impossible to learn R just from reading/watching.
When I took German in 7th grade, I remember my teacher saying "to learn a language, you have to constantly tell lies". What he meant was, you don't just say "yesterday I went to the gym". You say "yesterday I went to the market", "yesterday I went to the movies", "today she's going to the gym", etc. The point is to internalize conjugation, vocabulary, and the inner workings of the language. The same is true when coding. Do things different ways. Try automating regular tasks.
Recommended resources
* [Data Science: A first introduction](https://datasciencebook.ca) This is the course textbook for UBC's DSCI 100
* [R4DS](https://r4ds.had.co.nz) written by Hadley Wickham and Garrett Grolemund
* [DSCI 310 Coursenotes](https://ubc-dsci.github.io/reproducible-and-trustworthy-workflows-for-data-science/README.html) by Tiffany A. Timbers, Joel Ostblom, Florencia D’Andrea, and Rodolfo Lourenzutti
* [Happy Git with R](https://happygitwithr.com) by Jenny Bryan
* [Modern Dive: Statistical Inference via Data Science](https://moderndive.com)
* [Stat545](https://stat545.com)
* [Google](https://duckduckgo.com)
### My code doesn't run. What do I do?
This is a constant issue with code, and it happens to everyone. The following is a general workflow for debugging stuck code.
0. If the code is running, but not doing what you want, see [below](#how-to-write-good-code).
1. Read the Error message. It will give you some important hints. Sometimes these are hard to parse, but that's ok.
```{r}
#| error: true
#| collapse: true
set.seed(12345)
y <- rnorm(10)
x <- matrix(rnorm(20), 2)
linmod <- lm(y ~ x)
```
This one is a little difficult. The first stuff before the colon is telling me where the error happened, but I didn't use a function called `model.frame.default`. Nonetheless, after the colon it says `variable lengths differ`. Well `y` is length 10 and `x` has 10 rows right? Oh wait, how many rows does `x` have?
2. Read the documentation for the function in the error message. For the above, I should try `?matrix`.
3. Google!! If the first few steps didn't help, copy the error message into Google. This almost always helps. Best to remove any overly specific information first.
4. Ask your classmates Slack. In order to ask most effectively, you should probably provide them some idea of how the error happened. See the section on [MWEs](#minimal-working-examples) for how to do this.
5. See me or the TA. Note that it is highly likely that I will ask if you did the above steps first. And I will want to see your minimal working example (MWE).
::: {.callout-warning}
If you meet with me, be prepared to show me your code! Or message me your MWE. Or both. But not neither.
:::
If the error cannot be reproduced in my presence, it is very unlikely that I can fix it.
### Minimal working examples
An MWE is a small bit of code which will work on anyone's machine and reproduce the error that you are getting. This is a key component of getting help debugging. When you do your homework, there's lots of stuff going on that will differ from most other students. To allow them (or me, or the TA) to help you, you need to be able to get their machine to reproduce your error (and *only* your error) without much hassle.
I find that, in the process of preparing an MWE, I can often answer my own question. So it is a useful exercise even if you aren’t ready to call in the experts yet. The process of stripping your problem down to its bare essence often reveals where the root issue lies. My above code is an MWE: I set a seed, so we both can use _exactly_ the same data, and it's only a few lines long without calling any custom code that you don't have.
For a good discussion of how to do this, see the [R Lecture](schedule/slides/00-r-review.qmd) or [stackexchange](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example-aka-mcve-minimal-complete-and-ver/5963610#5963610).
### How to write good code
This is covered in much greater detail in the lectures, so see there. Here is my basic advice.
1. Write script files (which you save) and source them. Don't do everything in the console. `R` (and python and Matlab and SAS) is much better as a scripting language than as a calculator.
2. Don't write anything more than once. This has three corollaries:
a. If you are tempted to copy/paste, don't.
b. Don't use _magic numbers_. Define all constants at the top of the script.
c. Write functions.
3. The third is __very important__. Functions are easy to test. You give different inputs and check whether the output is as expected. This helps catch mistakes.
4. There are two kinds of errors: syntax and function.
* The first R can find (missing close parenthesis, wrong arguments, etc.)
* The second you can only catch by thorough testing
5. Don't use __magic numbers__.
6. Use meaningful names. Don't do this:
```r
data("ChickWeight")
out <- lm(weight ~ Time + Chick + Diet, data = ChickWeight)
```
7. Comment things that aren't clear from the (meaningful) names.
8. Comment long formulas that don't immediately make sense:
```r
garbage <- with(
ChickWeight,
by(weight, Chick, function(x) (x^2 + 23) / length(x))
) ## WTF???
```