-
Notifications
You must be signed in to change notification settings - Fork 0
/
rstudio.qmd
223 lines (138 loc) · 15.8 KB
/
rstudio.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: "RStudio"
---
## Module Learning Objectives
By the end of this module, you will be able to:
- <u>Describe</u> the computer-to-GitHub order of operations
- <u>Define</u> fundamental Git vocabulary
- <u>Create</u> a local version-controlled repository that is connected to GitHub
## Overview of Git Workflow
Before we get into using Git and GitHub through RStudio, it will be helpful to review the major steps of including version control as you work on code.
Beginning on your local computer, you make **<span style="color:gold">changes</span>** to a file in a folder that you have previously marked for version control tracking (i.e., a **working directory**). Once those changes are made you can **stage changes** within your local computer. After staging, it is best to <span style="color:blue">retrieve the latest file versions from the cloud</span>. You likely will already be up-to-date but this preemptive step can save you a lot of heartache down the line. Once you've confirmed that you have the latest file versions, you can <span style="color:green">shift the revised file(s) to the cloud</span> where any GitHub users with access to your project can access the most recent file and look at the history of all previous changes.
<p align="center">
<img src="images/rstudio/git-workflow.png" alt="Graphic of a white rectangle on top of a blue square. The white rectangle has a happy cloud image and is labeled 'GitHub' while the blue square has an emoji-style laptop. Numbered steps start at the bottom left and work towards the top right as follows: '1-make changes', '2-stage changes', '3-retrieve latest from GitHub', and '4-put in GitHub'" width="70%"/>
</p>
## Git Vocabulary
Finally, it will be helpful to introduce four key pieces of vocabulary before we dive into the interactive component of this workshop.
- **<span style="color:purple">Clone</span>** = copy the entire contents of a GitHub repository to your local computer (done once per computer)
* **<span style="color:gray">Commit</span>** = move a changed local file to your local staging area (step 2 of the above diagram)
* **<span style="color:blue">Pull</span>** = get file(s) *from* the cloud *to* your local computer -- opposite of a "push" (step 3)
* **<span style="color:green">Push</span>** = move file(s) *to* the cloud *from* your local computer -- opposite of a "pull" (step 4)
<p align="center">
<img src="images/rstudio/git-vocabulary.png" alt="Graphics demonstrating a clone (copy entire folder from GitHub to a copmputer), a commit (putting local changes into the staging area), a pull (overwriting local copies with GitHub versions of the same), and a push (overwriting GitHub versions of files with the committed local versions)" width="70%"/>
</p>
## Cloning a Repository
Now, the first step in using Git with RStudio is cloning the repository from GitHub. Note for clarity that in the screenshots below, **GitHub is in dark mode** while **RStudio is in light mode**. To clone a repository, follow these steps:
Navigate to the repository on GitHub and click on **<span style="color:green">Code</span>**. Select "HTTPS" and copy the link.
<p align="center">
<img src="images/rstudio/rstudio-6-new-repo-link.png" alt="Screenshot of the menu that appears in a GitHub repository when someone clicks the 'code' button" width="100%"/>
</p>
Now, return to (or open!) RStudio.
<p align="center">
<img src="images/rstudio/rstudio-7-fresh-start.png" alt="Screenshot of RStudio when no project or scripts are open" width="100%"/>
</p>
Go to the **Project** tab on the top right corner and click **New Project...**
<p align="center">
<img src="images/rstudio/rstudio-8-new-project.png" alt="Screenshot of the menu that appears when someone clicks 'Project: (None)' in RStudio and hovers over the 'New Project...' button" width="100%"/>
</p>
Select **Version Control**.
<p align="center">
<img src="images/rstudio/rstudio-9-version-control.png" alt="Screenshot of a menu with three options: 'new directory', 'existing directory', and 'version control'" width="75%"/>
</p>
Select **Git**.
<p align="center">
<img src="images/rstudio/rstudio-10-clone.png" alt="Screenshot of a menu with two options: 'git' and 'subversion'" width="75%"/>
</p>
Paste the repository URL that you just copied from GitHub. Choose a file path to save your project to.
<p align="center">
<img src="images/rstudio/rstudio-11-clone-repo-details.png" alt="Screenshot of a menu with three open text fields, one for the 'repository URL', one for 'project directory name' and one for 'create project as subdirectory of'" width="75%"/>
</p>
Now we have finished cloning the repository to our RStudio! Notice that we are working in our `git-practice` project and that our `README.md` file shows up under the list of files, just like in our GitHub repository.
<p align="center">
<img src="images/rstudio/rstudio-12-finished-cloning.png" alt="Screenshot of RStudio after a project has been selected. The project name--in the top right corner--is circled in red" width="100%"/>
</p>
## Workflow Refresher
The typical workflow with Git goes like this:
**Step 1**: You modify files in your working directory and save them as usual.
**Step 2**: You **stage** files to mark your intention to "commit" them and then **commit** that version of those files.
- In RStudio, "staging" is done by checking the box next to a given file in the "Git" tab
- Committing files permanently stores them as snapshots to your Git directory
**Step 3**: You **pull** the most recent changes to make sure you've been editing the latest versions.
**Step 4**: You **push** your the version of your files that you committed to GitHub.
Here is the infographic from the start of this chapter again, which shows the same workflow:
<p align="center">
<img src="images/rstudio/git-workflow.png" alt="Graphic of a white rectangle on top of a blue square. The white rectangle has a happy cloud image and is labeled 'GitHub' while the blue square has an emoji-style laptop. Numbered steps start at the bottom left and work towards the top right as follows: '1-make changes', '2-stage changes', '3-retrieve latest from GitHub', and '4-put in GitHub'" width="70%"/>
</p>
## Stage versus Commit
The functional difference between "staging" a file and "committing" one can be a little tough to grasp at first so let's explore that briefly here. We can make an analogy with taking a family picture, where each family member would represent a file.
- **Staging** files is like deciding which family member(s) are going to be in your next picture
- **Committing** is like taking the picture
This 2-step process enables you to flexibly group files into a specific commit. Those groupings can be helpful to you later if you're trying to find what you changed for a specific task (because those changes likely are all in the same commit).
## Creating a New File
Let's try out a simple Git workflow by first creating a new file. This is **Step 1** of the process. We can add new R scripts and files to our repository through RStudio. Try creating a new script by going to **File** > **New File** > **R Script**. Feel free to type anything you want into this script as an example. Name this script after yourself. In the screenshot below, I have named my script `angel-script.R`.
Once you are done, navigate to the **Git** tab on the upper left corner. You should see your new script show up there along with a `.gitignore` and `git-practice.Rproj` file. Do not worry about the `.gitignore` file for now, it was created by RStudio to make sure that some temporary files are not tracked by Git. The `git-practice.Rproj` file will save your settings and open tabs when you close the project, and will restore these settings the next time you open it.
<p align="center">
<img src="images/rstudio/rstudio-13-new-script.png" alt="Screenshot of RStudio with an open script and several uncommitted changes identified in the 'Git' pane. Lack of commit is identified by double yellow squares containing a question mark" width="100%"/>
</p>
Notice that there are color-coded icons next to the files in the "Git" tab. These icons are shorthand for the status--according to Git--of every* file in your working directory. Not technically "every" file because files that are tracked but haven't been modified are not included. See below for definitions.
<p align="center">
<img src="images/rstudio/rstudio-icons.png" alt="A legend of five icons RStudio uses to indicate Git status of a file. Yellow question mark = untracked (file not tracked by Git). Green 'A' = added (file marked to start tracking). Blue 'M' = modified (tracked file with changes). Red 'D' = deleted (tracked file that was deleted). Purple 'R' = renamed (tracked file that was renamed)" width="65%"/>
</p>
In our case, it means that our R script, `.gitignore`, and `git-practice.Rproj` files have never been tracked by Git (since these files were just created). Note also that the `README.md` file is not listed, but it exists (check the Files pane). It is because files that are tracked but have no modifications since the last commit are not listed.
### Adding our Script to the Next Commit
Let us look at the diff of our script. Click on the **Diff** tab.
<p align="center">
<img src="images/rstudio/rstudio-13c-diff.png" alt="Screenshot of the 'diff' button--circled in red--in the 'git' pane of RStudio" width="80%"/>
</p>
Checking our script, we can see the new lines that we just typed are in green, which indicates that these lines have been added for Git. We would like to save a snapshot of this version of our script. Since we've just done **Step 1**, here are the rest of the steps we will need to do to get our script to show up on our GitHub repository:
**Step 2**: Add the file to the next commit by checking the box in front of the file name. Note that the two `?` icons will change to a single `A` on the left to show you that this file is now staged to be part of the next commit.
**Step 3**: In the right pane, type a short but descriptive commit message detailing what you have done so far. Then click on the **Commit** button to save this version of the script in the Git database.
<p align="center">
<img src="images/rstudio/rstudio-14-commit-message.png" alt="Screenshot of the commit menu of RStudio where one file has been staged and an informative commit message has been entered" width="80%"/>
</p>
If all of the above steps went well, you should see something like this:
<p align="center">
<img src="images/rstudio/rstudio-15-finished-commiting.png" alt="Screenshot of the success message after a commit has been made" width="80%"/>
</p>
Notice that Git tells us that `1 file changed` because we've just added a new file to our commit. Now close the window. Before sending our changes back to GitHub, we should make sure that the copy of the repository on RStudio is completely up-to-date with the one on GitHub to avoid any conflicts.
### Getting the Latest Updates
There are two Git commands to exchange between a local and remote versions of a repository:
- Pull: Git will get the latest remote version and try to merge it with your local version
- Push: Git will send your local version to the remote version of the repository (in our case GitHub)
**Before sending your local version to the remote**, you should always get the latest remote version first. In other words, you should pull first and push second. This is the way Git protects the remote version against incompatibilities with the local version. You always deal with potential problems on your local machine. Therefore your sequence will always be:
1. Commit
2. Pull
3. Push
Of course RStudio has icons for that on top of the "Git" tab, with the blue arrow down being for pull and the green arrow up being for push. Remember the icons are organized in sequence!
Let us do the pull and push to synchronized the remote repositories. Click on the **Pull** button to pull changes (if any) from the GitHub repository to the copy on RStudio. We have now synchronized the local (our computer) and remote (on GitHub) versions of our repository. You may have noticed that all of our preceding graphics use **<span style="color:blue">blue</span>** for **pull**-related content and **<span style="color:green">green</span>** for **push**-related information. Hopefully that helps cement the two ideas in your mind!
<p align="center">
<img src="images/rstudio/rstudio-15b-pull.png" alt="Screenshot of the blue 'pull' button and green 'push' button in RStudio" width="40%"/>
</p>
In my case, it turns out that a new script, `lyon-script`, was added to the GitHub repository by a collaborator while I was making my own script. Since I have just pulled, `lyon-script` now shows up in my RStudio files.
<p align="center">
<img src="images/rstudio/rstudio-16-pulled.png" alt="Screenshot of the message returned when you pull files you didn't have locally from GitHub" width="90%"/>
</p>
<p align="center">
<img src="images/rstudio/rstudio-17-finished-pulling-new-script.png" alt="Screenshot of RStudio with the 'git' pane underlined in red where it says 'your branch is ahead of origin/main by 2 commits'" width="100%"/>
</p>
A new message has popped up for me: "Your branch is ahead of 'origin/main' by two commits". This means that I have two additional commits on my local machine that I never shared back to the remote repository on GitHub. If I look at the content of my repository on GitHub, I will see just the `README.md` and `lyon-script`. My changes are NOT in the cloud yet. You might be seeing a similar message as well.
### Sending Changes back to GitHub
So how do we send our changes back to GitHub? Locate the **Push** button on the "Git" tab and click on it. Now your script should show up in the GitHub repository!
<p align="center">
<img src="images/rstudio/rstudio-15b-pull.png" alt="Screenshot of the blue 'pull' button and green 'push' button in RStudio" width="40%"/>
</p>
Once you click that button you should get a success screen like the one pictured below.
<p align="center">
<img src="images/rstudio/rstudio-18-pushed.png" alt="Screenshot of the message received when a push is performed successfully from RStudio" width="90%"/>
</p>
Navigate back to the GitHub website and find your repository. Check to see if your script has been added correctly. In my case, `angel-script.R` finally shows up in my repository.
<p align="center">
<img src="images/rstudio/rstudio-19-updated-repo.png" alt="Screenshot of a GitHub repository with the commits made in earlier screenshots shown in its history" width="100%"/>
</p>
:::callout-note
## If RStudio ever asks for a "password"...
If your personal access token (PAT) was not set up correctly with RStudio or if it expired, then RStudio will ask for your GitHub username and password in a pop-up when you try to push. **Please be aware that when they ask for a "password", they actually meant your token!** Enter your token in the field and you should be able to push now. Make sure to run `gitcreds::gitcreds_set()` to set a valid token afterwards so you don't have to enter it manually every time!
:::
### Rinse and Repeat
Great! Now that your script has been added to the group repository, you should try to repeat the same workflow over again just to get a feel for how it works. Go back to RStudio and edit your own script. Save those edits, add your edited file to the staging area, write a commit message, then commit your changes. After committing, make sure to pull first then push after! When you pull, you might notice that scripts from your group members/collaborators will show up in your RStudio files.
Make sure to work on your own script. If you and another group member work on the same script at the same time, this may lead to merge conflicts with Git. If two people were to work on the same script, they may be making different edits on the same lines, and Git would not know which edits to keep. To avoid merge conflicts, be mindful of what files you are working on and always communicate this to your group members!