---
title: Introduction to Git(Hub)
author: Megan Michel, James Fellows Yates
bibliography: assets/references/git-github.bib
---
::: {.callout-note collapse="true" title="Self guided: chapter environment setup"}
For this chapter's exercises, if not already performed, you will need to download the chapter's dataset, decompress the archive, and create and activate the conda environment.
To do this, use `wget` or right-click and save to download this Zenodo archive: [10.5281/zenodo.13759333](https://doi.org/10.5281/zenodo.13759333), and unpack it.
```bash
tar xvf git-github.tar.gz
cd git-github/
```
We can then create and subsequently activate the environment with the following.
```bash
conda env create -f git-github.yml
conda activate git-github
```
:::
In this walkthrough, we will introduce the version control system _Git_ as well as _GitHub_, a remote hosting service for version controlled repositories.
Git and GitHub are increasingly popular tools for tracking data, collaborating on research projects, and sharing data and code, and learning to use them will help in many aspects of your own research.
For more information on the benefits of using version control systems, see the slides.
By the end of this chapter you will have learned how to:
- Set up SSH keys to allow passwordless interaction between your local machine and GitHub
- Create a GitHub repository
- Copy the repository to your local machine
- Make changes and record them with version control
- Synchronize changes between your local and remote repositories
- Work on the same repository in parallel with others
## Background
What is a version control system? This is a general term for tools that allow us to track changes to objects - in this case files - over time.
When it comes to bioinformatics, these are typically files such as scripts or notebooks, or simple text files such as CSV and FASTA files (although it can also apply to much larger binary files!).
The use of a good version control system allows the restoration of old versions, modification of previous changes, tracking of contributions by multiple people, etc.
By far the most popular version control system in bioinformatics is [Git](https://git-scm.com/) - which was in fact originally co-written by the creator of the Linux operating system!
Nowadays it is popular to include a remote hosting service for version-controlled repositories.
In bioinformatics, the most popular remote hosting service for Git version controlled code repositories is [GitHub](https://github.com), although alternatives exist (e.g. [GitLab](https://gitlab.com) or [BitBucket](https://bitbucket.org)).
GitHub provides a user-friendly GUI and a range of other useful tools and functionality, and most bioinformatic code and tools are hosted there.
So why should you use a version control system such as Git, together with a hosting service such as GitHub?
1. To have a (deep) backup of our work
2. Allow you to revert to old versions/modify previous changes to files
3. Allow multiple contributors to work simultaneously
4. Allow you to test new scripts or code before updating a public version in a 'sandbox' area of the repository
5. Help share our data, code, and results with the world!
:::{.callout-tip}
It's not just for reproducible code that sharing things on places like GitHub can be useful!
You can use (well-archived) Git repositories to 'get around' publisher limits when it comes to things such as supplementary files in publications!
For example, in addition to code and scripts, I have previously used a GitHub repository to host additional supplementary figures, methods descriptions, and tables.
You can see this here: [https://github.com/jfy133/Hominid_Calculus_Microbiome_Evolution](https://github.com/jfy133/Hominid_Calculus_Microbiome_Evolution).
Furthermore, by including and describing them alongside my code, it is much easier for other researchers to find (and search for) exactly how I did things!
The important thing is to archive the repository on places like Zenodo ([https://zenodo.org/](https://zenodo.org/)), to ensure longevity alongside the publication, and cite the archive DOI in your main manuscript. The corresponding Zenodo DOI for the repository above is: [https://zenodo.org/doi/10.5281/zenodo.3740492](https://zenodo.org/doi/10.5281/zenodo.3740492).
:::
## Basic workflow
The basic concepts of using git and GitHub are shown in @fig-gitgithub-workflowdiagram.
![Overview of basic operations when using git, and pushing to a remote host such as GitHub. See chapter text for description of image. Reconstructed and modified after @Chacon2014-jq.](assets/images/chapters/git-github/git-workflow.png){#fig-gitgithub-workflowdiagram}
In the diagram of @fig-gitgithub-workflowdiagram, the two dark gray boxes represent a local machine (e.g. a laptop) and a remote server.
Within the local machine, an 'untracked' box represents files not indexed by the local git repository.
An arrow points from it into a light grey box (the local repository), in which three white boxes are present.
These represent the different states of a file within the repository.
This first arrow from the 'untracked' box spans to the furthest box, called '_staged_', and represents the operation of adding a file to be indexed by the repository (this only happens once per file).
Once a file is staged, a second arrow points from the 'staged' box back to the first box (or 'status') within the local repository, called '_unmodified_'.
This arrow, converting a staged file to unmodified, represents making a '_commit_' (i.e. recording to git history a repository change).
We can imagine committing to be equivalent to a permanent(!) save.
The next arrow represents an edit to a file, which spans the 'unmodified' box to the middle '_modified_' status.
Once all edits have been made, the edited file is 'staged' - the arrow that goes from the middle of the 'modified' state to the 'staged' state - after which a commit again would be made to record that file as 'unmodified' compared to the change history.
The arrow pointing from the local repository back to the furthest left '_untracked_' state of the local repository represents the removal of the file from indexing/tracking in the git history.
Finally, the two arrows that span between the local machine and remote server represent '_pushing_' the commit history from the local repository to a remote repository (on the server) and, in the reverse direction, '_pulling_' the commit history back to the local repository.
These can be imagined as backing up our git history to a cloud server, and then retrieving the backup (albeit with changes from others).
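The states described above can be observed directly with `git status --short`. The following is a minimal sketch in a throwaway repository; the temporary directory, the file name `notes.txt`, and the identity are placeholders chosen for illustration, not part of the chapter's dataset.

```bash
# Sketch: walk one file through the states of the workflow diagram.
# Everything happens in a temporary directory; names are placeholders.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
state_untracked="$(git status --short)"    # '?? notes.txt' : untracked

git add notes.txt
state_staged="$(git status --short)"       # 'A  notes.txt' : staged

git commit --quiet -m "Add notes"
state_unmodified="$(git status --short)"   # empty: unmodified since the last commit

echo "second line" >> notes.txt
state_modified="$(git status --short)"     # ' M notes.txt' : modified, not yet staged

printf '%s\n' "$state_untracked" "$state_staged" "[$state_unmodified]" "$state_modified"
```

The two-character codes in the first column are a compact summary of the same states: `??` for untracked, `A` for newly staged, and `M` for modified.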
::: {.callout-tip title="Question" appearance="simple"}
Why do you think it is important to have a 'staging' area for changes before committing them to the git history?
:::
::: {.callout-note collapse="true" title="Answer"}
This can be useful when you are adding multiple modifications to multiple files, that all address the same 'fix' or 'function'.
By staging the files, you can commit them all at once, and have a single entry in the git history that describes all the changes you made.
:::
## Preparation
We will now practise some of the git operations described above.
However, before we do this, we need to set up a GitHub account so that we can communicate via the command line.
GitHub does not allow pushing and pulling from the command line with our normal account password; instead, we will use a concept called **SSH keys**.
**SSH keys** are special cryptographic strings of letters and numbers.
When generating a pair, we get both a 'private' and a 'public' key.
The former we keep private, whereas the latter we upload to other servers or give to other people.
When we want to 'prove' that it's us sending changes to the repository, our machine uses the private key to answer a challenge from the remote server, and the server checks the answer against the corresponding public key that we uploaded; the private key itself never leaves our machine.
If, after some cryptographic maths magic, the two match, the server will trust us and will accept our changes.
## Using ssh keys for passwordless interaction with GitHub
So, to begin, we will set up an SSH key to facilitate easier authentication when transferring data between local and remote repositories.
In other words, follow this section of the tutorial so that we never have to type in our GitHub password again!
### Creating SSH keys
First, we can generate our own ssh key pair, replacing the email below with our own email address.
```bash
ssh-keygen -t ed25519 -C "<YOUR_EMAIL>@<EXAMPLE>.com"
```
::: {.callout-note}
The `-t` flag tells the command which cryptographic algorithm to use.
:::
::: {.callout-warning}
Check for typos!
Common errors include:
- `ssh-keygen` is a single word (no spaces)
- Check the numbers in the `-t` value (`ed25519`)
- The `-C` flag is a capital `C`!
:::
When we type this command, we will be asked a range of questions:
1. Enter file in which to save the key: here we suggest keeping the default
2. Enter passphrase: _don't_ specify one here (unless we want to be _ultra_ secure), just press enter
3. Enter same passphrase again: press enter again, without any passphrase, to confirm
We should now (hopefully!) have generated an ssh key.
This is normally indicated by a 'random art image' being printed to the console.
A random art image normally looks something like the following (the label in the top border names the key type and size, so it will differ depending on the algorithm used):
```bash
+--[ RSA 2048]----+
| o=. |
| o o++E |
| + . Ooo. |
| + O B.. |
| = *S. |
| o |
| |
| |
| |
+-----------------+
```
To check that it worked, we can look in the default directory where keys are stored.
By default on UNIX-like operating systems, this is a folder called `.ssh` in our home directory.
Let's list the contents of that directory.
```bash
ls ~/.ssh/
```
We should now see two files: `id_ed25519`, and `id_ed25519.pub`, amongst others.
The first is a _private_ key.
We should not share this with anyone, and it should always stay on our local machine.
The second file (ending in `.pub`), is a _public_ key.
This we can give to others, such as remote servers or websites, to allow those places to verify that it is us.
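To see how the two files relate without touching our real keys, the sketch below generates a throwaway key pair in a temporary directory. The path `demo_key` and the comment `demo@example.com` are placeholders for illustration; the flags `-N ""` (empty passphrase) and `-f` (output file) simply answer the interactive questions in advance.

```bash
# Sketch: create a THROWAWAY key pair outside of ~/.ssh, so real keys are untouched.
key_dir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -C "demo@example.com" -N "" -f "$key_dir/demo_key"

ls "$key_dir"    # demo_key (private key) and demo_key.pub (public key)

# The public key can be re-derived from the private key (not, feasibly, the reverse).
derived_public="$(ssh-keygen -y -f "$key_dir/demo_key")"

# Compare key type and key material (the first two fields) with the stored .pub file.
if [ "$(printf '%s\n' "$derived_public" | cut -d' ' -f1,2)" = "$(cut -d' ' -f1,2 "$key_dir/demo_key.pub")" ]; then
    echo "public key matches private key"
fi

# Print the short fingerprint that identifies this key pair.
ssh-keygen -l -f "$key_dir/demo_key.pub"
```

This is why losing the `.pub` file is harmless (it can be regenerated), whereas the private key must be kept safe.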
So let's try this out on GitHub!
### Logging the keys
First, we need to tell our computer that the keys exist and should be used for validation to remote locations.
The tool that does that is called `ssh-agent`.
We can start it in the background (for our current terminal session) with the following command.
```bash
eval "$(ssh-agent -s)"
```
If it started successfully, we should get a message such as `Agent pid <NUMBERS>`.
If it's not running, see the [GitHub documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#adding-your-ssh-key-to-the-ssh-agent) for more information.
::: {.callout-note}
An operating system assigns each running program a unique numeric process ID (PID). If a program isn't running, it won't have a process ID!
:::
When the agent is running, we need to give the path to the _private_ key file as follows.
```bash
ssh-add ~/.ssh/id_ed25519
```
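To confirm which keys the agent currently holds, we can list them with `ssh-add -l`. The sketch below is self-contained: it uses a throwaway key in a temporary directory (the path and comment are placeholders) and a temporary agent that is stopped again at the end. As it replaces the agent settings of the current shell, it is best tried in a fresh terminal.

```bash
# Sketch: start a temporary agent, load a THROWAWAY key, and list what the agent holds.
# With our real key we would use ~/.ssh/id_ed25519 instead of the demo key.
key_dir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -C "demo@example.com" -N "" -f "$key_dir/demo_key"

eval "$(ssh-agent -s)" > /dev/null    # start an agent for this shell session
ssh-add "$key_dir/demo_key"           # register the private key with the agent
loaded_keys="$(ssh-add -l)"           # one line per key: size, fingerprint, comment, type
echo "$loaded_keys"

eval "$(ssh-agent -k)" > /dev/null    # stop the temporary agent and unset its variables
```

If `ssh-add -l` reports that the agent has no identities, the `ssh-add` step needs to be repeated for that terminal session.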
### Registering keys on GitHub
Next, GitHub needs to have our public key on record so it can verify that we hold the matching private key.
Open a web browser and navigate to the GitHub account settings page (typically: press the profile picture in the top bar menu, then settings).
Then, under settings, in the side bar go to _SSH & GPG Keys_ (@fig-gitgithub-settings-sidebar), then press _New SSH Key_ (@fig-gitgithub-settings-newkey).
![Screenshot of the GitHub settings page sidebar (as of August 2023), with the 'SSH and GPG keys' section highlighted under the 'Access' section.](assets/images/chapters/git-github/github-settings-1.png){#fig-gitgithub-settings-sidebar}
![Screenshot of the top of the GitHub SSH and GPG keys page (as of August 2023), with a green 'New SSH Key' button.](assets/images/chapters/git-github/github-settings-2.png){#fig-gitgithub-settings-newkey}
When on the 'New SSH key' page:
- Give the key a title (e.g. the name of the local machine the key was generated on)
- Leave the 'Key type' as 'Authentication Key'
- Paste the entire contents of the _public_ key that we just generated on our local machine into the main text box. We can print it with the following command.
```bash
cat ~/.ssh/id_ed25519.pub
```
::: {.callout-warning}
It's very important to paste the _whole_ string!
This starts with `ssh-ed25519` (or the name of whichever algorithm was used) and ends with our email address.
:::
Finally, press the 'Add SSH key' button.
To check that it worked, we run the following command on our local machine.
```bash
ssh -T git@github.com
```
We should see a message along the lines of the following.
```bash
Hi <YOUR_USERNAME>! You've successfully authenticated, but GitHub does not provide shell access.
```
::: {.callout-note}
If we get a message such as the following.
```bash
The authenticity of host 'github.com (140.82.121.3)' can't be established.
```
Type `yes` on the keyboard and press enter.
:::
For more information about setting up the SSH key, including instructions for different operating systems, check out GitHub's [documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).
::: {.callout-tip title="Question" appearance="simple"}
What are the benefits of using SSH keys for authentication with GitHub?
:::
::: {.callout-note collapse="true" title="Answer"}
1. We don't have to type our password every time we push or pull from GitHub
2. It helps make it easier to get into the habit of regular commits, pushes, and pulls
3. It's more secure than using a password!
:::
## Creating a GitHub repository
Now that we have set up our own SSH key, we can begin working on some version controlled data!
Navigate to your GitHub homepage ([https://github.com](https://github.com)) and create a new repository.
We can normally do this by pressing the ➕ icon on the homepage (typically on the right-hand side of the top tool bar).
For this tutorial, on the new repository page (@fig-gitgithub-createnewrepo):
- Choose any name for our new repo (including one of the auto-generated suggestions)
- Leave the description empty
- Select that the repository is 'public' (the default)
- Tick the 'Add a README file' checkbox
- Leave as default the `.gitignore` and license sections
Then press the green 'Create repository' button.
![Screenshot of top half of GitHub's Create repository interface for creating a new repository, showing owner, empty repository name box, radio boxes indicating whether the repository should be Public or Private](assets/images/chapters/git-github/create_repo.png){#fig-gitgithub-createnewrepo}
::: {.callout-warning}
For the remainder of the session, replace the name of my repository (vigilant-octo-journey) with your own repo name.
:::
Change into the directory where we would like to work, and let's get started!
## The only 6 commands you really need to know
We have set up all our authentication keys for GitHub, and created a new repository.
We can now run through some of the concepts we learnt in the [Basic Workflow](#basic-workflow) section.
These can be boiled down to just _six_ commands that we really need in order to work with Git(Hub) for basic version control of all our software, scripts, and (small) data!
To start, make sure you're in this session's directory.
```bash
cd /<path>/<to>/git-github
```
### git clone
First, we will learn to _clone_ a remote repository onto our local machine.
This is actually _not_ in our basic workflow diagram in @fig-gitgithub-workflowdiagram; however, we only need to do it once per repository, and only when we work with a remote server.
With `clone` we are making a copy of the _remote_ repository, and linking it so we can transmit data between the copy on the _local_ machine and the _remote_ repository on the server.
To make the 'copy', navigate to our new repo:
1. Select the green code dropdown button (@fig-gitgithub-clonebutton)
2. Make sure to select SSH
3. Copy the full address `git@github.com<...>` as shown below in @fig-gitgithub-clonebutton.
![Screenshot of the GitHub repository interface with the green 'code' drop down button pressed, and the menu showing the 'clone' information for SSH cloning (i.e., the copyable SSH address)](assets/images/chapters/git-github/git_clone.png){#fig-gitgithub-clonebutton}
Back at our command line, clone the repo as follows.
```bash
git clone git@github.com:<YOUR_USERNAME>/<YOUR_REPO_NAME>.git
```
::: {.callout-warning}
It's important that we select the `ssh` tab, otherwise we will not be able to push and pull with our ssh keys!
If we get asked for a password, we've not used `ssh`!
Press <kbd>ctrl</kbd> + <kbd>c</kbd>, to cancel, and try the command again but with the correct `ssh` address.
:::
Once cloned, we can run `ls` to see there is now a directory with the same name as the repository.
```bash
ls
```
If we change into it and run `ls` again, we should see the `README.md` file we specified to be generated on GitHub.
```bash
cd <NAME_OF_REPO>
ls
```
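The 'link' that `git clone` sets up can be inspected with `git remote -v`. In the sketch below, a local 'bare' repository stands in for the copy on GitHub so that no network access is needed; the directory names `remote.git` and `local-copy` are placeholders for illustration.

```bash
# Sketch: clone from a stand-in 'remote' and look at the link git records.
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/remote.git"    # pretend this lives on the server

cd "$work_dir"
git clone --quiet "$work_dir/remote.git" local-copy 2> /dev/null   # hides the 'empty repository' warning
cd local-copy

# 'origin' is the name git gives to the repository we cloned from.
git remote -v
origin_url="$(git remote get-url origin)"
echo "origin points to: $origin_url"
```

In a clone made from GitHub via SSH, the same command would show the `git@github.com:...` address we copied from the website.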
### git add
Next, let's _add_ a new file and a modified file to our 'staging area' on our local machine.
This corresponds to either the red arrow or the blue arrows in @fig-gitgithub-workflowdiagram-staging.
We can do this in two ways:
1. Stage a previously untracked file
2. Stage a tracked, but now modified file
![Overview of basic operations when using git (following @fig-gitgithub-workflowdiagram) but with the `git add` operations highlighted. The arrows indicating the two 'staging' operations, carried out by the `git add` command, are coloured red for staging a previously untracked file, and blue for the editing of an already tracked file and staging of the edited file. Reconstructed and modified after @Chacon2014-jq.](assets/images/chapters/git-github/git-workflow-staging.png){#fig-gitgithub-workflowdiagram-staging}
First we will make a new file called `file_A.txt`, and also add some extra text to the end of the `README.md` file (i.e., just modify it).
Once we've made both changes, let's first stage only the new file.
```bash
echo "test_file" > file_A.txt
echo "Just an example repo" >> README.md
git add file_A.txt
```
### git status
So hopefully we've staged at least one file, but how do we know exactly what the status is of the modified and unmodified files in the repository?
At any point we can use the command `git status` to give a summary of any files present in the repository directory that have changed status since the last commit (i.e., the last preserved entry in the git history).
```bash
git status
```
```bash
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: file_A.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: README.md
```
We should see that `file_A.txt` is staged and 'ready to be committed' but `README.md` is NOT staged - thus the changes would not be preserved to the git history.
Comparing to our diagram in @fig-gitgithub-workflowdiagram-staging, we have followed the 'red' arrow, but of the blue arrows we have only carried out the first, 'Edit file'; the second, 'Stage edited file', has not yet been carried out.
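The split between staged and unstaged changes can also be checked with `git diff`. The sketch below recreates the same situation in a throwaway repository (the temporary directory and the identity are placeholders for illustration): `git diff --staged` reports what the next commit would contain, and plain `git diff` reports edits that have not yet been staged.

```bash
# Sketch: one staged new file and one modified-but-unstaged file.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo "# demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"

echo "test_file" > file_A.txt
echo "Just an example repo" >> README.md
git add file_A.txt

staged_files="$(git diff --staged --name-only)"   # what the next commit would contain
unstaged_files="$(git diff --name-only)"          # edits that are not yet staged
echo "staged:   $staged_files"
echo "unstaged: $unstaged_files"
```

Dropping `--name-only` shows the actual line-by-line changes instead of just the file names.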
::: {.callout-tip title="Task" appearance="simple"}
Stage the modified `README.md` file and check the status again.
:::
::: {.callout-note collapse="true" title="Answer"}
```bash
git add README.md
git status
```
```{verbatim}
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: README.md
new file: file_A.txt
```
We should now see both `README.md` and `file_A.txt` coloured green, and in the 'changes to be committed' section.
We can also see the README is 'modified' whereas `file_A.txt` is a _new_ file, so will be newly indexed with git (i.e., will be the first entry in the git history for that file).
:::
### git commit
Now we need to package or save the changes into a _commit_ with a message describing the changes we've just made.
Each commit (i.e., entry in the git history) comes with a unique hash ID and will be stored forever in git history.
Committing corresponds to the red arrow taking all _staged_ modified or newly added files from the 'Staged' to the 'Unmodified' state in @fig-gitgithub-workflowdiagram-commit.
![Overview of basic operations when using git (following @fig-gitgithub-workflowdiagram) but with the command `git commit` operations highlighted. The red arrow shows committing the changes to history, i.e., taking a tracked file in the staging area, writing the modifications in the file to the git history and placing the file back into the 'unmodified' status column. Reconstructed and modified after @Chacon2014-jq.](assets/images/chapters/git-github/git-workflow-commit.png){#fig-gitgithub-workflowdiagram-commit}
```bash
git commit -m "Add new file and modify README"
```
::: {.callout-note}
The first time we commit, we may get a message like this.
```
Your name and email address were configured automatically based
on your username and hostname. <...>
```
For the purposes of this tutorial this is fine, but it is highly recommended to follow the instructions in the message to correctly associate our commits to our GitHub account.
:::
The `-m` flag of the command supplies the commit message, i.e., the human-readable description of the change.
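To check what a commit actually records, including the author identity mentioned in the note above, we can look at the last entry with `git log`. This is a minimal sketch in a throwaway repository; the name and email are placeholders, and they are set for this one repository only (adding `--global` to the `git config` commands would apply them to all repositories on the machine).

```bash
# Sketch: set an identity for ONE throwaway repository and check that a commit records it.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "test_file" > file_A.txt
git add file_A.txt
git commit --quiet -m "Add new file"

# Show the short hash, author, and message of the most recent commit.
git log -1 --format='%h %an <%ae> %s'
last_author="$(git log -1 --format='%an <%ae>')"
last_message="$(git log -1 --format='%s')"
```

Using the same email address as on our GitHub account is what lets GitHub associate the commits with our profile.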
::: {.callout-tip title="Question" appearance="simple"}
What happens if we run `git status` again?
:::
::: {.callout-note title="Answer" collapse="true"}
```{verbatim}
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
```
We can see that, while there are no modified or staged files listed any more, our 'branch' is now ahead by 1 commit.
This means that our local copy has an extra entry in the git history, compared to the remote server ('origin') copy of the repository.
:::
### git push
How then do we back up our local changes and the git history to the server?
We do that with, yes you guessed it, the `git push` command!
This is the red dashed arrow in (@fig-gitgithub-workflowdiagram-push).
![Overview of basic operations when using git (following @fig-gitgithub-workflowdiagram) but with the command `git push` operations highlighted. The red dashed arrow runs from the local repository box of the local machine to the remote repository on the server. This represents 'pushing' or sending the changes made on the local machine back up to the remote repository, so both the local and remote repositories have the same records of changes to the files. Reconstructed and modified after @Chacon2014-jq.](assets/images/chapters/git-github/git-workflow-push.png){#fig-gitgithub-workflowdiagram-push}
We can run this as follows.
```bash
git push
```
```{verbatim}
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 14 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 367 bytes | 367.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:<USERNAME>/<REPOSITORY_NAME>.git
536183b..2d252b5 main -> main
```
When we do this, we will get a bunch of lines including a set of progress information.
Once we get a couple of hash strings, all our changes have been copied to the remote!
::: {.callout-tip title="Question" appearance="simple"}
We mentioned earlier that as well as our human-readable commit 'message', we will also get a unique hash string for each commit.
Where on the output are the commit hashes?
What do you think the two hashes represent?
:::
::: {.callout-note title="Answer" collapse="true"}
The hash line is:
```{verbatim}
536183b..2d252b5 main -> main
```
The two hashes represent the previous entry in the git history (536183b), and the new entry (2d252b5) that we just made when we ran `git commit`.
Tip: try typing `git log`!
The `main` bit of the string represents the branch we pushed to. We will learn more about this later in this chapter.
:::
If we go to our GitHub repository on the website, and refresh the page, we should see the changes - both the file in the file browser at the top, and the new text we added to the `README.md` file!
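The relationship between the two hashes and the remote can be reproduced offline. In this sketch a local 'bare' repository stands in for GitHub; the directory names and the identity are placeholders for illustration, and the `symbolic-ref` lines only make sure the first branch is called `main`, as on GitHub.

```bash
# Sketch: push to a stand-in 'remote' and compare hashes before and after.
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/remote.git"
git --git-dir="$work_dir/remote.git" symbolic-ref HEAD refs/heads/main

git clone --quiet "$work_dir/remote.git" "$work_dir/local-copy" 2> /dev/null
cd "$work_dir/local-copy"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet -u origin main           # first push: also links local 'main' to 'origin/main'
old_hash="$(git rev-parse --short HEAD)"

echo "test_file" > file_A.txt
git add file_A.txt
git commit --quiet -m "Add new file"
new_hash="$(git rev-parse --short HEAD)"

git push                                  # reports <old_hash>..<new_hash>  main -> main
remote_hash="$(git rev-parse --short origin/main)"
echo "previous: $old_hash  new: $new_hash  remote now at: $remote_hash"
```

After the push, the remote's `main` points at the same commit as our local `main`, which is why a subsequent `git status` no longer reports the branch as being 'ahead'.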
### git pull
But what if we worked on a different local machine, and pushed changes from there?
How do we then download new commits from our remote to our local repository?
![Overview of basic operations when using git (following @fig-gitgithub-workflowdiagram) but with the command `git pull` operations highlighted. The red dashed arrow going from the remote repository on the server back to the local repository represents copying changes pushed from _elsewhere_ back to our local copy of the repository. Reconstructed and modified after @Chacon2014-jq.](assets/images/chapters/git-github/git-workflow-pull.png){#fig-gitgithub-workflowdiagram-pull}
We carry this out with the counterpart of `push`, `pull` (@fig-gitgithub-workflowdiagram-pull)!
Try running the command now!
```bash
git pull
```
::: {.callout-tip title="Question" appearance="simple"}
What output do we get from running `git pull`?
Why does it say what it says?
:::
::: {.callout-note title="Answer" collapse="true"}
We should get a message of `Already up to date`.
This is because we have made no further changes to the remote version of the repository since we pushed!
:::
So how can we make changes to the remote repository?
One way would be to make a local clone elsewhere on a different machine (or in a different folder on the same machine!), make some changes and commits there, and then push from there, and pull to our current local repository...
But that sounds rather convoluted, no 😉?
Instead, another benefit of GitHub is we can actually make changes to our repository from our web browser!
In our web browser, in the file browser, click on the `README.md`.
If we're still logged into our GitHub account, we should see a small pencil icon in the top right of the file viewer (@fig-gitgithub-exampleeditgithub).
![Screenshot of a GitHub file viewer of a README.md file, with the pen 'edit' icon in the top right hand corner](assets/images/chapters/git-github/github-edit-file.png){#fig-gitgithub-exampleeditgithub}
After pressing the pencil icon, add some more contents to the `README.md` file, then press the green 'Commit changes' button in the top right, write a commit message (sound familiar?), and press the next green 'Commit changes' button (we can ignore the extended description).
On the resulting file browser, we should see our changes, and our commit message above the file browser with the commit hash!
Moving back to our terminal, try running `git pull` again.
This time we should get a bunch of progress bars and statistics again.
::: {.callout-tip title="Question" appearance="simple"}
What do you think the number next to the pulled file name means?
What do the two colours represent?
:::
::: {.callout-note title="Answer" collapse="true"}
The number represents the number of lines with changes on them. The green plus and red minus symbols represent the number of lines with additions and deletions, respectively!
A description of the changes is given in the last line of the `git pull` output.
:::
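The 'convoluted' route of a second clone can also be simulated entirely on one machine, which is a handy way to practise pulling. In the sketch below, a local 'bare' repository stands in for GitHub and the clone called `elsewhere` stands in for another machine (or the web editor); all directory names and the identity are placeholders for illustration.

```bash
# Sketch: two clones of one stand-in 'remote'; a change pushed from one is pulled into the other.
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/remote.git"
git --git-dir="$work_dir/remote.git" symbolic-ref HEAD refs/heads/main

# First clone: 'our' machine. Make an initial commit and push it.
git clone --quiet "$work_dir/remote.git" "$work_dir/laptop" 2> /dev/null
cd "$work_dir/laptop"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet -u origin main

# Second clone: stands in for another machine.
git clone --quiet "$work_dir/remote.git" "$work_dir/elsewhere"
cd "$work_dir/elsewhere"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Edited elsewhere" >> README.md
git add README.md
git commit --quiet -m "Edit README elsewhere"
git push --quiet

# Back on 'our' machine: download and apply the new commit.
cd "$work_dir/laptop"
git pull
readme_last_line="$(tail -n 1 README.md)"
```

The summary printed by `git pull` lists `README.md` with one added line, matching the edit made in the second clone.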
### Keep practising!
And those are the 6 basic commands we need to know to work with Git and GitHub for our own work - repeat _ad nauseam_!
Git is supremely powerful, but can get extremely complicated because of this.
We recommend sticking with these 6 commands to begin with and, as we slowly get more comfortable with the routine, broadening our `git` knowledge step by step as we come across other questions and problems.
There is no one comprehensive course or documentation, so we recommend to just keep practising!
## Working collaboratively
Once we're comfortable with using Git for our own projects, it's time to learn a few more things to help us work more safely, more efficiently, and eventually collaboratively!
Git facilitates both 'sandboxing' of potential changes and simultaneous work by small teams through branching, forks, and pull requests.
Branching generates an independent copy of the 'mainline' repository _contents_.
Forking generates an independent copy of an entire repository (with all of its settings, but with its own git history).
When working on a branch, we can make as many changes and edits as we wish without breaking or modifying the 'mainline' version.
When working on a fork, we can make even more changes and edits, not only to the code but also to the repository itself, without modifying the original repository (or its code).
Once we're happy with the changes we've made in our branch or fork acting as the 'sandbox', we can then incorporate these changes into our 'mainline' repository using a `pull request` and a `merge`.
We can make a branch from any point in our git history, and also make as many branches as we want!
With a fork, however, we can only copy the repository at the latest state of its history at the moment we make the fork.
### Branches
There are two ways we can make a branch.
The first way is using the GitHub interface, as in @fig-gitgithub-githubbranch.
![Three panel screenshot of GitHub interface for switching and creating branches. Panel A: A dropdown menu appears when we press 'main' (or the name of the current branch). Panel B: A search bar in the dropdown allows us to search for existing branches, which appear in the search results. If no branch with that name exists, the search results present a button saying 'Create branch <name> from main'. Panel C: Once the new branch is made, we are returned to the repository file or file browser, but with the dropdown now showing the name of the new branch we made.](assets/images/chapters/git-github/git_switch.png){#fig-gitgithub-githubbranch}
The instructions in @fig-gitgithub-githubbranch would result in the branch existing on the _remote_ copy of our repository i.e., on GitHub.
To get this branch on our local copy, we can simply run `git pull` in our terminal!
We can also create branches via command line.
From our terminal, we can create a new branch as follows.
```bash
git switch -c test-branch
```
```{verbatim}
Switched to a new branch 'test-branch'
```
The `-c` flag tells Git to _create_ the new branch with the name we provide; in the example above this is `test-branch`.
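`git switch -c` also accepts an optional starting point after the branch name, which is how a branch can be made from an earlier point in the git history. The sketch below assumes a throwaway repository with two commits; the branch name `from-earlier`, the file `notes.txt`, and the identity are hypothetical.

```bash
# Throwaway repository with two commits (all names are made up).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit --quiet -m "First commit"
echo "second" >> notes.txt
git commit --quiet -am "Second commit"

# Create a branch that starts one commit before the current one (HEAD~1).
git switch --quiet -c from-earlier HEAD~1

# The new branch only holds the state of the first commit.
cat notes.txt
```

A commit hash from `git log` can be used in place of `HEAD~1` to start from any recorded point.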
::: {.callout-note}
Earlier versions of `git` used a command called `git checkout`.
We will often see this command in older tutorials.
However, `git switch` was created as a simpler and more intuitive command.
`git checkout` is a much more powerful command, but with great power comes great responsibility (and the risk of breaking things...)
:::
To switch back to the original main branch we were on, run the same command but without the `-c` flag and with the name of the branch we want to switch to.
```bash
git switch main
```
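The same `git switch` command can also pick up a branch that was created on GitHub, once `git pull` has downloaded it. The sketch below imitates this without a network connection: a second local repository called `remote-copy` stands in for GitHub, and the branch name `made-on-github` is hypothetical.

```bash
work_dir="$(mktemp -d)"
cd "$work_dir"

# 'remote-copy' stands in for the repository on GitHub (made-up names throughout).
git init --quiet remote-copy
git -C remote-copy config user.name "Demo User"
git -C remote-copy config user.email "demo@example.com"
echo "# Demo" > remote-copy/README.md
git -C remote-copy add README.md
git -C remote-copy commit --quiet -m "Add README"
git -C remote-copy branch -M main

# Clone it first, then create a new branch on the 'remote' afterwards.
git clone --quiet remote-copy local-copy
git -C remote-copy branch made-on-github

cd local-copy
git pull --quiet             # downloads the new branch as 'origin/made-on-github'
git branch --remotes         # lists the branches git knows about on the remote
git switch made-on-github    # creates a local branch that tracks the remote one
```

Because exactly one remote has a branch with that name, `git switch` sets up the local branch to track it without needing the `-c` flag.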
::: {.callout-warning}
Note that if we start making changes on one branch, we _must commit the changes_ for them to be saved to the desired branch before we switch to another branch!
Uncommitted changes will 'follow' us to whichever branch we are on _until_ we make a commit.
:::
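The behaviour described in the warning above can be seen in a small experiment. This is a sketch in a throwaway repository, with a made-up identity and file contents, rather than something to run inside a real project.

```bash
# Throwaway repository with one commit on 'main' (all names are made up).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Edit a file on a new branch, but do not commit.
git switch --quiet -c test-branch
echo "work in progress" >> README.md

# The uncommitted edit follows us back to 'main'.
git switch --quiet main
followed="$(git status --short)"
echo "$followed"

# Go back and commit, so the change is recorded on 'test-branch' only.
git switch --quiet test-branch
git commit --quiet -am "Save work in progress"
git switch --quiet main
git status --short    # prints nothing: 'main' is clean again
```

After the commit, the new line exists only in the history of `test-branch`, and `README.md` on `main` is back to its original state.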
::: {.callout-tip title="Bonus Question" appearance="simple"}
What command could we run to see which branches already exist in the local repository we are on? This command has not been introduced in this tutorial!
Try running
```bash
git --help
```
or Google it to find the answer!
:::
::: {.callout-note title="Answer" collapse="true"}
```bash
git branch
```
```{verbatim}
main
* test-branch
```
The green colour and the star indicate the branch we are currently on.
:::
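A couple of variations on `git branch` can be handy too. The sketch below assumes a throwaway repository with a made-up identity, and simply lists branches in three different ways.

```bash
# Throwaway repository with two branches (all names are made up).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git switch --quiet -c test-branch

git branch                   # local branches; the star marks the current one
git branch --show-current    # only the name of the current branch
git branch --all             # local branches plus any known remote branches
```

In a repository cloned from GitHub, `git branch --all` would additionally list entries starting with `remotes/origin/`.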
### Pull requests
A Pull request (a.k.a. PR) is the GitHub term for proposing changes to a branch from another branch.
Others can comment and make suggestions before our changes are merged into the main branch.
A pull request is the safest way to check our changes before we merge them in, and to ensure we don't make any mistakes breaking something else in our 'receiving' branch.
::: {.callout-tip}
Git(Hub) will warn us if we are proposing changes to a line that someone else has modified in the 'receiving' branch _since_ we branched.
This is called a merge conflict, and it is up to us to decide which is the correct change to retain.
:::
To make a pull request, we first make sure we have a branch with some changes on it.
::: {.callout-tip title="Task" appearance="simple"}
On our local repository, in our terminal, switch back to our `test-branch`, add another new line to the end of `README.md`, then stage the file, commit, and push.
(If we get stuck or are unsure, feel free to check the Answer.)
:::
::: {.callout-note collapse="true" title="Answer"}
First change to the branch.
```bash
git switch test-branch
```
```{verbatim}
Switched to branch 'test-branch'
```
Make the edit.
```bash
echo 'We love SPAAM!' >> README.md
git status
```
```{verbatim}
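On branch test-branch
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")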
```
Add and commit the change to git history.
```bash
git add README.md
git commit -m 'Update README'
```
```{verbatim}
[test-branch 99c4266] Update README
1 file changed, 1 insertion(+)
```
Push the changes to the remote version of the repository.
```bash
git push
```
```{verbatim}
fatal: The current branch test-branch has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin test-branch
```
In this case, our remote copy of the repository does not have the branch.
Therefore, the first time we push, we tell Git to create the branch on the remote (GitHub) as well, by using the command it suggests.
```bash
git push --set-upstream origin test-branch
```
```{verbatim}
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 14 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 352 bytes | 352.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
remote:
remote: Create a pull request for 'test-branch' on GitHub by visiting:
remote: https://github.com/<USERNAME>/<REPOSITORY_NAME>/pull/new/test-branch
remote:
To github.com:<USERNAME>/<REPOSITORY_NAME>.git
* [new branch] test-branch -> test-branch
Branch 'test-branch' set up to track remote branch 'test-branch' from 'origin'.
```
:::
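Setting the upstream is only needed once per branch: afterwards a plain `git push` is enough. The sketch below illustrates this offline, assuming a local 'bare' repository called `remote.git` as a stand-in for GitHub; the paths, identity, and commit messages are made up.

```bash
work_dir="$(mktemp -d)"
cd "$work_dir"

# A bare repository stands in for the copy on GitHub (made-up names throughout).
git init --quiet --bare remote.git
git init --quiet local-copy
cd local-copy
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work_dir/remote.git"
echo "# Demo" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git push --quiet --set-upstream origin main

git switch --quiet -c test-branch
echo 'We love SPAAM!' >> README.md
git commit --quiet -am "Update README"

# First push of a new branch: name the remote and the branch once.
git push --quiet --set-upstream origin test-branch

# Later pushes from this branch need no extra arguments.
echo 'Another line' >> README.md
git commit --quiet -am "Update README again"
git push --quiet

# Show which remote branch each local branch tracks.
git branch -vv
```

The `[origin/test-branch]` entry in the `git branch -vv` output is the upstream that `--set-upstream` recorded.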
Once we've pushed our changes to our `test-branch` branch, go to the GitHub interface for the repository.
Open the 'Pull requests' tab and press the green 'New pull request' button, or press the green 'Compare & pull request' button in the yellow box that _may_ appear if we recently pushed (@fig-gitgithub-newpullrequest).
![Screenshot of GitHub pull requests tab, with green button for 'New pull request' and a yellow message saying a branch had recent pushes and another green button next to it saying 'Compare & pull request'.](assets/images/chapters/git-github/github-new-pull-request.png){#fig-gitgithub-newpullrequest}
Once opened, we can add a title and a description of the pull request (@fig-gitgithub-openingpullrequest).
![Screenshot of the opening a pull request page, with two text boxes for adding a title, a longer description, and a green 'Create pull request' button at the bottom.](assets/images/chapters/git-github/git-pullrequest-example.png){#fig-gitgithub-openingpullrequest}
Once we press the 'Create pull request' button, GitHub will open the pull request for this branch, in which others can leave comments and suggestions (@fig-gitgithub-pr-conversation).
By pressing the 'Files changed' tab, we can see exactly what has changed (@fig-gitgithub-pr-files).
::: {layout-ncol=2}
![Example screenshot of conversations tab of an open GitHub pull request, with the title, multiple tabs (Conversation, commits, checks, and file changed), an empty description box, and a comment box at the bottom.](assets/images/chapters/git-github/github-pr-conversation.png){#fig-gitgithub-pr-conversation}
![Example screenshot of files tab of an open GitHub pull request with a text file of README.md being displayed. In the display a red line highlights the old state of a modified line, and a green line right below shows the changes (in this case, a number of empty spaces have been added to the end of the line).](assets/images/chapters/git-github/github-pr-files.png){#fig-gitgithub-pr-files}
:::
Once we and our collaborators are happy with the changes (a code or pull request 'review'), we can go back to the 'Conversation' tab of the unique pull request, and press the green 'Merge pull request' button, and confirm the merge.
When back on the `main` branch of our repository, we should see our updated `README.md` in all its glory (@fig-gitgithub-merged-changes)!
![Screenshot of GitHub repository `main` branch after a pull request merge, with the text changes that were originally written on the `test-branch` displayed in the `README.md` file.](assets/images/chapters/git-github/github-merged-changes.png){#fig-gitgithub-merged-changes}
For more information on creating a pull request, see [GitHub's documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).
::: {.callout-tip title="Task" appearance="simple"}
What command would we use to merge one branch into another on a local copy of our repository?
Tip: you are working with yourself when you work on your local repository. In this case you don't need to 'request' a pull!
:::
::: {.callout-note collapse="true" title="Answer"}
```bash
git switch main
```
```{verbatim}
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
```
```bash
git merge test-branch
```
```{verbatim}
Updating 0ad9bb3..99c4266
Fast-forward
README.md | 1 +
1 file changed, 1 insertion(+)
```
:::
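The merge above is a simple 'fast-forward', because `main` had not changed since we branched. The merge conflict described earlier can be reproduced locally as well. This is a minimal sketch in a throwaway repository, with made-up file contents and identity, in which both branches edit the same line.

```bash
# Throwaway repository with one commit on 'main' (all names are made up).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Change the same line on two different branches.
git switch --quiet -c test-branch
echo "Hello from test-branch" > README.md
git commit --quiet -am "Edit on test-branch"
git switch --quiet main
echo "Hello from main" > README.md
git commit --quiet -am "Edit on main"

# Git cannot decide which version to keep, so the merge stops.
git merge test-branch || echo "Merge stopped: resolve the conflict by hand"

# The file now contains both versions, separated by conflict markers.
conflicted="$(cat README.md)"
echo "$conflicted"

# Resolve by writing the version we want, then stage and commit.
echo "Hello from both" > README.md
git add README.md
git commit --quiet -m "Resolve merge conflict"
```

Between the `<<<<<<<` and `>>>>>>>` markers Git shows the line from the current branch first and the line from the branch being merged second; which one to keep (or how to combine them) is our decision.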
### Forks
When working collaboratively (particularly in big teams), or when we're not yet part of the development team of a repository (but would like to be), it's often safer to work entirely in our own _repository_, rather than just a branch.
Furthermore, sometimes we want to 'diverge' entirely from the original code, because we want to do something very different from the original authors of the tool we're forking from, but starting from the same code base.
In these cases, instead of using branches, we can _completely_ isolate our sandbox using a `fork`.
A `fork` is a complete copy of the entire repository into a new repository, but on our own GitHub account.
This means that we can even completely mess up and destroy our entire 'copy' of the original repository - even deleting it - without affecting the original project.
:::{.callout-tip}
For most scientists, forks are probably not necessary as you are working on your own projects and scripts.
Forks are normally useful for larger team-based projects that require stricter control over the stability of main code.
:::
To make a fork on GitHub, first go to the repository we wish to 'go our own way' with.
For this tutorial, let's use [https://github.com/SPAAM-community/summerschool-github-practise](https://github.com/SPAAM-community/summerschool-github-practise)!
Once there, press the 'Fork' button, which is near the top right hand side of the repository, next to the 'pin', 'watch', and 'star' buttons (@fig-gitgithub-fork-button).
![Screenshot of the buttons on a GitHub repository used for pinning, watching, forking, and starring the repository.](assets/images/chapters/git-github/github-fork-button.png){#fig-gitgithub-fork-button}
Once we press 'Fork', we will get a new window with a variety of options for setting up our new _Fork_ (i.e., copy) of the entire original repository (@fig-gitgithub-fork-options).
These include things such as renaming the fork, setting a new description, and whether to copy the main branch only.
Whether we want to change any of these depends entirely on what we plan to do with the fork.
In this case, for the tutorial we will leave it up to you.
![GitHub create Fork page, with a variety of options displayed such as setting the name of the fork, setting a new description, and whether to copy the main branch only.](assets/images/chapters/git-github/github-fork-options.png){#fig-gitgithub-fork-options}
Once we press 'Create Fork', we will normally wait a moment or two for GitHub to do its thing, and then we'll see a shiny new repository!
Importantly, we will see the name of the new fork at the top, but next to our own profile picture.
Additionally, we will have an indication that indeed it is a fork with a message starting 'forked from' (@fig-gitgithub-fork-createdfork).
![The top of a newly forked repository, with a message below the 'summerschool-github-practise' repository name saying 'forked from 'SPAAM-community/summerschool-github-practise'.](assets/images/chapters/git-github/github-fork-createdfork.png){#fig-gitgithub-fork-createdfork}
Now we can once again practise doing a pull request!
However, instead of doing this from a branch, we can also do a pull request _across_ repositories, i.e., from a fork to the original repository (or even a fork to a fork!) - as long as they share the same git history!
For our tutorial, we will add our name to the end of the list of people on the README of our fork of 'summerschool-github-practise', and then open a pull request.
To make the edit, we will just use the GitHub interface.
Press the pencil icon on the top right of the rendered README.md file (@fig-gitgithub-fork-editpencil).
![Screenshot of GitHub rendered repository README, with a pencil icon in the top right.](assets/images/chapters/git-github/github-fork-editpencil.png){#fig-gitgithub-fork-editpencil}
Once the edit window is opened, add your name and GitHub user name to the list (@fig-gitgithub-fork-addname).
![Screenshot of GitHub file edit window, with a name added to a bullet point list at the bottom.](assets/images/chapters/git-github/github-fork-addname.png){#fig-gitgithub-fork-addname}
Make our commit to record the change to Git history (@fig-accessingdata-firstpagefig-gitgithub-fork-commitedit) and double check we've made the change (@fig-gitgithub-fork-confirmedit).
![A commit message being written describing the addition of a new name in the GitHub commit interface.](assets/images/chapters/git-github/github-fork-commitedit.png){#fig-accessingdata-firstpagefig-gitgithub-fork-commitedit}
![The rendered README with the newly added name at the bottom of the list.](assets/images/chapters/git-github/github-fork-confirmedit.png){#fig-gitgithub-fork-confirmedit}
Back on our forked repository, we can navigate back to the original repo by pressing the link below our Fork's name (@fig-gitgithub-fork-returntooriginalrepo).
![Clicking on the link of the original repository below the name of the fork shows a mini-summary of the original repository's information.](assets/images/chapters/git-github/github-fork-returntooriginalrepo){#fig-gitgithub-fork-returntooriginalrepo}
We can then go to the 'Pull requests' tab, and press the 'New pull request' button (@fig-gitgithub-fork-newprbutton).
![Pressing the green 'new pull request' button on the original repository's Pull Request tab](assets/images/chapters/git-github/github-fork-newprbutton.png){#fig-gitgithub-fork-newprbutton}
Once in the open pull request interface, we do almost the same thing as we did when opening pull requests from branches within the same repository.
A critical difference when dealing with Forks, however, is we must specify _which_ fork our changes are coming from.
We can do this by pressing 'compare across forks', and selecting our fork from the drop down menu (@fig-gitgithub-fork-compareacrossforks).
![The Pull Request creation window, with the 'compare across forks' link pressed, and the name of the edited fork in the dropdown menu as the 'source of the changes' (head) being proposed to go into the 'main' branch of the name of the original repository (base).](assets/images/chapters/git-github/github-fork-compareacrossforks.png){#fig-gitgithub-fork-compareacrossforks}
Once selected, as with branch pull requests, we can check the changes we are proposing, and if happy, press the 'Open Pull Request' button to see our Pull Request ready for review (@fig-gitgithub-fork-openedpr).
![An opened Pull Request on the GitHub interface of the SPAAM-community/summerschool-github-practise repository](assets/images/chapters/git-github/github-fork-openedpr.png){#fig-gitgithub-fork-openedpr}
With that you can request reviews from the curators of the original repository.
Alternatively, if they don't like your changes - you can simply keep your repository and use the code there instead (assuming the original repository had an open source license, of course 😉).
Remember that for most scientists, forks are likely not necessary when working on your own projects or in very small teams.
However if you branch further into bioinformatics, and want to contribute documentation, typo fixes, or even code into existing tools - forks are likely your best friend.
::: {.callout-tip title="Task" appearance="simple"}
Other than further 'sandboxing' your changes, why else would you want to fork a code repository?
:::
::: {.callout-note collapse="true" title="Answer"}
Because you want to use the same original code base to make a new or changed tool for a purpose different from that of the original authors, and/or it doesn't make sense to include it in the original tool.
:::
## Summary
In this chapter, we have gone over the fundamental concepts of Git.
We've gone through setting up a GitHub account and SSH keys to allow passwordless interaction between a GitHub remote repository and a local copy on our machine.
Through the GitHub website interface we made a new repository, and then went through the 6 basic commands we need for using Git:
1. git clone
2. git add
3. git status
4. git commit
5. git push
6. git pull
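
As a recap, the six commands can be strung together in one short session. The sketch below runs entirely offline by assuming a local 'bare' repository called `remote.git` as a stand-in for GitHub; with a real repository, the `git clone` argument would be the SSH address copied from GitHub instead. All paths, names, and messages are made up.

```bash
work_dir="$(mktemp -d)"
cd "$work_dir"

# Setup only: a stand-in for a GitHub repository with one commit on 'main'.
git init --quiet seed
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "# Demo" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Add README"
git -C seed branch -M main
git clone --quiet --bare seed remote.git

git clone --quiet remote.git my-copy        # 1. git clone
cd my-copy
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "A new line" >> README.md
git add README.md                           # 2. git add
git status --short                          # 3. git status
git commit --quiet -m "Update README"       # 4. git commit
git push --quiet                            # 5. git push
git pull --quiet                            # 6. git pull
```
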
We finally covered how to work collaboratively with:
- Branches: code 'sandboxes' within the same repository to prevent editing the 'main' branch
- Pull requests: how to propose changes from your branch (or fork) to your 'main' branch
- Forks: code 'sandbox' clones of entire repositories in a different user's space
As you continue to use Git and GitHub, always keep in mind the two questions:
- Why is using a version control software for tracking data and code important?
- How can using Git(Hub) help me to collaborate on group projects?
## (Optional) clean-up
Let's clean up our working directory by removing all the data and output from this chapter.
The command below will remove the `/<PATH>/<TO>/git-github` directory _as well as all of its contents_.
::: {.callout-tip}
## Pro Tip
Always be VERY careful when using `rm -r`.
Check 3x that the path you are specifying is exactly what you want to delete and nothing more before pressing ENTER!
:::
```bash
rm -r /<PATH>/<TO>/git-github*
```
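One way to do the check recommended in the tip is to list what a path pattern matches before deleting anything. The sketch below uses a temporary directory as a harmless stand-in for `/<PATH>/<TO>`, and the `git-github-backup` directory is a hypothetical example of something we might not want to lose.

```bash
# A temporary directory stands in for /<PATH>/<TO> (hypothetical contents).
parent_dir="$(mktemp -d)"
mkdir -p "$parent_dir/git-github/repo" "$parent_dir/git-github-backup"

# Preview: list everything the pattern matches, without deleting anything.
ls -d "$parent_dir"/git-github*

# The trailing * also matches 'git-github-backup'.
# Leaving it out deletes only the one directory we named.
rm -r "$parent_dir/git-github"
ls "$parent_dir"
```

If the preview lists more than expected, the pattern can be tightened before any `rm -r` is run.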
Once deleted we can move elsewhere (e.g. `cd ~`).
We can also get out of the `conda` environment with:
```bash
conda deactivate
```
Then, to delete the conda environment:
```bash
conda remove --name git-github --all -y
```
## References