List of commands & case studies you may find yourself dealing with.
- Git, GitHub and GitLab
- How to install Git
- The HEAD the Index and the working directory
- How to change your email and username
- How to change the `AUTHOR/COMMITTER_NAME` for all your previous commits
- How to delete your commits history?
- How to write your commit texts
- How to unstage/delete changes made locally
- Cloning the project
- Clone, venv and requirements.txt installation
- How to check the commits history
- Branch
- `main` vs. `master`
- Getting the current status
- Adding changes
- Commit changes
- Undo commits
- Ignoring some files: `.gitignore`
- `.gitignore` vs. `.gitattributes`
- Push to the server
- How to force push
- Pulling from server
- git pull vs. git fetch
- Dealing with errors
- Password no longer accepted
- Saving your token on your Mac
- How to push large files
- How to delete the cache from another git repository
- CI/CD with GitHub Actions
- Tagging
- How to add SSH key
- Git hooks
- Git flow vs. GitHub flow
- Git head and base
- Git merge vs. rebase
- GitHub CLI
- Deploy documentation on GitHub pages
- remote vs. origin
- ssh-vs-http protocols
- git show
- git diff
- How to write a git commit message
- Staging files partially
- git stash
- git blame
- Git is sometimes expanded as "Global Information Tracker" (a backronym rather than an official acronym).
- GitHub and GitLab are remote server repositories based on Git.
- GitHub is a collaboration platform that helps review and manage code remotely.
- GitLab is similar but focuses more on DevOps and CI/CD.
- If you have a conda virtual environment:
conda install git
- On macOS, if you are getting the error `xcrun: error: invalid active developer path`, see the discussion here.
- The HEAD keeps track of the last commit snapshot, which becomes the parent of the next commit
- The Index keeps track of the proposed next commit snapshot
- The Working Directory keeps track of your changes, essentially a sandbox
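The three areas above can be observed side by side. The following is a minimal sketch, assuming Git is installed; the temporary repository, the file name `notes.txt` and the demo identity are made up for illustration.

```shell
# Throwaway repository in a temporary directory (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"     # HEAD now records v1

echo "v2" > notes.txt
git add notes.txt                # the Index now proposes v2 for the next commit

echo "v3" > notes.txt            # the Working Directory holds v3, not yet staged

head_version="$(git show HEAD:notes.txt)"   # content in the last commit
index_version="$(git show :notes.txt)"      # content in the Index
working_version="$(cat notes.txt)"          # content on disk
echo "HEAD=$head_version index=$index_version working=$working_version"
```

The same file has three different contents at once, one per area.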
- Your name and email address are configured automatically based on your username and hostname. If this is not what you want, you can always override it with:
git config --global user.name <your_name>
git config --global user.email <your_email_address>
- To change this default option, run the following command and follow the instructions in your editor to edit your configuration file:
git config --global --edit
- After doing this, you may fix the identity used for this commit with:
git commit --amend --reset-author
- There are cases where you start working on another PC, push some commits and then realise you forgot to update your credentials. In that case the remote server shows whatever Git used as a default rather than your usual identity. This answer on Stack Overflow shows how to amend this:
- Suppose you want to change the `GIT_AUTHOR_NAME`; then run this (`-f` is the force option):
git filter-branch --commit-filter \
'if [ "$GIT_AUTHOR_NAME" = "your_old_author_name" ]; then \
export GIT_AUTHOR_NAME="your_new_name";\
export GIT_AUTHOR_EMAIL=your_email_address@gmail.com;\
export GIT_COMMITTER_NAME="your_new_name";\
export GIT_COMMITTER_EMAIL=your_new_name@gmail.com;\
fi;\
git commit-tree "$@"' -f
- Alternatively, you can also try this (unlike the one above, there is no `if` statement, so it updates every commit without distinction):
git filter-branch --commit-filter 'export GIT_COMMITTER_NAME="your_new_commiter_name"; \
export GIT_AUTHOR_EMAIL=your_email_address@gmail.com; git commit-tree "$@"'
- Once this is done you want to update the remote server:
git push --all origin --force
- Say you have some sensitive information in your commit history and you'd like to remove it all.
- One option would be to delete the `.git` folder, but this may cause problems in your git repository. If you want to delete all your commit history but keep the code in its current state, the following procedure is safer.
- Find out which branch you are on (generally either `main` or `master`): `git rev-parse --abbrev-ref HEAD`. Then adapt the commands below to the branch you are on.
- Checkout:
git checkout --orphan latest_branch
- Add all the files (move them to the staging area):
git add -A
- (Alternatively, you can add a single file or folder):
git add <path_to_specific_file_or_folder>
- Commit the changes:
git commit -am "commit message"
- Delete the old branch:
git branch -D main
- Rename the current branch to main:
git branch -m main
- Finally, force update your repository:
git push -f origin main
- Use templates for better Git commit messages: a case study from Autotrader
- Here is the template written by Tim Pope:
Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body. The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase can get confused if you run the
two together.

Write your commit message in the imperative: “Fix bug” and not “Fixed bug”
or “Fixes bug.” This convention matches up with commit messages generated
by commands like git merge and git revert.

Further paragraphs come after blank lines.

- Bullet points are okay, too

- Typically a hyphen or asterisk is used for the bullet, followed by a
  single space, with blank lines in between, but conventions vary here

- Use a hanging indent
- Before we make a commit, we must tell Git what files we want to commit (new untracked files, modified files, or deleted files). This is called staging and uses the add command.
- To unstage a file that has been staged but keep the modifications in the working directory:
git restore --staged <your_file_path>
- Alternatively, you can discard the modifications and restore the file to its last staged (or committed) state:
git restore <your_file_path>
- Restore can be dangerous: anything committed in Git can be recovered in one way or another, but restoring a file will delete its uncommitted modifications forever.
- How to stop tracking files but keep them on disk:
git rm -r --cached <name of the file to be removed>
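The difference between unstaging and discarding can be sketched as follows. This assumes Git 2.23 or newer (where `git restore` is available); the repository, the file `app.txt` and the demo identity are hypothetical.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

echo "edited" > app.txt
git add app.txt

git restore --staged app.txt                              # unstage, but keep the edit on disk
after_unstage="$(cat app.txt)"                            # still "edited"
staged_after_unstage="$(git diff --cached --name-only)"   # nothing staged any more

git restore app.txt                                       # discard the edit for good
after_restore="$(cat app.txt)"                            # back to "original"
echo "$after_unstage -> $after_restore"
```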
- This is the case where the repository is created first on GitHub.
# Clone with HTTPS
git clone https://gitlab.com/*******
# Clone with SSH
git clone git@gitlab.com:*******
- This is how to start using a git project:
$ git clone https://github.com/username/project_name.git
$ cd project_name
$ virtualenv -p python3 my_venv_name
$ source my_venv_name/bin/activate
(my_venv_name) $ pip install -r requirements.txt
- `git log`: gets all the commit history
- `git log -3`: gets the last 3 commits
- `git log --oneline`: shows each commit in a condensed format on a single line; shorthand for `--pretty=oneline --abbrev-commit` used together
- `git log --graph`: displays an ASCII graph of the branch and merge history beside the log output
- `git log -p`: shows each commit's diff, just like `git show`
- `git log --oneline -- <file name>`: if you are interested in a single file commit history
- Creating a branch is useful because it allows us to freely experiment (generally keeping a different version of the code) without touching the original code. At a later stage, you can always merge what you have done in the branch into the main one.
- Once you clone the project to your local machine, you only have the main branch.
- You should make all the changes on a new branch that can be created using the git branch command.
- Your branch starts as a copy of the main (or master) branch.
- Creating a new branch does not mean that you are working on the new branch. You need to switch to that branch.
- WARNING! `git switch` is very similar to `git checkout`, to the point that they do effectively the same thing when changing branches. See this discussion here.
- Create a branch:
git branch mybranch
- Switch to a new branch:
git switch mybranch
- To push into this branch you need to set the upstream branch with: `git push --set-upstream origin <name_of_new_branch>`. See: Why do we need to set an upstream branch?
- Show all the local branches of your repo with `git branch`; the starred branch is your current branch. To print only the current branch, use `git branch --show-current`.
- Show all the remote branches:
git branch -r
- Show all the remote branches with latest commit:
git branch -rv
- Show all remote and local branches with:
git branch -a
- Show all new remote branches: first `git fetch`, then check that they are visible with `git branch -r`, and finally `git checkout -t origin/new_remote_branch`, where the flag `-t` stands for track.
- If you want to delete your new branch locally: `git branch --delete <branch_name>`. You can also force delete with: `git branch -D <branch_name>`. It should be noted that when you delete a local Git branch, the corresponding remote branch in a repository like GitHub or GitLab remains alive and active. Further steps must be taken to delete remote branches.
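The point that creating a branch does not switch to it can be checked with a small sketch. It assumes Git 2.23 or newer (for `git switch`); the repository and demo identity are made up, and the initial branch is explicitly named `main` so the result does not depend on local defaults.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git symbolic-ref HEAD refs/heads/main    # call the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

git branch mybranch                              # create only
before_switch="$(git branch --show-current)"     # still on main
git switch -q mybranch                           # now actually move to it
after_switch="$(git branch --show-current)"

git switch -q main
git branch --delete mybranch > /dev/null         # safe delete: nothing unmerged on mybranch
remaining="$(git branch --list mybranch)"        # empty once the branch is gone
echo "before=$before_switch after=$after_switch"
```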
- The computer industry's use of the terms master and slave was considered no longer appropriate after the events that took place in the summer of 2020.
- Amid the many protests and the growing social unrest, these harmful and antiquated terms were no longer considered appropriate.
- GitHub now uses `main` as the default branch name for new repositories and encourages the change, but it does not ban the use of `master`.
- To know which branch you are working on, and to show what changes (if any) have been staged:
git status
- This step is different from the commit phase.
- When you make changes in the code, the branch you work on becomes different from the master branch. These changes are not visible in the master branch unless you take a series of actions.
- The first action is the git add command. This command adds the changes to what is called the staging area.
# Adding everything in the current directory
git add .
# Adding a specific file
git add <name of your specific file>
# Manually add a folder that is NOT shown in the list of staged changes
git add --all <name of your folder>
- It is not enough to add your updated files or scripts to the staging area. You also need to “commit” these changes using the git commit command.
- The important part of the git commit command is the message part. It briefly explains what has been changed or the purpose of this change.
- There is not a strict set of rules to write commit messages.
# Commit and add a message
git commit -m "<add here your message>"
# Alternatively, you can use this, which stages all modified tracked files and
# opens your editor (vim, for instance) showing the current status of the changes;
# write your message there and save the file
git commit -a
- `git revert <commit>`: undoes the changes introduced by a specific commit and records the undo as a new commit, keeping the log intact. To revert, you need to provide the hash of that commit.
- `git reset <commit>`: you can also undo changes by using the reset command. It moves the branch back to a specific commit, dropping all commits made after it (in the default mode the file changes stay in your working directory; `--hard` discards them as well). Note: using reset on commits that have already been shared is discouraged, as it rewrites your git log history.
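To make the contrast concrete, here is a small sketch under stated assumptions: a throwaway repository, a hypothetical file `f.txt` and a demo identity. Revert grows the history by one commit, while reset shortens it.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "first"
echo "two" > f.txt
git commit -q -am "second"

git revert --no-edit HEAD > /dev/null                 # new commit that undoes "second"
count_after_revert="$(git rev-list --count HEAD)"     # history grew to 3 commits
content_after_revert="$(cat f.txt)"                   # file is back to "one"

git reset -q --hard HEAD~1                            # drop the revert commit entirely
count_after_reset="$(git rev-list --count HEAD)"      # history shrank to 2 commits
content_after_reset="$(cat f.txt)"                    # file is "two" again
echo "revert: $count_after_revert commits, reset: $count_after_reset commits"
```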
- There may be situations where:
- You do not want to share specific files for confidentiality reasons.
- The file is too big and Git-based remote servers are not meant for this.
- There are a lot of files that are not useful, such as `.pyc` and `.so`.
- Create a hidden file named `.gitignore`, where the “.” is important:
vim .gitignore
- Tell Git where your global `.gitignore` file is (this makes its rules apply to all your repositories): `git config --global core.excludesfile '~/.gitignore'`
- Add a new file type to gitignore:
echo '.ipynb_checkpoints' >> .gitignore
- This is an example:
# Some of my current configuration of .gitignore
*.ipynb_checkpoints
*.pyc
*.pyo
__pycache__/
- `.gitignore` ignores untracked files, those that haven't been added with `git add`. Essentially it tells git that by default it shouldn't pay attention to untracked files at a given path.
- `.gitattributes` is for tracked files.
- This combination is useful when we want to ignore all PDF files but one. That one will be processed according to `.gitattributes`, while the others are ignored as per the `.gitignore`.
- If you put `*.pdf` in `.gitignore`, and also use `.gitattributes` to set up `*.pdf` with the attributes for LFS tracking, then:
- By default, an untracked PDF file will be ignored by git.
- To add a new PDF file to the index, you would override the ignore rule with `git add -f`
- Once a PDF file exists at a specific path, that path is no longer governed by the ignore rule
- Any PDF file you do add will be managed by LFS per the .gitattributes
- Any PDF file already in the repo (which would be unaffected by the ignore rule) should be managed by LFS, though if it was committed before the .gitattributes entry it may not be.
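The ignore-then-force-add behaviour described above can be sketched without LFS (which needs a separate install). The repository, the two PDF file names and the demo identity are hypothetical.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "*.pdf" > .gitignore
touch draft.pdf keep.pdf

git add .                 # both PDFs are skipped because of the ignore rule
git add -f keep.pdf       # -f overrides the ignore rule for this one path
git commit -q -m "Track a single PDF"

tracked_pdfs="$(git ls-files '*.pdf')"   # only keep.pdf is tracked

# Once tracked, a path is no longer governed by the ignore rule
if git check-ignore -q draft.pdf; then draft_ignored=yes; else draft_ignored=no; fi
if git check-ignore -q keep.pdf;  then keep_ignored=yes;  else keep_ignored=no;  fi
echo "tracked=$tracked_pdfs draft_ignored=$draft_ignored keep_ignored=$keep_ignored"
```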
- Pushing your code saves your changes in a remote branch (for example the main branch).
- If you were not on a separate branch this will still work.
- After your branch is pushed, you can open a merge request, which asks the maintainer of the project to “merge” your code into the main branch. The maintainer will first review your code. If the changes are OK, your code will be merged.
# This will push the changes to the server
git push
# If you are not working in a branch
git push origin main
- If you are wondering what `origin` stands for: it is the default name for the remote repository from which you cloned your repository.
- Suppose you want to update a file from the server, but you made some small changes locally that you do NOT want to keep anymore.
- If you use `git pull` it may fail with a message such as: `Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded`.
# Discard local changes to all files, permanently
git reset --hard
# Then you can use “git pull” to update your local directory
git pull
- You have two options here: squashing or filter-branch. The latter seems to be better because it does not mess with the entire history. Ref
- Locally delete or modify (reduce) large files.
- Commit the local deletes.
- Soft reset back X number of commits: `git reset --soft HEAD~X`.
- Then recommit all the changes together (AKA squash): `git commit -m "New message for the combined commit"`
- Push squashed commit.
- If you get an error like this: `Git push failed, "Non-fast forward updates were rejected"`, then you can use `git push --force` to force push your changes to the server. You need to be sure about what you are doing, and this may not be ideal when working in large projects where others are involved. Ref
- While you are working on a task in your local branch, there might be some changes in the remote branch. The `git pull` command brings your local branch up to date: use it to update your local working directory with the latest files in the remote branch.
- By default, the pull command fetches the changes and merges them with the current branch. To rebase instead of merge, add the `--rebase` flag before the remote name and branch: `git pull --rebase origin master`
- Pulling a specific branch:
git pull origin <branch_name>
- There are 3 important locations in a git installation:
- The working directory is where a developer actively edits and updates files that Git tracks.
- The local Git repository is where the history of all commits across all branches is maintained.
- The remote repository is the place on the server where your code is stored.
- The key difference between `git fetch` and `git pull` is that `git pull` copies changes from a remote repository directly into your working directory, while `git fetch` does not. The `git fetch` command only copies changes into your local Git repo. The `git pull` command does both.
- Git will not update your local clone of the remote automatically; to keep it updated or synchronised you have to run `git fetch`. Doing a fetch won’t affect your local branches, so it’s one of the safest git commands you can run.
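The fetch/pull difference can be sketched with two local repositories standing in for the server and your machine. The directory names `server` and `laptop`, the file `data.txt` and the demo identity are assumptions for illustration only.

```shell
# Two throwaway repositories: "server" plays the remote, "laptop" is the clone
work="$(mktemp -d)"
git init -q "$work/server" && cd "$work/server"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "v1"

git clone -q "$work/server" "$work/laptop"

# Someone updates the server after the clone was made
echo "v2" > data.txt
git commit -q -am "v2"

cd "$work/laptop"
git fetch -q origin
after_fetch="$(cat data.txt)"                    # working directory untouched: v1
fetched="$(git show origin/main:data.txt)"       # local copy of the remote branch: v2

git pull -q origin main
after_pull="$(cat data.txt)"                     # working directory updated: v2
echo "after fetch: $after_fetch, after pull: $after_pull"
```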
- If it throws an error saying there is another git process running, try: `rm -f .git/index.lock`
- If `git push` hangs after Total, try to increase the maximum size of the buffer as suggested here: `git config --global http.postBuffer 157286400`, then try to push again.
- From August 13, 2021, GitHub is no longer accepting account passwords when authenticating Git operations. You need to add a PAT (Personal Access Token) instead, and you can follow the below method to add a PAT on your system.
- From your GitHub account, go to Settings => Developer Settings => Personal Access Token => Generate New Token (give your password) => fill in the form => click Generate token => copy the generated token; it will be something like ghp_sFhFsSHhTzMDreGRLjmks4Tzuzgthdvfsrta
- Click on the Spotlight icon (magnifying glass) on the right side of the menu bar. Type Keychain access then press the Enter key to launch the app => In Keychain Access, search for github.com => Find the internet password entry for github.com => Edit or delete the entry accordingly => You are done. However, once you have entered this token once when prompted for the password, this step is generally not required.
- Unset any previous stored credentials you are using with:
git config --global --unset credential.helper
- Tell Git you want to store credentials in the osxkeychain by running the following: `git config --global credential.helper osxkeychain`.
- Navigate to your home directory, locate your `.gitconfig` with `ls -a` and check that `helper = osxkeychain` is there.
- Now interact with Git as you normally would: `git pull <your_https_repository>`, and enter your username and token when prompted. Your token and username will be automatically saved in the keychain.
- You can remove an existing password or token stored in the osxkeychain using the following command:
git credential-osxkeychain erase
- See this reference for more.
- GitHub limits the size of files allowed in repositories. To track files beyond this limit, you can use Git Large File Storage.
- In order to use Git LFS, you'll need to download and install a new program that's separate from Git.
- You can read more about it here
- Suppose you have cloned a repository into a folder inside one of your git repositories. If you try to push, git will throw this error message: `You've added another git repository inside your current repository.` You now have two options:
- Option #1: navigate to the cloned repository and delete all the hidden files that start with `.git`
- Option #2: `git rm --cached <path_of_clone_folder>`
- How to build a CI/CD pipeline with GitHub Actions in four simple steps
- How to Build MLOps Pipelines with GitHub Actions [Step by Step Guide]
- Listing the existing tags in Git is straightforward:
git tag
- Listing the existing tags with a particular pattern:
git tag -l "v1.8.5*"
- Create a tag along with a commit message:
git tag -a v1.4 -m "my version 1.4"
- See more here
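The tagging commands above can be tried in a scratch repository. This is a sketch only; the tag `v1.8.5-rc1`, the file name and the demo identity are hypothetical.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "release notes" > CHANGELOG.md
git add CHANGELOG.md
git commit -q -m "Add changelog"

git tag -a v1.4 -m "my version 1.4"              # annotated tag with a message
git tag -a v1.8.5-rc1 -m "release candidate"     # hypothetical second tag

matching="$(git tag -l "v1.8.5*")"               # only tags matching the pattern
tag_type="$(git cat-file -t v1.4)"               # annotated tags are stored as "tag" objects
tag_count="$(git tag | wc -l | tr -d ' ')"
echo "matching=$matching type=$tag_type count=$tag_count"
```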
- Git hooks are a built-in feature used to automate tasks or enforce policies.
- `pre-push` hooks can be used to prevent pushes to certain branches or run additional tests before pushing.
- `post-merge` hooks can be used to perform actions after a merge is completed, such as updating dependencies or generating documentation. These hook scripts are only limited by a developer's imagination.
- Hooks live in `.git/hooks` (create the directory with `mkdir .git/hooks` if it does not exist). The scripts should be named after the Git hook event they correspond to (e.g., pre-commit, pre-push, post-merge) and have the appropriate permissions (`chmod +x`). Git will automatically execute them at the corresponding events.
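As an illustration of the naming and permission rules, here is a minimal sketch of a hand-written `pre-commit` hook. The policy it enforces (rejecting staged content containing the marker `DO-NOT-COMMIT`) is invented for this example, as are the file names; it also assumes no global `core.hooksPath` redirects hooks elsewhere.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p .git/hooks
# The file name must match the hook event; a non-zero exit aborts the commit
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached | grep -q "DO-NOT-COMMIT"; then
    echo "pre-commit: remove DO-NOT-COMMIT markers first" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "DO-NOT-COMMIT secret" > config.txt
git add config.txt
if git commit -q -m "Add config" 2> /dev/null; then blocked=no; else blocked=yes; fi

echo "safe value" > config.txt
git add config.txt
if git commit -q -m "Add config" 2> /dev/null; then accepted=yes; else accepted=no; fi
echo "blocked=$blocked accepted=$accepted"
```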
- The idea is to check all my staged files (`git add file_to_stage`). One of the most common checks is formatting and compliance with PEP8. The former is done by `black` while the latter is done by `flake8`. If everything passes, the commit is made. If not, you are required to manually amend the code. Pre-commit hooks are a kind of git hook.
- Install pre-commit with `pip install pre-commit` and check the installation with `pre-commit --version`
- Add pre-commit to `requirements.txt` (or `requirements-dev.txt` if you have one)
- Create a file named `.pre-commit-config.yaml` and fill it with the actions you'd like to perform. Template
- Configure `black` inside the file `pyproject.toml`. Template
- Configure `Flake8` so it works nicely with `black` in the `.flake8` configuration file. Template
- Run `pre-commit install` to set up the git hook scripts
- Now `pre-commit` will run automatically on `git commit`
- If you wish to run it manually before committing: `pre-commit run -a`, or equivalently `pre-commit run --all-files`, which is useful if this is an existing project.
- `GitHub flow` is used when there are no releases for the application and there is regular deployment to production.
- `Git flow` is more complex than GitHub flow, and is used when the software/application has the concept of a release, or when more than one developer is working on the same feature.
- The head is the branch which you are on; that is, the branch with your changes. To show where the head is pointing, use `git show` and you should be able to see something like this:
commit 541e622e233b664fe5eb2753bf647a9eb0ef678f (HEAD -> main, origin/main, origin/HEAD)
- The base is the branch off which these changes are based.
- Merge allows you to combine all changes of a different branch into the current branch (e.g. `main`).
- However, consider the scenario where you branch out and, at the same time, another developer branched out and merged. Now you'd like to start from the newest merged branch. This has the advantage of keeping your commit history clean, which becomes super important when you try to investigate bugs.
- Rebase allows you to use another branch as the new base for your work. Essentially it is like saying: "Hey, I want to base my changes on what everybody has already done". It re-commits all commits of the current branch onto a different base commit. Rebase is a destructive operation. That means, if you do not apply it correctly, you could lose committed work and/or break the consistency of other developers' repositories.
- Reverting (as in undoing) a rebase is considerably difficult and/or impossible (if the rebase had conflicts) compared to reverting a merge. If you think there is a chance you will want to revert then use merge.
- Consider the case where you have two branches `main` and `feature1`, where `main` is the base of `feature1`. To add this feature to your `main` branch: `git checkout main` and `git merge feature1`. But this way a new merge commit is added (when `main` has moved on in the meantime). If you want to avoid spaghetti-history you can rebase: `git checkout feature1` and `git rebase main`; then you can: `git checkout main` and `git merge feature1`.
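The rebase-then-merge sequence can be sketched in a scratch repository to confirm that no merge commit is created. This assumes Git 2.23 or newer (for `git switch`); the file names and demo identity are made up, and `feature1` has no conflicts with `main`.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git switch -q -c feature1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git switch -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "main moved on"

git switch -q feature1
git rebase -q main            # replay feature1 on top of the new main
git switch -q main
git merge -q feature1         # now a plain fast-forward

merge_commits="$(git rev-list --merges --count main)"   # no merge commit in history
total_commits="$(git rev-list --count main)"
echo "merge commits: $merge_commits, total commits: $total_commits"
```

Running `git merge feature1` directly, without the rebase, would instead record an extra merge commit because `main` had moved on.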
- If on Mac install it with:
brew install gh
- Save your token with:
gh auth login
- Add your recent changes:
git add <name_of_changed_file>
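- Add a commit message:
git commit -m "new_change"
- Push:
git push
- Create a pull request by providing the base (`-B`) and head (`-H`) branches; `-f` fills in the title and body from the commit info:
gh pr create -B main -H <branch_name> -f
- Merge the pull request with a merge commit (`-m`), using admin privileges (`--admin`):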
gh pr merge --admin -m
- Install `mkdocs` with: `pip install mkdocs`
- Create a folder called `docs` in your repository and place all the `.md` files there.
- Create a `mkdocs.yml` in your project root directory.
- Build the documentation with:
mkdocs build
- Preview the documentation with `mkdocs serve`, which lets you have a look at the locally served docs.
- When you are satisfied with it, you can then deploy it to GitHub Pages with: `mkdocs gh-deploy`. This will create an additional branch called `gh-pages` and your docs should then be available.
- When you clone a repository with git clone, it automatically creates a remote connection called origin pointing back to the cloned repository.
- This is useful for developers creating a local copy of a central repository, since it provides an easy way to pull upstream changes or publish local commits.
- This behavior is also why most Git-based projects call their central repository origin.
- Git supports many ways to reference a remote repository. Two of the easiest ways to access a remote repo are via the HTTP and SSH protocols.
- SSH is sometimes reported to compress and transfer data more efficiently than HTTPS, which may improve the speed of your git operations. Both protocols encrypt the traffic; SSH authenticates with a key pair, while HTTPS uses a username and token.
- If you look inside `.git/` you will notice a folder called `objects`. This is where git stores your changes as objects. Let's say you commit a change and you'd like to see if it has been stored correctly; you can do: `git show d03e2425cf1c33616e12cb430c69aaa6cc08ff84`
- Who is going to remember this long number? You can omit most of it as long as the first characters you use refer unambiguously to a commit, as in:
git show d03e2
- Is there a better way? Yes! With no arguments, `git show` shows the most recent commit, and `git show HEAD` does the exact same thing.
- `git diff` shows unstaged changes (working directory vs. index).
- `git diff --cached` shows staged changes (index vs. last commit).
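A quick way to see the two views is to stage one file and leave another unstaged. This is a sketch with made-up file names and a demo identity.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a" > staged.txt
echo "b" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Add two files"

echo "a2" >> staged.txt
echo "b2" >> unstaged.txt
git add staged.txt                                   # stage only one of the two edits

unstaged_files="$(git diff --name-only)"             # edits not yet staged
staged_files="$(git diff --cached --name-only)"      # edits queued for the next commit
echo "unstaged: $unstaged_files, staged: $staged_files"
```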
- A bad commit message might clearly describe exactly what the commit does and yet fail to explain why it was made. Subtle but important. One way to remember this is to explain what you have changed from a high level and then proceed to explain the why.
Use .equals() instead of == for strings
Replace uses of a == b, when a and b are strings, with a.equals(b) in modules foo, bar, and baz.
vs.
Fix false-negative string comparisons
It turns out that == in Java compares pointers instead of object contents. This has been causing some strings to compare unequal to other strings with identical contents, resulting in intermittent failed login attempts as well as other issues. Fix the problem by calling .equals() instead.
At a bare minimum we have:
- Commit message begins with a title on its own line and no full stop at the end, followed optionally by a blank line and a longer description.
- Commit message title is <= 50 characters and uses an imperative tone: “Fix foo” instead of “Fixed foo”.
- The body is line-wrapped at 72 characters and organized into paragraphs
- The rule is one commit per change, but what do you do when one file contains multiple changes? Doesn’t `git add <file>` force you to put everything in a single commit?
- `git add -p` can come to the rescue! Unlike `git add`, it lets you partially stage a file. When invoked, it displays each block of modified lines (called a hunk) in sequence and asks you whether or not to stage it. As usual, a subsequent `git commit` will include only the staged hunks.
- `git stash` saves changes from the working tree to a more permanent location called the `stash`, which does not become part of the Git history. It is intended to store in-progress or experimental changes that you want to temporarily remove from the working tree.
- `git stash pop` restores a stash entry to the working tree.
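The stash round trip can be sketched as follows; the repository, the file `work.txt` and the demo identity are hypothetical.

```shell
# Throwaway repository (all names are illustrative)
repo="$(mktemp -d)"
git init -q "$repo" && cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable" > work.txt
git add work.txt
git commit -q -m "Stable version"

echo "experiment" >> work.txt                             # in-progress change
git stash -q                                              # park it; working tree is clean again
during_stash="$(cat work.txt)"                            # only "stable" is left
stash_entries="$(git stash list | wc -l | tr -d ' ')"     # one entry in the stash

git stash pop -q                                          # bring the change back
last_line_after_pop="$(tail -n 1 work.txt)"               # "experiment" is restored
commits="$(git rev-list --count HEAD)"                    # history is unaffected
echo "during stash: $during_stash, after pop: $last_line_after_pop"
```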
- `git blame` shows which commit most recently changed each line of the given file.
- It's useful when you want to know who’s responsible for a specific piece of code.
- https://www.kdnuggets.com/2021/10/8-git-commands-data-scientists.html
- https://www.upgrad.com/blog/github-vs-gitlab-difference-between-github-and-gitlab/
- https://chryswoods.com/beginning_git/README.html
- Official GitHub Docs
- How to install Git on Windows
- Quick guide of what VC is and how to use GIT
- Use templates for better Git commit messages
- Git cheatsheet
- Why GitHub renamed its master branch to main
- Oh Shit, Git!?!
- What are GitHooks
- Git Flow vs. Github Flow
- Automate Python workflow using pre-commits: black and flake8
- gitignore vs. gitattributes
- About workflow
- When do you use git rebase instead of git merge
- Git pull vs fetch: What's the difference?
- GitHub official command line
- Deploy your docs
- git remote
- is your branch up-to-date?
- Lecture notes on version control