This is all you need to know about Git.

Transplanted Post

After setting up this website, I am gradually moving all my previous posts, stored locally and in other blog systems, over here. This is one of them, so be aware that the date and time displayed on this post are not accurate.

Expected Reader Experience: None

This article is good for readers with zero or very limited knowledge of the topic: Git.

About Git

Why use Git?

We often need to control the versions of our files, especially in large projects where many members cooperate. You can do rudimentary version control by regularly backing up your files, but that is time-consuming, and often you cannot retrieve the exact version you want because it fell between two backups! Git offers a systematic, efficient and fast solution for version control. In contrast to centralized version control tools like SVN, Git employs distributed version control.

“Distributed” Workflow

Distributed version control implies that every member of the team has a full copy of the project, including its history, on their own computer (called a local repository). There is no “central server” as in centralized version control, so to speak. Team members exchange changes among their own devices (for example, if member A and member B are both working on the file apple.txt, they can use Git to send their changes to each other so that both members end up with an apple.txt containing both changes) - and if one computer fails, just copy the files from another.

In practice, however, the team usually sets up a central repository online (called a remote repository) and pushes changes to and pulls changes from that central repository (instead of exchanging directly between their local computers). This central repository exists only to make exchanging changes easier - if it fails, unlike in centralized version control, no data will be lost, because every local repository still holds a copy. An online central repository also lets members who are not on the same LAN collaborate; besides, member A does not need to wait for member B to turn on her computer in order to send her the changes.

Git Setup

Install Git

Download and install Git from the official website.
After installing, 3 executables will appear: Git Bash, Git CMD and Git GUI. Git Bash and Git CMD are both terminal-like command windows and Git GUI is a graphical front end for the same Git commands (read about their difference here). Most of the time we only use Git Bash and do everything by typing commands in it.

Set up Environment

Start Git Bash, and execute:

git config --global user.name "Your Name"
git config --global user.email "email@example.com"

git config sets configuration and preferences for Git (think of the Settings in GUI applications). Clearly, these two lines set an identity for you. In version control, it is important to identify the person who commits each change, so that the team knows whose bug it is. Note that this name and email do not “log in” to anything or any platform. They are just two text fields that travel along with your committed changes, like an ID card.

git config actually writes data into a file on your computer that stores your settings. The --global parameter specifies the level of control. It instructs Git to write the configuration into a config file under the current computer user’s personal folder, so that the settings made in this command apply to every repository this user works with. The other levels of control are --local (writes to the configuration file inside a repository, so the configuration applies to that repository only) and --system (writes to the configuration file in the Git installation directory, so the configuration applies to all users of this computer).

Configurations set at more specific levels override those at broader levels. So, configurations set at the local repository level override those at the user level, which in turn override those at the system level.

  • git config [control level] -e opens the configuration file for direct editing - this has the same effect as setting those configurations with Git Bash commands. As said, control level can be [--local|--global|--system]. If left blank, commands that write use --local by default, while commands that read look at all levels combined.
  • git config [control level] -l shows the content of the configuration files - you can check what configurations are already made.
  • git config [control level] --add <section>.<key> <value> adds a configuration.
  • git config [control level] --get <section>.<key> (or simply git config [control level] <section>.<key>) shows a configuration. If the key is not set, nothing is printed and the command exits with a non-zero status; a key without a section part (no dot) is rejected with an error.
  • git config [control level] --unset <section>.<key> deletes a configuration.
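The override order described above can be tried out safely. The following is a minimal sketch, assuming a Unix-like shell such as Git Bash; it points HOME at a scratch folder so that the --global writes do not touch your real settings, and the names "Global Name" and "Local Name" are placeholders.

```shell
# Use a scratch HOME so the --global writes below land in a throwaway
# file instead of your real ~/.gitconfig.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Same key at two levels: the repository-level value wins.
git config --global user.name "Global Name"
git config --local user.name "Local Name"
effective="$(git config --get user.name)"
echo "effective:   $effective"

# Asking for one level explicitly bypasses the override.
global_only="$(git config --global --get user.name)"
echo "global only: $global_only"

# After removing the local value, the global one shows through again.
git config --local --unset user.name
after_unset="$(git config --get user.name)"
echo "after unset: $after_unset"
```

Run git config -l inside the scratch repository at any point to see which values are currently set.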

Two important keys, other than user.name and user.email, that I think one should know, are core.editor and the alias section.

  • core.editor controls which editor Git opens by default - set it to the installation path of your favorite editor, for example:
core.editor="D:\Portables\NPP_Portable_7.9.5\notepad++.exe" -multiInst -notabbar -nosession -noPlugin

You can also set the editor with git config --global core.editor <editor-path>. If you are using NPP, use the 32-bit version; if you use the 64-bit version instead, you must append -multiInst -notabbar -nosession -noPlugin after the editor-path because the 64-bit plugins may cause problems.

  • The alias section holds the shorthands for your git commands. See below.

Alias

This is really another type of configuration stored in the config file. An alias lets you shorten commands.

  • git config [control level] alias.<short-name> <command> sets an alias
  • git config --global --unset alias.<short-name> removes an alias
  • You can also open the configuration file (still remember how? Check above) and add/delete aliases directly, following the syntax that is already there (it should be quite simple).
    Here are some common aliases you may want to set:
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.unstage 'reset HEAD --'
git config --global alias.last 'log -1 HEAD'

Aliases are not limited to Git commands. To run a non-Git command, start the alias with !; to run several commands in a row, chain them with &&. For example,
!cd ${GIT_PREFIX:-.} && start update.sh changes into the directory from which you invoked Git (shell aliases always start at the repository’s top-level directory, and GIT_PREFIX holds the subdirectory you were in) and executes the update.sh there. !c:/windows/explorer starts a Windows Explorer.
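To see both kinds of alias in action without changing your global settings, here is a small sketch that stores the aliases at the --local level of a throwaway repository. The alias name hello and the file notes.txt are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# --local keeps these aliases inside the throwaway repository;
# use --global instead to make them available everywhere.
git config --local alias.st status
git config --local alias.hello '!echo hello from a shell alias'

# "git st" now runs "git status"; extra options are passed through.
touch notes.txt
st_output="$(git st --short)"
echo "$st_output"

# A leading "!" hands the rest of the alias to the shell instead of Git.
hello_output="$(git hello)"
echo "$hello_output"
```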

Set up Local Directory

For Git to manage your directory, you must tell it where the directory is. Navigate (not sure how to? See my other [blog] for common CMD commands) in Git Bash to your project root directory (that is, the folder that contains all the project files you want under version control; if you do not have one, create one and put all your files into it), and type

git init

This initializes the current folder as an empty local Git repository - which means Git starts managing it. You will find that a .git folder has been created in your project root directory. If you do not see it, it might be hidden. Check Google to see how to reveal hidden files. Any folder that contains a .git subfolder is managed by Git. (Incidentally, the configuration file config for this local repository is also in the .git folder.)

  • git init initializes the current directory as a Git repository.
    • git init <directory> initializes the specified directory.
    • Running git init again in an existing repository is safe: it does not overwrite what is already there.

Now you have set up a local Git repository in your project directory - that is, you have set up the three-tree Git structure that manages version control for your project folder. The three trees are: working directory (working tree), staging index, and commit history.

  • The working tree is simply the folder on your computer that you want to manage. When you make changes (adding, modifying, deleting files) in that folder, you are changing the working tree directly, and Git notices these changes (unless you tell Git to ignore certain files explicitly - we will cover that later).
  • The staging index stores the changes that you have “staged” with the git add command. So, after you change a file, the working tree has the change but the staging index does not. You can select certain changes and add them to the staging index (if a change is not staged, the staging index still holds the previous version of that file). After adding, as long as you have not committed, you can always remove the changes from the staging index again.
  • The commit history contains permanent snapshots of your files. After adding certain files to the staging index, you can use git commit to record the current state of the staging index. This is like backing up a copy of the current version into the commit history. Think of saving checkpoints in a long game: the numerous commits are your checkpoints in the development cycle. When in need, you can always check out a certain commit into the working tree to retrieve a previous version.
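The journey of one file through the three trees can be watched with git status --short. This is only a sketch in a throwaway repository; the identity "Example User" and the file apple.txt are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "first draft" > apple.txt

# 1. Working tree only: Git sees the file but does not track it yet.
in_working_tree="$(git status --short)"

# 2. Staging index: the change is queued for the next commit.
git add apple.txt
in_index="$(git status --short)"

# 3. Commit history: the snapshot is stored and the status is clean again.
git commit -q -m "Add apple.txt"
after_commit="$(git status --short)"
commit_count="$(git rev-list --count HEAD)"

printf 'working tree: %s\nindex:        %s\nafter commit: [%s]\ncommits:      %s\n' \
  "$in_working_tree" "$in_index" "$after_commit" "$commit_count"
```

In the short status format, ?? marks an untracked file and A marks a file newly added to the staging index.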

Use Git

Get help

git help <verb>
git <verb> --help
git <verb> -h
man git-<verb>

There are four ways to get help about a certain action. Use any of them when in doubt; for example, git config -h gives a brief summary of how to use the git config command. Of course, you could also check the official documentation.

Add, Commit Locally

Now, type

git add -A
git commit -m "Initialized a repo"

If you started from an empty folder, these two lines have nothing to commit; but if you started from a project folder with pre-existing files, these two lines add those files to the Git repository we just initialized (remember, we just initialized an empty repository). You can understand it as “backing up” those files to Git.

If we then create another file, say test.txt, under our project folder, and re-run those two commands, Git will detect this change and make another commit to the local repository to update the version information. If we do not make any change and run those two lines, nothing will happen.

Let’s look closely. git add detects and “stages” changed files. When you do git commit, all staged changes are “committed”. You always need these two steps, stage and commit, to make Git update its version record (or, “back up your files”) after making any change. Why two steps? Staging lets you choose exactly which changes go into each commit, so unrelated edits can be committed separately.

  • git add -A stages all changed files.
    • You can also stage selected file(s) with git add <file-name> <file-name> ....
    • This accepts wildcards. Use git add *.<file-format> to stage all files with a specific format, git add <file-name-prefix>* to stage all files with a certain prefix in their names, etc.
    • In Git 2.x, git add . stages all new, modified and deleted files under the current directory (Git 1.x skipped the deleted files); git add -u stages all deleted and modified files but not new files.
  • How to “unstage”?
    • Use git reset to unstage all files.
    • Use git reset <file-name> to unstage a specific file.
  • git commit -m "msg" commits the staged changes with msg as the commit message.
    • You can do both steps (add and commit) by adding the -a argument to git commit. That is, do git commit -a -m "msg" to skip git add. This stages all modified and deleted files that Git already tracks and commits them; brand-new files still need git add.
    • The -m signals a message argument and the text in quotation marks is the actual message content. As a good practice, your message should describe what changes you have made in this commit. There are ways to commit with an empty message but please do not do that.
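The difference between git add -u, git commit -a and git add -A is easiest to see with one tracked and one brand-new file. A minimal sketch in a throwaway repository follows; the file names and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "tracked" > tracked.txt
git add -A
git commit -q -m "Add tracked.txt"

# Two unrelated changes: edit a tracked file and create a new one.
echo "edited" >> tracked.txt
echo "brand new" > new.txt

# git add -u stages changes to tracked files only, so new.txt stays out.
git add -u
staged_after_u="$(git diff --cached --name-only)"

# git commit -a behaves the same way: it never picks up untracked files.
git reset -q                                 # unstage everything again
git commit -q -a -m "Edit tracked.txt"
still_untracked="$(git status --short)"

# git add -A picks up everything, including the new file.
git add -A
git commit -q -m "Add new.txt"
total_commits="$(git rev-list --count HEAD)"

echo "staged by -u: $staged_after_u"
echo "left over after commit -a: $still_untracked"
echo "commits so far: $total_commits"
```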

Common Pitfall

Do not just omit the -m argument - Git will open an editor (Vim, unless you have set core.editor) and ask you to enter the message there.

  • Realized you forgot to add or remove something after committing? git add or git rm it first, and then run git commit --amend. You will have only one commit in the end instead of two; the amended commit replaces the first one.
  • Have a typo in your commit message? Run git commit --amend -m "correct message".
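Here is a short sketch of the amend workflow in a throwaway repository, where a forgotten file is folded into the previous commit. The file names and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "main code" > app.txt
git add app.txt
git commit -q -m "Add app and readme"        # oops: the readme was never staged

# Stage the forgotten file and fold it into the previous commit.
echo "docs" > readme.txt
git add readme.txt
git commit -q --amend -m "Add app and readme"

# Still a single commit, and it now contains both files.
commit_count="$(git rev-list --count HEAD)"
echo "commits: $commit_count"
git ls-tree --name-only HEAD
```

Amending rewrites the commit, so it is best kept to commits that have not been pushed yet.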

Move, Delete, Rename

Strictly speaking, you can work without these commands - but they speed things up.

  • There is a separate command git rm <file-name> that deletes a file from both the working directory (file system) and the staging index.
    • It has the same effect as deleting that file manually in the file explorer (or running rm <file-name>) and then running git add on it.
    • git rm --cached <file-name> removes the file from the staging index but leaves it in the working directory.
    • If the file to be deleted has staged changes, you must force the deletion with git rm -f <file-name>.
    • You can delete all files and directories inside a directory by going into that directory and running git rm -r * (recursive deletion).
  • git mv <file_from> <file_to> moves or renames the file in both the working directory (file system) and the staging index.
    • It has the same effect as renaming the file in the file explorer (or running mv <file_from> <file_to>), and then running git rm <file_from> followed by git add <file_to>.
    • If a file with the new name already exists, force the rename with git mv -f <file_from> <file_to>.
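The three operations above can be compared side by side. This is a sketch in a throwaway repository; all file names and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "a" > old_name.txt
echo "secret" > local_only.txt
echo "b" > obsolete.txt
git add -A
git commit -q -m "Add three files"

# Rename: the working tree and the index are updated together.
git mv old_name.txt new_name.txt

# Stop tracking a file but keep it on disk.
git rm -q --cached local_only.txt

# Delete a file from both the index and the disk.
git rm -q obsolete.txt

git status --short
git commit -q -m "Rename, untrack and delete"
tracked="$(git ls-files | paste -sd' ' -)"
echo "still tracked: $tracked"
```

After the last commit, local_only.txt still exists in the folder but shows up as untracked.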

Check status

Believe me, you will need to know the status quo of your project now and then.

  • git status tells you the current project status: which changes are staged but not committed, which are modified but not yet staged, and which files are untracked (new files Git has never been told about)… It gives you a better idea of where you are in the project.
  • git log shows you this repository’s commit history. The displayed commit ids are SHA-1 hashes (40 hexadecimal characters) calculated by Git that uniquely identify each commit.
    • Add -p or --stat to show changes made along with each commit.
    • Add --graph to show visual history of merging and branching.
    • Add --pretty=oneline to display each commit on one line, or --oneline to also abbreviate the commit id. (The short markers ?? for untracked files, A for newly staged files and M for modified files come from git status -s, not from git log.)
    • Add --reverse to display logs in the reverse order.
    • Add --author=<user-name> to display only commits made by a specific user.
    • Add --since / --after and --until / --before to limit the time period in which the commits were made. For example, --before=3.weeks.ago --after=2010-04-18
    • Add --no-merges to hide merge commits.
    • Check Viewing the Commit History for more options.
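  • git blame <file-name> shows, for each line of a file, the commit and author that last changed it.
  • git diff <file-name> shows the changes in the selected file that are not staged yet; add --staged to see the staged changes instead.
  • git reflog lists every position HEAD has been at (commits, checkouts, resets, merges…). I barely use this - but it turns out useful under certain desperate situations.
  • git clean removes untracked files from the working tree. Preview with git clean -n first; the actual deletion needs git clean -f.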
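A few of these inspection commands, tried on a tiny history. This is a sketch in a throwaway repository; the author name "Example User" and file.txt are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -a -m "Second commit"
echo "three" >> file.txt                     # edited but not staged

# Newest commit first by default; --reverse flips the order.
newest="$(git log -1 --pretty=format:%s)"
oldest="$(git log --reverse --pretty=tformat:%s | head -n 1)"

# Count the commits made by one author.
by_author="$(git rev-list --count --author="Example User" HEAD)"

# One line per commit plus a summary of the files each commit touched.
git log --oneline --stat

# git diff looks at unstaged changes, git diff --staged at staged ones.
unstaged="$(git diff --name-only)"
staged="$(git diff --staged --name-only)"
git status --short
```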

Regret

One of the most important reasons why version control exists is to allow you to regret.

  • git checkout <file-name> lets you discard all unstaged changes to that file in the working tree; it will not touch any change that has already been committed.
  • git reset [reset mode] commit-id lets you go back to a previous commit (an earlier version) - this can undo all changes made since that commit.
    • Find your commit-id with git log, or just use HEAD^ for one commit back, HEAD^^ for two commits back, HEAD~3 for three, etc. (HEAD^3 means something else: the third parent of a merge commit.) (If you want to move forward again after moving back, you will not see your desired id with git log; in this case, use git reflog.)
      • HEAD points to the current branch, and the current branch points to a specific commit.
    • reset mode can be [--soft | --mixed | --hard]. If left empty, it is --mixed by default. See their difference here or here. In most cases I simply want to go back in time and thus I use --hard (it discards all changes since that previous commit, resets your working tree to that commit and leaves nothing staged in the staging index).
  • People always forget - but any wrongly deleted file can be restored in the same way! Git can undo deletions, not only modifications.
  • Incidentally - do you not think checkout is too long? Remember alias? Run git config --global alias.co checkout and from now on you can replace checkout with co every time you want to use it.
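The three kinds of regret above (drop an uncommitted edit, go back a commit, change your mind about going back) can be rehearsed safely. This is a sketch in a throwaway repository; story.txt and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "v1" > story.txt
git add story.txt
git commit -q -m "v1"
echo "v2" > story.txt
git commit -q -a -m "v2"
v2_id="$(git rev-parse HEAD)"

# Discard an uncommitted edit: the file goes back to its committed content.
echo "scribble" > story.txt
git checkout -- story.txt
after_checkout="$(cat story.txt)"

# Go back one commit and throw away everything after it.
git reset -q --hard HEAD~1
after_reset="$(cat story.txt)"

# v2 no longer shows up in git log, but git reflog still remembers it,
# so the reset itself can be undone.
git reflog
git reset -q --hard "$v2_id"
after_undo="$(cat story.txt)"

echo "$after_checkout / $after_reset / $after_undo"
```

Here the id of the v2 commit was saved in a variable beforehand; in real life you would copy it from the git reflog output.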

Common Pitfall

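Sometimes you will see Git commands with double dashes -- before their <file-name> parameter. For example, git checkout -- <file-name> instead of just git checkout <file-name>. The -- separates options and revisions from file paths, so Git treats whatever follows as a file name even when it looks like a branch name or an option. For more on this, please refer to here, here and here.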

Remote Repository

Set up Remote Directory

As previously said, we usually set up a remote repository online for easy exchange of changes. The standard solution is GitHub, a free platform for hosting Git remote repositories - this saves you the hassle of building a Git server on your own. (Free accounts can create both public and private repositories; public means your project is visible to everyone. Some advanced features require a paid plan such as GitHub Pro, which students may get for free; another option is to build a Git server of your own - but for personal users that probably costs more than a GitHub Pro…)
Register a GitHub account and create a repository (name it properly to avoid confusion). If you need guidance on these, check GitHub Documentation.

Now two things need to be done:

  • Link your local repository to the remote repository. This can be done with git remote add remote-alias remote-url. The remote-alias is a name you give to this remote repository (usually we use origin, but you can use any name), and the remote-url is the link to this repository. This link can be obtained by clicking “Clone” on your GitHub repository page. There are two options - HTTPS or SSH. If you are the repository owner, use SSH, or else use HTTPS.
  • Allow your local repository to read from and write to the remote repository. This is done by adding an SSH key to your GitHub account - for detailed steps, see GitHub Documentation. If you skip this, pushes over SSH from local to remote will be rejected.

Once the two things are done, your local and remote repositories are connected.

  • You can list all added remote repositories with git remote -v.
  • You can remove any added remote repository with git remote remove <remote-alias> (this does not delete the GitHub repository; it only disconnects the local and the remote repository).
  • You can rename a remote repository with git remote rename <original-name> <new-name>.
  • Local and remote repositories need not be one-to-one; for example, you can add multiple remote repositories to one local repository (just make sure you push to each of them).
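The remote bookkeeping commands can be practised without a GitHub account. In this sketch a bare repository in a temporary folder stands in for the GitHub repository; that stand-in, and the alias backup, are assumptions made for illustration only.

```shell
# A bare repository on disk plays the role of the remote here.
remote_dir="$(mktemp -d)"
git init -q --bare "$remote_dir"

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Link the local repository to the remote and list the result.
git remote add origin "$remote_dir"
git remote -v

# Rename the alias, then drop the link again.
git remote rename origin backup
names_after_rename="$(git remote)"

git remote remove backup
names_after_remove="$(git remote)"

echo "after rename: $names_after_rename"
echo "after remove: [$names_after_remove]"
```

With a real GitHub repository, the only difference is that the path in git remote add is replaced by the SSH or HTTPS link.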

Push

Now, every time you have committed some changes, you can run git push -u <remote-alias> <branch-name> to push the commits to the remote repository at GitHub; your teammates can then get the changes from there.

  • The git push command pushes the local commit history to the remote repository after you have committed your changes. The -u links the current local branch to the remote branch branch-name, so from the second time onwards a plain git push (or git pull) on that branch knows where to go. The first branch of a repository is traditionally called master (newer Git installations and GitHub may name it main instead), so you probably want to use git push -u origin master if you have not touched any branch. We will talk about branching later.
  • Note that Git silently creates a remote branch with the same name as the local branch if it does not exist yet, and what we do is push content from the local branch to that remote branch. If we want the target remote branch to have a different name, we can use git push <remote-alias> <local-branch-name>:<remote-branch-name>. The short form git push <remote-alias> <branch-name> always pairs a local branch with the remote branch of the same name, so keep using the colon form whenever the two names differ.
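The push variants above, sketched against a local bare repository that stands in for GitHub. The stand-in remote, the branch name review and the identity are assumptions for illustration.

```shell
remote_dir="$(mktemp -d)"
git init -q --bare "$remote_dir"

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"
git remote add origin "$remote_dir"

echo "hello" > apple.txt
git add apple.txt
git commit -q -m "Add apple.txt"

# First push: -u records origin/master as the upstream of local master.
git push -q -u origin master
upstream="$(git rev-parse --abbrev-ref 'master@{upstream}')"

# Later pushes on this branch need no arguments.
echo "more" >> apple.txt
git commit -q -a -m "Extend apple.txt"
git push -q

# Push local master to a remote branch with a different name.
git push -q origin master:review
remote_branches="$(git ls-remote --heads origin | sed 's|.*refs/heads/||' | sort | paste -sd' ' -)"

echo "upstream of master: $upstream"
echo "branches on remote: $remote_branches"
```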

Conflict, Pull, Fetch, Merge

Best Practice

Do not modify the same file at the same time from two branches.

Sometimes, when you and your teammate have modified the same document and both want to push to the remote repository, there will be conflicts. When this happens, the later push is rejected, and whoever was rejected must reconcile the two versions before pushing again. The options are:

  • Let your copy override your teammate’s copy. In this case just force push with git push <remote-alias> <branch-name> -f. This results in the loss of your teammate’s changes and you probably do not want to do that.
  • Instead, the common thing to do is to take in your teammate’s changes in your local repository first and then push. To do so, use git fetch to download all remote changes (this does not modify your local files or staging index) and, if you are fine with the changes, use git merge to merge them into your local branch (this automatically creates a new commit that incorporates the changes from the remote repository). To do both in one step, use git pull.
  • If there are true conflicts - i.e., you and your teammate modified the same part of the same document - git merge or git pull cannot incorporate the changes from the other side automatically, and will report a merge conflict.
    • In such a state, the HEAD pointer stays the same, and the MERGE_HEAD ref is set to point to the other branch head; all files without conflicts are staged (but not committed) and all files with conflicts are not staged. The working tree files with conflicts are updated to show both sides using the familiar conflict markers <<<<<<<, ======= and >>>>>>>. See how conflicts are presented.
    • You can check the conflicts by git mergetool, git diff or git log --merge -p <path> to see the difference. To check a file content: git show :1:<filename> shows the common ancestor, git show :2:<filename> shows the HEAD version, and git show :3:<filename> shows the MERGE_HEAD version.
    • If you want your teammate’s changes to override yours, do git checkout --theirs <file-name> and then git add <file-name>. As before, you can use . in place of the file-name to resolve all conflicts using your teammate’s copy. To do the reverse, change --theirs to --ours.
    • If you want a bit from both sides, you have to open the conflicted file and adjust its content manually. After that, git add and git commit.
    • To avoid getting into such a merge-conflicted state in the first place, you can set the merge strategy option when doing merge/pull: git merge --strategy-option theirs OR git pull --strategy-option theirs. You can shorten --strategy-option to -X.
  • There are two commands that might be useful.
    • git merge --abort to abort a merge when in a conflicted state (it really does a git reset --merge but will not run when not merging). You usually want to use this when a merge gets into a complex conflict.
    • git merge --continue to commit changes after resolving all conflicts (it really does a git commit but it will not run when not merging).
    • You can use these two commands in place of git reset and git commit during a merge, see here.
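A merge conflict can be provoked on purpose to practise resolving it. The sketch below uses two local branches in a throwaway repository in place of you and a teammate; the branch name teammate, the file apple.txt and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "color: red" > apple.txt
git add apple.txt
git commit -q -m "Base version"

# The teammate's branch changes the line one way ...
git checkout -q -b teammate
echo "color: green" > apple.txt
git commit -q -a -m "Teammate: green"

# ... and your branch changes the same line another way.
git checkout -q master
echo "color: yellow" > apple.txt
git commit -q -a -m "Me: yellow"

# The merge stops with a conflict (non-zero exit status).
if git merge teammate >/dev/null 2>&1; then
  merge_status="clean"
else
  merge_status="conflict"
fi
marker_count="$(grep -c '^<<<<<<<' apple.txt)"

# Stage 2 is our side (HEAD), stage 3 is their side (MERGE_HEAD).
ours="$(git show :2:apple.txt)"
theirs="$(git show :3:apple.txt)"

# Resolve by taking the teammate's version, then conclude the merge.
git checkout --theirs apple.txt
git add apple.txt
git commit -q -m "Merge teammate, keeping their color"
resolved="$(cat apple.txt)"

echo "$merge_status: ours='$ours' theirs='$theirs' resolved='$resolved'"
```

To give up instead of resolving, run git merge --abort right after the failed merge.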

Clone

Following the previous steps, we started from a local repository, created a remote repository and established a connection between the two. There is a simpler way to achieve the same effect: start from the remote repository.
Create a repository on GitHub and get its SSH link, navigate in Git Bash to the local folder that should contain the project, and then type git clone <repo-link>. Git creates a new subfolder named after the repository, downloads the files (if any) from the GitHub remote repository into it, initializes a local Git repository there, and creates a local branch with the same name as the remote default branch and links it to that remote branch - all done.

  • git clone <repo-link> <local-directory> clones the remote repository into the specified local directory (if not specified, Git creates a new directory named after the repository inside the current directory).
  • git clone -o <remote-alias> <repo-link> clones a remote repository while setting an alias for it (by default it is origin).
  • To make the current local branch track a different remote branch, use git branch -u <remote-alias>/<remote-branch-name>. To create a new local branch that tracks a remote branch and switch to it, use git checkout --track <remote-alias>/<remote-branch-name>.
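Cloning can be tried locally as well. In this sketch a bare repository named project.git stands in for the GitHub repository and is first filled with one commit; the names project.git, seed, my-copy and upstream are all made up for illustration.

```shell
# Build a small "remote" to clone from.
work="$(mktemp -d)"
cd "$work"
git init -q --bare project.git
git -C project.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"
echo "hello" > apple.txt
git add apple.txt
git commit -q -m "Add apple.txt"
git push -q ../project.git master
cd "$work"

# Without a target directory, the clone lands in a new folder
# named after the repository ("project").
git clone -q project.git

# With -o and a target directory, you choose both the alias and the folder.
git clone -q -o upstream project.git my-copy

default_remote="$(git -C project remote)"
custom_remote="$(git -C my-copy remote)"
tracking="$(git -C project rev-parse --abbrev-ref 'master@{upstream}')"

echo "default alias: $default_remote, custom alias: $custom_remote"
echo "local master tracks: $tracking"
```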

Common Pitfall

You do not need to do git init before git clone. git clone does everything for you.

Branch Management

How to Use Branch

Why bother branching?

Branching is inevitable. When we set up a remote repository, we are already using branching - one local branch and one remote branch. When multiple team members clone a remote repository to their respective computers, each of them gets a local branch. They need to constantly track the remote branch, pull from it and merge their work into it.

Moreover, we open a local branch every time a new feature request or bug pops up. We maintain a stable branch master, and when a bug or feature calls, the person in charge branches out from the stable branch and does her work on a separate branch dev, leaving the stable branch intact; after she finishes, she merges her work back into master and deletes dev. This provides extra security for the work we have already done.

There is a good illustration (in Chinese) of this workflow here and here. Remember: HEAD points to the current branch, and the current branch points to a specific commit.

Make, Switch, Delete

Use the commands below to manage branches and work on different branches:

  • git branch <branch-name> creates a branch that starts from the current commit. You will not be switched to that branch automatically.
  • git checkout <branch-name> OR git switch <branch-name> switches to the branch specified by the name.
    • git switch master switches straight back to the master branch.
  • git checkout -b <branch-name> (or git switch -c <branch-name>) does the previous two steps in one go: create and switch to a branch.
    • To start from a remote branch, use git checkout --track <remote-alias>/<remote-branch-name>: it creates a local branch with the same name as the remote branch, sets up their tracking relation and switches to that local branch.
      • In fact, if no local branch of that name exists and only one remote has it, plain git checkout <remote-branch-name> does the same.
      • If you want to customize the name of the created local branch, add the name as the second last argument: git checkout -b <local-branch-name> <remote-alias>/<remote-branch-name>.
  • git branch -d <branch-name> deletes a branch by name.
    • If a branch is not fully merged but you do want to delete it, replace -d with -D to force delete.
    • git push <remote-alias> --delete <branch-name> deletes a remote branch.
  • git branch lists all branches.
    • git branch -v lists all branches with their last commit.
    • git branch --merged lists all branches that have been merged into the current branch, and git branch --no-merged does the reverse.
    • git branch -vv lists all local branches together with the remote branch each one tracks.
  • git merge <branch-name> merges the branch specified by the name into the current branch.
    • By default, Git fast-forwards when it can (it simply moves the pointer of the old branch to the same place as the new branch). After a fast-forward, deleting the merged branch leaves no trace in the history that the branch ever existed. To disable fast-forward, add --no-ff when merging, which forces Git to create a new merge commit (even though there is no conflict). For more about merge strategy, see Merge Strategies.
    • When conflicts arise, refer to the strategy above to resolve them.
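The master/dev workflow described earlier, as a sketch in a throwaway repository: branch out, commit, merge back with --no-ff and delete the branch. The file app.txt and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable version"

# Create a branch for the new work and switch to it in one step.
git checkout -q -b dev
echo "new feature" >> app.txt
git commit -q -a -m "Add feature"

# Back on master, merge with --no-ff so the history keeps a merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'dev'" dev

# dev is fully merged, so the safe -d deletion succeeds.
git branch -q -d dev

branches="$(git branch | tr -d ' *')"
merge_commits="$(git rev-list --merges --count HEAD)"
git log --oneline --graph

echo "branches left: $branches, merge commits: $merge_commits"
```

Without --no-ff this merge would have been a fast-forward and the graph would show a single straight line.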

Other Useful Topics

You can stop reading now. You now have all the Git knowledge needed for daily use. The topics below are not used as frequently but might turn out useful in certain messy situations. Completing the topics below gives you an edge over other Git beginners.

Submodules

Often a project needs another project as a part of it - do not just copy-paste the other project over entirely! See the elegant way to manage that using submodules.

Rebasing

A merge of C1 and C2 considers their common ancestor C0 together with the two descendants C1 and C2 and combines them in a new merge commit. A rebase ends up with the same file content, but with a cleaner, linear history. Check the Rebasing chapter for a detailed explanation.
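As a rough illustration of the C0/C1/C2 picture, the sketch below rebases a feature branch onto master in a throwaway repository and checks that the resulting history is a straight line. The branch name feature, the file names and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is called master
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "C0"

# C1 lives on the feature branch ...
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "C1 on feature"

# ... while C2 is added to master in the meantime.
git checkout -q master
echo "main" > main.txt
git add main.txt
git commit -q -m "C2 on master"

# Replay the feature commit on top of the latest master.
git checkout -q feature
git rebase -q master

history="$(git log --pretty=tformat:%s | paste -sd'|' -)"
merge_commits="$(git rev-list --merges --count HEAD)"
echo "history: $history (merge commits: $merge_commits)"
```

Like amending, rebasing rewrites commits, so it is usually reserved for branches that have not been shared yet.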

Stashing and Cleaning

You must have experienced the situation where you are working on a feature and your boss asks you to fix an urgent bug. Now what? Commit the unfinished feature half-way? Make yet another branch? Both will work, but stashing provides an easier solution.
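The urgent-bug scenario could look roughly like this. It is a sketch in a throwaway repository; the file names and the identity are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "email@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable version"

# Half-finished feature work, not ready to commit.
echo "half-done feature" >> app.txt

# Shelve it; the working tree returns to the last commit.
git stash -q
during_stash="$(cat app.txt)"

# Fix the urgent bug and commit it while the feature work is shelved.
echo "fix" > bugfix.txt
git add bugfix.txt
git commit -q -m "Fix urgent bug"

# Bring the unfinished work back and carry on.
git stash pop -q
after_pop="$(tail -n 1 app.txt)"

echo "while stashed: $during_stash, after pop: $after_pop"
```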

.gitkeep and .gitignore

Add a .gitkeep file to an otherwise empty directory so that Git will manage it (Git does not track completely empty directories; the name .gitkeep is only a convention). The .gitignore file lists files that should be ignored by Git. Create a .gitignore file (often one is created for you) in the project root directory to specify the files that Git should ignore. For the syntax, see Ignoring Files.
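A small sketch of both files at work in a throwaway repository. The folder names logs and build and the two ignore patterns are examples only.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

mkdir logs build
touch logs/.gitkeep                          # placeholder so the folder itself gets tracked
echo "debug output" > logs/run.log
echo "binary" > build/app.bin
echo "source" > main.txt

# Ignore everything in build/ and every .log file.
printf 'build/\n*.log\n' > .gitignore

# Even "add everything" skips the ignored files.
git add -A
tracked="$(git ls-files | paste -sd' ' -)"
echo "tracked: $tracked"

# git check-ignore tells you whether (and why) a path is ignored.
git check-ignore -v logs/run.log
```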

Tagging

Sometimes we attach tags to our commits for the convenience of categorization, search and management. This Chapter has an excellent explanation on tagging.

Other Git servers

There are hosting websites other than GitHub, such as Gitee. Get to know them and their specific features. Tired of using a third-party server? You can learn how to set up your own Git server here or here (in Chinese).

Using Git GUI

After you get familiar with Git command-line operations you can consider switching to a GUI wrapper for Git, like SourceTree.

Reference