Why use a Version Control System?
Overview
Teaching: 10 min
Exercises: 0 min
Questions
What is version control software?
Why should I use it?
Objectives
Explain what version control software does
Describe the advantages of using version control
State that Git is an example of version control software
The Essence of Version Control
- A system for managing your work (not necessarily just code) which records snapshots of the current state of a set of files
- Provides a historical record for your project
- Reports “diffs” that describe the file changes between snapshots
- Implements branching:
  - Allows working on several different features at the same time and switching between them whilst also maintaining a working copy of the code
  - Different people can work on the same code/project without interfering with each other
  - You can experiment with an idea and discard it if it turns out to be bad
- Implements merging:
  - The opposite of branching
  - Combines different branches together
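As a preview of what branching and merging look like in practice, here is a minimal sketch using Git, the tool covered later in these materials. It runs in a throwaway directory; the file name `notes.txt`, the branch name `experiment` and the identity are placeholders chosen for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main" on any Git version
git config user.name "Example User"        # placeholder identity, stored in this repo only
git config user.email "user@example.com"

# Record a first snapshot.
echo "first line" > notes.txt
git stage notes.txt
git commit -q -m "Start notes"

# Branch: try an idea without disturbing main.
git checkout -q -b experiment
echo "an experimental idea" >> notes.txt
git stage notes.txt
git commit -q -m "Try an idea"

# Merge: bring the idea back into main.
git checkout -q main
git merge -q experiment
git log --oneline
```

If the idea had turned out to be bad, the `experiment` branch could simply have been left unmerged and `main` would be unaffected.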
What Can Go Wrong Without Version Control
Consider the following directory listing. This is a common situation that can occur when working without version control (in fact it’s probably the best case scenario).
mylib-1.2.4_18.3.07.tgz somecode_CP_10.8.07.tgz
mylib-1.2.4_27.7.07.tgz somecode_CP_17.5.07.tgz
mylib-1.2.4_29.4.08.tgz somecode_CP_23.8.07_final.tgz
mylib-1.2.4_6.10.07.tgz somecode_CP_24.5.07.tgz
mylib-1.2.5_23.4.08.tgz somecode_CP_25.5.07.tgz
mylib-1.2.5_25.5.07.tgz somecode_CP_29.5.07.tgz
mylib-1.2.5_6.6.07.tgz somecode_CP_30.5.07.tgz
mylib-1.2.5_bexc.tgz somecode_CP_6.10.07.tgz
mylib-1.2.5_d0.tgz somecode_CP_6.6.07.tgz
mylib-1.3.0_4.4.08.tgz somecode_CP_8.6.07.tgz
mylib-1.3.1_4.4.08.tgz somecode_KT.tgz
mylib-1.3.2_22.4.08.tgz somecode_PI1_2007.tgz
mylib-1.3.2_4.4.08.tgz somecode_PI_2007.tgz
mylib-1.3.2_5.4.08.tgz somecode_PI2_2007.tgz
mylib-1.3.3_1.5.08.tgz somecode_PI_CP_18.3.07.tgz
mylib-1.3.3_20.5.08.tgz somecode_11.5.08.tgz
mylib-1.3.3_tstrm_27.6.08.tgz somecode_15.4.08.tgz
mylib-1.3.3_wk_10.8.08.tgz somecode_17.6.09_unfinished.tgz
mylib-1.3.3_wk_11.8.08.tgz somecode_19.7.09.tgz
mylib-1.3.3_wk_13.8.08.tgz somecode-20.7.09.tgz
...
The trouble with this way of working:
- Lots of manual work to manage these files
- Names are uninformative
- Not clear which versions of mylib and somecode are compatible
- Difficult to find changes between versions
Mistakes Happen
Without recorded snapshots you cannot:
- Undo mistakes and go back to a working version of your code
- Find out when a mistake was made and which results it may affect
- You might not even be able to tell what your mistake was (“It was working yesterday…“)
Working on different things
- For example new features and bug fixes, but you also want to use the current code for ongoing analysis
- Usually leads to multiple different copies of the code
- Copies need to be combined back together - but this often doesn’t happen
Collaboration
- “I will just finish my work and then you can start with your changes.”
- “Can you please send me the latest version?”
- “Where is the latest version?”
- “Which version are you using?”
- “Which version have the authors used in the paper I am trying to reproduce?”
Reproducibility
- How do you indicate which version of your code you have used in your paper?
- When you find a bug, how do you know when precisely this bug was introduced? (Are published results affected? Do you need to inform collaborators or users of your code?)
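One possible way to answer the first question with Git is to record the identifier of the exact snapshot that produced your results. The sketch below assumes a hypothetical script `analysis.py`, a hypothetical output file `results_version.txt` and an illustrative tag name; none of these come from the workshop materials.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "print('analysis')" > analysis.py
git stage analysis.py
git commit -q -m "Add analysis script"

# Store the identifier of the snapshot that produced the results.
git rev-parse HEAD > results_version.txt

# Optionally give that snapshot a memorable name.
git tag paper-submission
git describe --tags
```

Quoting the stored identifier (or the tag) in a paper lets readers retrieve exactly the code that was used.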
What about Dropbox or Google Drive?
Using a system like this solves some but not all of the issues above:
- Document/code is in one place, no need to email snapshots.
- How can you use an old version? Possible to get old versions but in a much less useful way - snapshots of files, not directories.
- What if you want to work on multiple versions at the same time? Do you make a copy? How do you merge copies?
- What if you don’t have internet?
Git - A Version Control System
The materials of this course focus on teaching the version control software (VCS) Git. Whilst there are many different implementations of VCS, Git has become established as by far the most widely used. We focus on use of Git via its command line interface as we believe this is the best way to communicate the important fundamental concepts.
Git is a very powerful tool. Unfortunately it is also quite difficult to start using. Git often uses confusing and unintuitive terminology and the benefits of its use are often only apparent in the longer term. Today we will make every effort to demystify Git and make clear why its usage is an essential part of any programming activity.
"It's called a merge commit, Marty! It's a directed acyclic graph! We need to bisect and revert, and then use interactive rebasing possibly with a cherry-pick. What aren't you getting about this!?" pic.twitter.com/zahjHhiB33
— Gabriel Lebec (@g_lebec) October 2, 2020
These materials are written in the style of The Carpentries. If attending an instance of this workshop you are fully encouraged to “type along” with the instructor in order to be able to complete the exercises.
Key Points
Version control software refers to a type of program that records sets of changes made to files
VCS is a ubiquitous tool for software development
Tracking changes makes it easier to maintain neat and functional code
Tracking changes aids scientific reproducibility by providing a mechanism to recreate a particular state of your code base
VCS provides a viable mechanism for hundreds of people to work on the same set of files
VCS lets you undo mistakes and restore a code base to a previous working state
Git is the most widely used version control software
Using Git facilitates access to online tools for publication and collaboration
Committing and History
Overview
Teaching: 30 min
Exercises: 20 min
Questions
How do I start a project using Git?
How do I record changes made in a project?
How do I view the history of a project?
How can I correct mistakes I make with Git?
Objectives
Explain how and why Git must be configured
Explain what a repository is
List the commands used to create a Git commit
Describe the difference between the working directory, staging area (index) and repository
Use a commit history to find information about a repository
List the commands that can be used to undo previous commits
Explain potential issues with rewriting the commit history
First Things First
You should have already completed the setup instructions for this workshop and have Git installed. Launch a command line environment (on Windows launch “Git Bash” from the Start menu; on Linux or macOS start a new Terminal). We will use this command line interface throughout these materials. We focus on teaching Git with the command line as we believe this is the most thorough and portable way to communicate the underlying concepts.
You can use the command line to interact with Git but there is still some extra information you must provide before it is ready to use. Enter the following commands, using your relevant personal information as required.
git config --global user.name "FIRST_NAME LAST_NAME"
git config --global user.email "email@example.com"
The information provided here will be included with every snapshot you record with Git. In collaborative projects this is used to distinguish who has made what changes. The --global part of the command sets this information for any projects on which you might work on this computer. Therefore you only need to perform the above commands once for each new computer Git is installed on.
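For contrast, the same settings can also be stored for a single repository by leaving out --global; a repository-level value takes precedence over the global one. The sketch below uses a throwaway repository and placeholder details, so it does not alter the global settings made above.

```shell
cd "$(mktemp -d)"
git init -q

# Without --global the values are written to this repository's .git/config only.
git config user.name "Project Specific Name"
git config user.email "project@example.com"

# Ask Git which value it would use for commits made here.
git config --get user.name

# List everything stored at repository level.
git config --local --list
```

This can be handy if one project needs a different e-mail address from the rest of your work.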
The Command Line Interface
For users not generally familiar with using command line interfaces it’s worth taking a moment to consider the commands that were just run. To understand what we just did let’s break down the first command:
git
- This simply indicates to the command line that we want to do something with Git.
- All commands that we use today will start with this.
config
- Git is a very powerful tool with lots of functionality so next we need to indicate what we want to do with it.
- Putting config indicates we want to change something about how Git is configured.
--global
- Parts that start with dashes are called flags and are used to fine tune the behaviour of the command given.
- The role of the --global flag is explained above.
user.name "FIRST_NAME LAST_NAME"
- Finally we tell Git what we want to configure and the details to use.
Creating a Repository
Warning for Linux and macOS users
Before you move onto this exercise, you should run the following command:
$ git config --global core.autocrlf input
This will stop git recording changes to line endings, which can – depending on which text editor you’re using – result in git erroneously thinking every line in a file has changed.
For a longer explanation of why this may be needed, see GitHub’s comprehensive explanation here.
Now that Git is ready to use let’s see how to start using it with a new project. In Git terminology a project is called a repository (frequently shortened to “repo”).
For this workshop you were provided with a zip file. If you have not already done so, please download it and place it in your home directory. The zip file contains a directory called recipe which in turn contains 2 files - instructions.md and ingredients.md. This is the project we’ll be working with; whilst not based on code this recipe for guacamole is an intuitive example to illustrate the functionality of Git. To extract the archive run the following command:
unzip recipe.zip
Then change the working directory of the terminal to the newly created recipe directory:
cd recipe
You’ll need to repeat cd recipe if you open a new command line interface. Feel free to open ingredients.md and instructions.md and take a look at them (use a normal file browser if you’re not comfortable doing this on the command line). Files with a .md extension use a format called Markdown. Don’t worry about this now; for our immediate purposes these are just text files. Markdown and GitHub will come up in the next session, however.
To start using Git with our recipe we need to create a repository for it. Make sure the current working directory for your terminal is recipe and run:
git init
Initialized empty Git repository in /home/username/recipe/.git/
The path you see in the output will vary depending on your operating system.
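If you are curious about what git init actually did, the following sketch repeats the step in a throwaway directory (so your real recipe directory is left alone) and asks Git a couple of questions about the result.

```shell
cd "$(mktemp -d)"
mkdir recipe
cd recipe
git init -q

# The repository lives in a hidden .git directory inside the project.
ls -a

# Prints "true" when the current directory belongs to a repository.
git rev-parse --is-inside-work-tree

# Prints the location of the repository data relative to where you are.
git rev-parse --git-dir
```

Deleting the .git directory would remove the recorded history but leave your working files untouched, so treat it with care.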
master and main branches
A branch is a specific version of the state and history of the work in the repo. Traditionally, the default branch name whenever you init a repository was master. However, awareness in the online community has improved recently and some tools, like GitHub, now use main as the default name instead. You can read the rationale in this link.
If you are using git version 2.28 or higher (you can find the version you are using with git --version) you can change the default branch name for all new repositories with:
$ git config --global init.defaultBranch main
For existing repositories, or if your git version is lower than 2.28, you can create the master branch normally and then rename it with:
$ git branch -m master main
Depending on your exact version of git, you might get an error like the following when trying to rename the branch:
error: refname refs/heads/master not found
fatal: Branch rename failed
If that is your case, make sure there are no uncommitted files in the repository, and that you have made at least one commit (see below for more information about commits). Ultimately, you can simply create a separate branch called main and use that one as your default branch rather than master, which you can then delete.
We will use main as the default branch name throughout the workshop. Branches will be covered in detail in our intermediate Git course.
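The rename route can be tried safely in a throwaway repository. This sketch forces the traditional name master first, makes one commit (renaming before any commit exists is what triggers the error on some versions), and then renames the branch. The file name and identity are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master    # start from the traditional name for this demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# At least one commit must exist before the rename on older versions.
echo "demo" > file.txt
git stage file.txt
git commit -q -m "First commit"

git branch -m master main

# Print the name of the branch we are now on.
git rev-parse --abbrev-ref HEAD
```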
Creating The First Snapshot
Before we do anything else run the following:
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
ingredients.md
instructions.md
nothing added to commit but untracked files present (use "git add" to track)
This is a very useful command that we will use a lot. It should be your first port of call to figure out the current state of a repository and often suggests commands that can be used for different tasks.
Don’t worry about all the output for now, the important bit is that the two files we already have are untracked in the repository (directory). Git does not track any files automatically so we need to do this explicitly. To do this, we first add the files to Git’s staging area, like so:
git stage ingredients.md
git stage instructions.md
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: ingredients.md
new file: instructions.md
Now this change is staged and ready to be committed (note that we could have saved some typing here with the command git stage ingredients.md instructions.md).
git stage vs git add
Note that you will sometimes see the git stage command written as git add (e.g. in the command output above). These commands are completely equivalent, but in this course we will use git stage throughout for consistency.
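A quick way to convince yourself of the equivalence is to stage one file with each spelling and compare how Git reports them. The sketch uses a throwaway repository and two made-up files, `a.md` and `b.md`.

```shell
cd "$(mktemp -d)"
git init -q
echo "avocado" > a.md
echo "lime" > b.md

git stage a.md    # the spelling used in this course
git add b.md      # the spelling you will often see elsewhere

# Both files are reported the same way: "A" marks a newly staged file.
git status --short
```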
Let us now commit the change to the repository, with a brief but informative description of the change:
git commit -m "adding ingredients and instructions"
[main (root-commit) aa243ea] adding ingredients and instructions
2 files changed, 8 insertions(+)
create mode 100644 ingredients.md
create mode 100644 instructions.md
We have now finished creating the first snapshot in the repository. Named after the command we just used, a snapshot is usually referred to in Git as a commit, or sometimes a changeset. We will use the term “commit” from now on. Straight away query the status to get this useful command into our muscle memory:
git status
On branch main
nothing to commit, working tree clean
The output we get now is very minimal. This highlights an important point about the status command - its purpose is to report on changes in the repository relative to the last commit. In order to see the commits made in a project we can use:
git log
commit b7cd5f6ff57968a7782ff8e74cc9921cc7463c30 (HEAD -> main)
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Mon Dec 30 12:51:04 2019 +0000
adding ingredients and instructions
We’ll talk about this output in more detail later, but for now the main point is to recognise that a commit has been created with your personal information and the message you specified.
Staging and Committing
For our first commit we saw that this is a two step process - first we use git stage then git commit. This is an important pattern used by Git. To understand this in more detail it’s useful to know that Git has three ‘areas’.
- The Working Directory (or Working Tree)
  - This is the copy of the files that you actually work with in a normal way.
- The Staging Area (or index)
  - When you run git stage a copy of a file is taken from the working tree and placed here.
  - New (untracked) files must be added to the staging area before Git will track them.
  - If a tracked file has been changed it must be added to the staging area for that change to be included in a commit.
  - This is known as staging files or adding them to the staging area.
  - Only files in the staging area are included in a commit.
- The Repository
  - When you run git commit a new commit is created in the repository.
  - The contents of the staging area are recorded in the repository as part of the new commit.
The relationship between the commands we’ve seen so far and the different areas of Git is shown below:
Exercise: Create some more commits
Add “1/2 onion” to ingredients.md and also the instruction “enjoy!” to instructions.md. Do not stage the changes yet.
When you are done editing the files, try:
git diff
There’s a lot of information here so take some time to understand the output. If your output doesn’t contain colours you may want to run git diff --color.
First, practice what we have just seen by staging and committing the changes to instructions.md. Remember to include an informative commit message.
Now, run git status and git diff. Then, stage and commit the changes to ingredients.md but, after each step, run git status, git diff and git diff --staged. What is the difference between the two diff commands? How does staging and committing change the status of a file?
Why stage?
The last exercise highlights the reason Git uses a staging area before making commits. You can make file changes as you want all at once and then group them together logically to make individual commits. We’ll see why having only sets of related changes for a specific purpose in a single commit is so useful later on.
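The idea of editing freely and then grouping changes into focused commits can be sketched as follows. This uses a throwaway repository with stand-in file contents rather than the real workshop files, and a placeholder identity.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'avocado\n' > ingredients.md
printf 'mash the avocado\n' > instructions.md
git stage ingredients.md instructions.md
git commit -q -m "Add ingredients and instructions"

# Edit both files in one sitting.
printf 'lime\n' >> ingredients.md
printf 'squeeze the lime\n' >> instructions.md

# Stage only one of them, then compare the two diff commands.
git stage ingredients.md
unstaged=$(git diff --name-only)           # changes not yet staged
staged=$(git diff --staged --name-only)    # changes that will go into the next commit
echo "unstaged: $unstaged"
echo "staged:   $staged"

# Two focused commits instead of one mixed commit.
git commit -q -m "Add lime to ingredients"
git stage instructions.md
git commit -q -m "Add lime step to instructions"
git log --oneline
```

Each commit now describes one purpose, which makes it possible to inspect or undo that change on its own later.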
Git History and Log
We used git log previously to see the first commit we created. Let’s run it again now.
git log
commit b6ff1ca61f08241ec741f6fc58ab2a443a253d89 (HEAD -> main)
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 12:32:04 2019 +0000
Added 1/2 onion to ingredients
commit 2bf7ece2f57594873678f9c17832010730970b28
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 12:28:19 2019 +0000
Added instruction to enjoy
commit ae3255af37e82a98c57f16a057acd1ad5a15ff28
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 12:27:14 2019 +0000
Adding ingredients and instructions
Your output will differ from the above not only in the date and author fields but in the alphanumeric sequence (hash) at the start of each commit.
- We can browse the development and access each state that we have committed.
- The long hashes following the word commit look random but are checksums computed from each commit’s content and metadata; they uniquely label a state of the code.
- Hashes are used when comparing versions and going back in time.
- If the first characters of the hash are unique it is not necessary to type the entire hash.
- Output is in reverse chronological order, i.e. newest commits on top.
- Notice the label HEAD at the top, this indicates the commit that the current working directory is based on.
What is a commit hash?
A commit hash is a string that uniquely identifies a specific commit. It is the long string of numbers and letters that you can see in the output above after the word commit. For example, ae3255af37e82a98c57f16a057acd1ad5a15ff28 for the last entry.
Occasionally, you will need to refer to a specific commit using the hash. Normally, you can use just the first 7 or so characters of the hash (e.g. for the hash above it will be enough to use ae3255a) as it is very unlikely that there will be two commit hashes with identical starting characters.
Throughout this course, we will indicate that you need to use the hash with [commit-hash]. On those occasions, replace the whole string (including the square brackets!) with the hash id. For example, if you need to use git show (see example below) with the above commit hash, you will run:
git show ae3255a
Exercise: Recalling the changes for a commit
The command git log shows us the metadata for a commit but to see the file changes recorded in a commit you can use git show:
git show [commit-hash]
Use one of the commit hashes from your Git history. To see the contents of a particular file from when the commit was made, try:
git show [commit-hash]:ingredients.md
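The two forms of git show can be compared in a throwaway repository. In this sketch the file contents are stand-ins for the real recipe, and the shell variable `first` simply holds the hash of the first commit so that it can be reused.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'avocado\n' > ingredients.md
git stage ingredients.md
git commit -q -m "Add ingredients"
first=$(git rev-parse --short HEAD)        # remember the first commit

printf 'onion\n' >> ingredients.md
git stage ingredients.md
git commit -q -m "Add onion"

# Message plus a summary of which files the first commit changed.
git show --stat "$first"

# The file exactly as it was at the first commit (no onion yet).
git show "$first:ingredients.md"
```

The file in your working directory is not modified by either command; git show only prints.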
To Err is Human, To Revert Divine
Rewriting History
A very common and frustrating occurrence when using Git is making a commit and then realising you forgot to stage something, or staged something you shouldn’t have. Fortunately the Git commit history is not set in stone and can be changed.
To undo the most recent commit you can use:
git reset --soft HEAD^
Follow this up with:
git log
commit 2bf7ece2f57594873678f9c17832010730970b28 (HEAD -> main)
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 12:28:19 2019 +0000
Added instruction to enjoy
commit ae3255af37e82a98c57f16a057acd1ad5a15ff28
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 12:27:14 2019 +0000
Adding ingredients and instructions
Notice we’ve gone from three commits to two. Let’s also run:
git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: ingredients.md
This shows that the content which was part of the commit has been moved back into the staging area.
From here we can choose what to do. We could stage some additional changes and create a new commit, or we could unstage ingredients.md and do something else entirely. For now let’s just restore the commit we removed by committing again:
git commit -m "Added 1/2 onion to ingredients"
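The whole round trip can be rehearsed in a throwaway repository before trying it on real work. This is a minimal sketch with stand-in file contents and a placeholder identity; the variables only capture intermediate state so that it can be printed.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'avocado\n' > ingredients.md
git stage ingredients.md
git commit -q -m "Adding ingredients"

printf '1/2 onion\n' >> ingredients.md
git stage ingredients.md
git commit -q -m "Added 1/2 onion to ingredients"

# Undo the most recent commit but keep its changes staged.
git reset --soft HEAD^
commits_after_reset=$(git rev-list --count HEAD)
staged_after_reset=$(git diff --staged --name-only)
echo "commits left: $commits_after_reset"
echo "staged again: $staged_after_reset"

# Restore the commit by committing again.
git commit -q -m "Added 1/2 onion to ingredients"
git log --oneline
```

Note that the recreated commit gets a new hash even though its content and message are the same as before.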
Changing History Can Have Unexpected Consequences
Using git reset to remove a commit is only a good idea if you have not shared it yet with other people. If you make a commit and share it on GitHub or with a colleague by other means then removing that commit from your Git history will cause inconsistencies that may be difficult to resolve later. We only recommend this approach for commits that are only in your local working copy of a repository.
Reversing a Commit
Sometimes after making a commit we later (sometimes multiple commits later) realise that it was misguided and should not have been included. For instance, it’s a bit of a cliché to tell people to “enjoy” at the end of a recipe, so let’s get rid of it with:
git revert --no-edit [commit-hash]
[main a70e1c5] Revert "Added instruction to enjoy"
Date: Tue Dec 31 12:37:47 2019 +0000
1 file changed, 1 deletion(-)
Check the contents of instructions.md and you should see that the enjoy instruction is gone. To fully understand what revert is doing check out the repository history:
git log
commit ddef60e05eae3cc73ea5be3f98df6ae372e43750 (HEAD -> main)
Author: Christopher Cave-Ayland <c.cave-ayland@imperial.ac.uk>
Date: Tue Dec 31 14:55:52 2019 +0000
Revert "Added instruction to enjoy"
This reverts commit 2bf7ece2f57594873678f9c17832010730970b28.
...
Using git revert has added a new commit which reverses the changes made in the specified commit.
This is a good example of why making separate commits for each change is a good idea. If we had committed the changes to both ingredients.md and instructions.md at once we would not have been able to revert just the enjoy instruction.
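To see that a revert can target an older commit while leaving later work alone, here is a sketch in a throwaway repository. The file contents are stand-ins, the identity is a placeholder, and the variable `enjoy` just stores the hash of the commit we later regret.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'mash the avocado\n' > instructions.md
git stage instructions.md
git commit -q -m "Add instructions"

printf 'enjoy!\n' >> instructions.md
git stage instructions.md
git commit -q -m "Added instruction to enjoy"
enjoy=$(git rev-parse HEAD)                # the commit we will revert

printf 'avocado\n' > ingredients.md
git stage ingredients.md
git commit -q -m "Add ingredients"         # later, unrelated work

# Reverse only the "enjoy" commit; history grows by one commit.
git revert --no-edit "$enjoy"
git log --oneline
cat instructions.md
```

Because nothing is removed from the history, this approach is safe to use on commits that have already been shared.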
The Ultimate Guide to Undoing in Git
It can be quite easy to get into a messy state in Git and it can be difficult to get help via a search engine that covers your exact situation. If you need help we recommend consulting “On undoing, fixing, or removing commits in git”. This page contains a very comprehensive and readable guide to getting out of a sticky situation with Git.
Key Points
Set up Git with your details using git config --global user.name "FIRST_NAME LAST_NAME" and git config --global user.email "email@example.com"
A Git repository is the record of the history of a project and can be created with git init
Git records changes to files as commits
Git must be explicitly told which changes to include as part of a commit (known as staging changes) with git stage [file]...
Staged changes can be stored in a commit with git commit -m "commit message"
You can check which files have been changed and/or staged with git status
You can see the full changes made to files with git diff for unstaged changes and git diff --staged for staged changes
The commit history of a repository can be checked with git log
The command git revert commit_ref creates a new commit which undoes the changes of the specified commit
The command git reset --soft HEAD^ removes the most recent commit from the history
Break
Let’s take a 10 minute break. Get up and stretch your legs, grab a coffee, perhaps revise the key points from what we’ve covered so far, or ask any questions you might have.
Sharing your code
Overview
Teaching: 10 min
Exercises: 10 min
Questions
How can I share my code with others?
What should I take into account when sharing my code?
Objectives
Differentiate the use case for public and private repositories.
Describe the key information that should be present in a software repository.
Configure git with your GitHub credentials.
Set up a repository in GitHub.
Collaborating: what you need to know?
Often, you will need to share your code with others, either with just another person in the same office or anyone, anywhere in the world. Repository hosting services let you do precisely that, keeping all the advantages of VCS and adding on top tools to ease the collaborative development of the code:
- Managing different people working on different features.
- Keeping track of the changes introduced in the code and by whom.
- Opening, reviewing, discussing and merging “pull requests”.
- Opening “issues” to report bugs, request features or discuss different aspects of the code.
GitHub: Why GitHub?
There are several widely used repository hosting services using Git, such as GitLab or Bitbucket. In this course we will use GitHub because:
- It is very easy to use and set up.
- It is, arguably, the most used hosting service of them all.
- Imperial has a GitHub Organisation any Imperial staff or student can join. You don’t need to join for this course but instructions to do so can be found here.
Set up your GitHub account
Set up your free GitHub account:
- Go to www.github.com/join.
- Enter your details and click “Create an account”. You can use your Imperial e-mail address, but this is not mandatory.
- Choose the Free plan.
- Check your e-mail and click “Verify email address”.
- You can fill out the questionnaire or click “skip this step”.
Create a Personal Access Token (PAT):
- In order to access GitHub from the command line, you will need a PAT.
- Follow the instructions here to generate one. Ensure that “repo” and “workflow” are ticked.
- Keep the PAT safe - once you navigate away from the page you won’t be able to view it again. If you lose it, you can always regenerate it.
If you’re familiar with SSH keys, you can follow these instructions to use that alternative authentication method.
Exercise: Log in to GitHub
Check that you can log in to GitHub using your Personal Access Token, or your SSH key if you opted for that method.
If you run into any issues, ask a demonstrator for assistance so that we can all move forward together.
Private vs public repositories
Depending on who you want to give access to your repository, there are two broad types of repositories: private and public. This choice, the first one you will need to make, is not written in stone and you can set a repo private initially and make it public later on.
When you are part of an organization account, there are more options to control the visibility of a repository.
Private repositories
- Only you and the GitHub users you choose can have access to the repository.
- The repository is not listed in the GitHub directory, nor is it discoverable by Google and other search engines.
- Ideal for testing, for projects with a view on commercialization, preliminary work on future open projects or for school/MSc/PhD projects not meant to be public.
- Free accounts (except if part of an Organisation account) have several limitations on the features that a private repository has (eg. fewer collaborative features, no GitHub pages, etc.).
Public repositories
- Anyone can see the repository, clone it and fork it (how it is then used depends on the license; see below).
- You keep control of who will be able to contribute to the repository.
- The choice for open source projects and for sharing your work with a wider potential user base.
Open Source Projects
There is a growing recognition that reproducibility and open source practices in scientific software development are closely interrelated. It is increasingly expected that publications are accompanied by the analysis code and raw data used to create them. As a budding researcher one of the best ways to improve the impact of your work is to make it as easy to reproduce as possible.
Things to include in your project
There are a few files that you should always include in the root directory of your repository:
README.md
- Written in Markdown, it is the front page of your repo.
- Should describe in lay terms (or not) the purpose of the software, intended audience, etc.
- Should include simplified installation instructions or a link to more detailed instructions described elsewhere.
- Often includes badges, providing quick information on the status of the documentation, the builds, the software version, license, etc.
- For inspiration see Solcore
- For further guidance see Make a README or this template.
Licence
- Important in any repository, essential in a public one.
- Describes how people are allowed to use (and reuse) the information in your repository.
- Do use a standard licence file to avoid headaches and legal issues later on.
- If your repository is part of an Organisation, make sure this organisation allows that licence. Ultimately, they will be the ones having to fight any legal battles!
- Licence choice is also something you should consider discussing with your supervisor if relevant, as they may have strong views.
See GitHub Help: Adding a license to a repository
Licence for Imperial College London software
Imperial College’s preferred licence is the permissive BSD 2- or 3-clause. You can check the details at Imperial website: Open Source Software Licences. The guidance on this site is primarily intended for members of staff, however, it is correct for graduate students with the exception that (subject to certain conditions) you are entitled to hold the copyright. This site also tells you who you should contact in case you want a different licence model for your work e.g. commercial.
Open Source software licences
There is a huge range of different licences, ranging from fully permissive to very restrictive. A couple of websites with more information on the topic (including how to licence things that are not software) are:
Installation process/instructions
- If short, they can be part of the README file above.
- Otherwise, they should have their own INSTALLATION.md file and, definitely, be included in any documentation you write for the software.
- Should be complete and specific for any operating system and platform you want to support.
- If you know your software will not work on, let’s say, Windows, say so!
CITATION.cff
- Indicates how your software should be cited by anyone using it.
- A recently developed standard.
- cff = Citation File Format.
- Create and check via the provided tools.
- Supported by GitHub, Zenodo and Zotero.
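As a rough illustration, a minimal CITATION.cff can be written from the command line as shown below. The keys follow version 1.2.0 of the Citation File Format as we understand it; every value (title, names, version) is a placeholder to be replaced with your project’s details, and the result is worth checking with the format’s own validation tools.

```shell
cd "$(mktemp -d)"

# All values are placeholders; only the key names come from the format.
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example Project"
authors:
  - family-names: "Surname"
    given-names: "Given"
version: "0.1.0"
EOF

cat CITATION.cff
```

The file belongs in the root directory of the repository, alongside the README and licence.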
Digital Object Identifiers (DOI)
If you are serious about what you are doing and want people to really cite your work properly - and get recognition for it - consider providing your repo with a digital object identifier (DOI). You can get one from:
CONTRIBUTING.md
- Guidelines explaining how people should contribute to your project.
- Could include steps for creating good issues or pull requests.
- Often also links to external documentation, mailing lists, or a code of conduct with community and behavioural expectations.
See GitHub Help: Setting guidelines for repository contributors.
Creating a repository
Now you have all the information you need to create a new repository in GitHub. Just follow these steps:
- Once logged in to GitHub, press the + symbol in the top right, and choose new repository from the dropdown menu.
- Give your repository the name example.
- Add a short description for your project.
- You are going to create a Public repository, so select that option.
- Click on Initialize this repository with a README. This will create an empty README file in the root directory that you can edit later on.
- Select a licence for your repository. Which one is up to you, but make sure you have read what they entail beforehand (Tip: there is a little “i” next to the dropdown list with some help on this). In case of doubt, choose BSD 3-clause.
Solution
Your repository is now ready and you should see something similar to this: It tells you there is only 1 commit, 1 branch and 1 contributor, the type of licence you have chosen and also that there are two files: LICENSE and README.md, which is also rendered immediately below.
To make this complete, let’s add some contributing guidelines:
- Go to Insights in the upper right corner of the repository.
- And then click on Community Standards on the left hand side.
- The screen now shows how the project compares with the recommended community standards. It is not bad, but could be better.
- Click on Add in the Contributing line. In the new screen you can write your contributing guidelines. Tip: No one writes this from scratch.
Have a look at some Examples of contributing guidelines and copy/paste those parts relevant for your project.
- Once you are done, click on Commit new file and the changes will be confirmed. Now you should see a CONTRIBUTING.md file in the root directory.
Key Points
Public repositories are open to anyone to use and contribute.
Private repositories are just for yourself or a reduced set of contributors.
README contains a description of the software and, often, some simplified installation instructions.
The LICENSE describes how the software must be distributed and used.
Using one of the OSI (open source initiative) licenses is recommended if the repository is public.
CONTRIBUTING describes how other users can help develop the software.
CITATION helps others to cite your software in their own papers.
GitHub can be used to set up a software repository, share your code and manage who can access it, and how.
Remote repositories
Overview
Teaching: 15 min
Exercises: 20 min
Questions
Is my local repository the same as the remote one?
How can I send my local changes to the remote one?
How can I get the changes others have made?
Objectives
Explain the differences between a local and a remote repository.
Explain what tracking and upstream mean.
Use the push command to send changes on the local branch to the remote one.
Use the pull command to update your local branch with the remote one.
Remote and local repositories
- The `example` repository you just created is a remote repository: it is hosted by a third-party hosting system, GitHub in this case.
- The `recipe` repository you created earlier was a local repository: it was just a directory on your hard drive using git for version control.
- Local and remote repositories can be synchronized, so changes are accessible by other contributors.
- This synchronisation is not automatic: it has to be done explicitly for each branch you want to keep up to date (see pull and push below).
- The default name for a remote repository synchronised with a local one is `origin`.
Tracking and upstream
- A local branch synchronised with a remote one is said to be tracking that remote branch.
- The remote branch being tracked is called the upstream branch.
- As mentioned, we will be working in this course with the main branch only, but the concepts are applicable to all branches.
Configuring repositories
Depending on whether you are starting from a remote repository and want to get a local one out of it or the other way around, the steps are different.
Configuring a remote repository from a local one
In this case, you have a local repository and you want to synchronise it with a new, remote one. Let’s create a remote for the `recipe` repository you worked on earlier.
- Create a new repository in GitHub, as in the last episode. Give it a name, description and choose if it should be public or private, but do not add any other file (no README or licence).
- You will be offered a few options to populate the remote repository. We are interested in the third one. You will need to make sure HTTPS is selected (not SSH) for your personal access token to work.
- Launch a new command line interface and run `cd recipe` to navigate to the directory where you have your local repository, then execute:
git remote add origin ADDRESS_OF_YOUR_REMOTE_REPO
git push --set-upstream origin main
For example, for this course’s repository, the address would be:
https://github.com/ImperialCollegeLondon/introductory_grad_school_git_course.git
In general, it is something like:
https://github.com/USERNAME/REPO_NAME.git
macOS and Linux users: You will be asked to provide your GitHub username and password. Enter your personal access token (PAT) as your password.
Windows users: You will be presented with a `CredentialHelperSelector` dialog box. Ensure “manager-core” is selected and check the box for “Always use this from now on”. Press Select. From the next dialog select “Token” then paste the PAT you saved earlier and press “Sign in”. On subsequent interactions with GitHub your credentials will be remembered and you will not be prompted.
- The first line above will set the GitHub repository as the remote for your local one, calling it `origin`.
- The second line will push your main branch to the `main` branch on `origin` (referred to locally as `origin/main`), setting it as its upstream branch.
- You can check if things went well by going to GitHub: the repository there should contain all the files of your local repository.
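If you would like to try these two commands without touching GitHub, the sketch below uses a bare repository in a temporary directory as a stand-in for the remote. This is an assumption made purely for illustration: the directory names, file contents and user identity are made up, and no credentials are involved because nothing leaves your machine.

```shell
# Hypothetical stand-in for GitHub: a bare repository in a temporary directory.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# A small local repository with a single commit on the main branch.
git init -q "$work/recipe"
cd "$work/recipe"
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Guacamole" > README.md
git stage README.md
git commit -q -m "Add README"
git branch -M main    # make sure the branch is called main

# The two commands from the steps above, pointed at the stand-in remote.
git remote add origin "$work/remote.git"
git push -q --set-upstream origin main

# Inspect the result: the remote's name and address, and the upstream of main.
git remote -v
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')
echo "main tracks: $upstream"
```

With a real GitHub repository the only difference is the address passed to `git remote add`, plus the authentication step described above.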
Configuring a local repository from a remote
This involves the git command clone. Let’s create a local copy of the `example` repository you created remotely in the last episode.
- On GitHub, press the down arrow in the far top right, next to your picture, and choose “Your repositories” from the drop-down menu.
- Choose the `example` repository from the list.
- In the main screen of your repository, click on the green button on the right, Code, and copy the address that appears in the Clone section. This should look similar to https://github.com/YOUR_USERNAME/example.git.
- Open a new command line interface and execute the commands:
cd $HOME
git clone ADDRESS_OF_YOUR_REMOTE_REPO
cd example
- This will download the remote repository to a new `example` directory, in full, with all the information on the branches available in `origin`, if any, and all the git history.
- By default, a local `main` branch will be created tracking the `origin/main` branch.
- You can find some information on the repository using the commands already discussed, like `git log`.
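The effect of cloning can also be explored offline. The sketch below is only an illustration under stated assumptions: a bare repository in a temporary directory plays the role of the remote `example` repository, and the seed repository, file contents and user identity are hypothetical.

```shell
# Hypothetical stand-in for the remote: a bare repository holding one commit.
work=$(mktemp -d)
git init -q --bare "$work/example.git"
git -C "$work/example.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example User"
git config user.email "user@example.com"
echo "# example" > README.md
git stage README.md
git commit -q -m "Initial commit"
git branch -M main
git push -q "$work/example.git" main

# Clone the stand-in remote into a new directory called example.
cd "$work"
git clone -q "$work/example.git" example
cd example

# The clone has a remote called origin, a local main branch and the full history.
git remote -v
git branch -vv
git log --oneline
tracking=$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')
echo "main tracks: $tracking"
```

`git branch -vv` lists each local branch together with the upstream branch it tracks, which is a quick way to confirm that `main` follows `origin/main`.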
Pushing
- Its basic use is to synchronize any committed changes in your current branch to its upstream branch: `git push`.
- Changes in the staging area will not be synchronized.
- If the upstream branch has changes that you do not have in the local branch, the command will fail, requesting you to `pull` those changes first (see below).
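To see that only committed changes are pushed, here is a small sketch that can be run offline. It assumes a bare repository in a temporary directory as the remote; the file names and user identity are hypothetical.

```shell
# Hypothetical stand-in remote and a local repository that tracks it.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > committed.txt
git stage committed.txt
git commit -q -m "Add committed.txt"
git branch -M main
git remote add origin "$work/remote.git"
git push -q --set-upstream origin main

# One change is committed, another is only staged.
echo "second" >> committed.txt
git commit -q -am "Update committed.txt"
echo "draft" > staged.txt
git stage staged.txt

# Push: only the committed change travels to the remote.
git push -q

# List the files in the remote's main branch: staged.txt is not among them.
remote_files=$(git -C "$work/remote.git" ls-tree --name-only main)
echo "$remote_files"

# Locally, staged.txt is still waiting to be committed.
git status --short
```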
Pushing an updated README
You want to update the README file of the `example` repository with more detailed information of what the repository is about and then push the changes to the remote.
Modify the README file of your local copy of `example` with your preferred editor (any change is good enough, but better if they are useful - or at least, funny!) and synchronise the changes with the remote. Check on GitHub that you can view the changes you made.
Solution
git stage README.md
git commit -m COMMIT_MESSAGE
git push
Pulling
- Opposite to `git push`, `git pull` brings changes in the upstream branch to the local branch.
- You can check if there are any changes to synchronize in the upstream branch by running `git fetch`, which downloads any new commits from the remote without modifying your local branch, and then `git status` to see how your local and remote branches compare in terms of commit history.
- It’s best to make sure your repository is in a clean state with no staged or unstaged changes - so, all changes committed.
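The fetch, status, pull sequence can be rehearsed offline with the sketch below. It rests on a few assumptions made only for illustration: a bare repository in a temporary directory stands in for GitHub, a second repository (`seed`) plays the part of another contributor, and the names and file contents are made up. The `--ff-only` flag is an optional extra that makes `git pull` refuse to do anything other than a simple fast-forward.

```shell
# Hypothetical stand-in remote, seeded with one commit by "someone else".
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example User"
git config user.email "user@example.com"
echo "# example" > README.md
git stage README.md
git commit -q -m "Initial commit"
git branch -M main
git remote add origin "$work/remote.git"
git push -q --set-upstream origin main

# Your local copy of the remote.
git clone -q "$work/remote.git" "$work/mine"

# A new commit reaches the remote after you cloned it.
echo "More detail" >> README.md
git commit -q -am "Expand README"
git push -q

# Back in your copy: fetch, compare, then pull.
cd "$work/mine"
git fetch -q
git status -sb    # reports how far main is behind origin/main
behind=$(git rev-list --count HEAD..origin/main)
echo "commits to pull: $behind"
git pull -q --ff-only
after=$(git rev-list --count HEAD..origin/main)
```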
Conflicts
If the local and upstream branches have diverged - for example if you edited something directly on GitHub and also locally - the pull command will attempt to merge both. This merging might be a smooth, transparent process - Git is pretty smart when it comes to merging branches - but might also result in conflicts that need to be resolved before proceeding.
This is part of the scope of the intermediate Git and GitHub course that you can find in https://imperialcollegelondon.github.io/intermediate_grad_school_git_course/
Pulling an updated README
When reviewing your new README file online, you have discovered a typo and decided to correct it directly in GitHub. Modify the README file online and then synchronise the changes with your local repository (tip: you can edit any text file directly in GitHub by clicking on the little pencil button in the upper right corner).
Solution
git fetch
git status
This will indicate that the remote branch is ahead of your local branch by 1 commit. Since there are no diverging commits, it is safe to pull.
git pull
Key Points
origin is the conventional name for the remote repository that a local repository is synchronised with.
Local and remote repositories are not identical, in general.
Local and remote repositories are not synchronized automatically.
push and pull commands only affect the branch currently checked out.
Break
Let’s take another 10 minute break. Get up and stretch your legs, grab a coffee, perhaps revise the key points from what we’ve covered so far, or ask any questions you might have.
Using GitHub Issues
Overview
Teaching: 15 min
Exercises: 5 min
Questions
How can I use GitHub issues to manage a TODO list for my project?
Objectives
Explain what GitHub issues are and how to create them.
Explain how issues can help you keep track of your work in GitHub
How to claim and assign issues.
Learn how to use GitHub functionality to effectively communicate who is working on what.
GitHub issues
- GitHub issues are a feature of GitHub. They let you keep track of your work on a GitHub repository and are used to report bugs, request features or enhancements, or to discuss implementation details of some parts of the code.
- Issues are a kind of TODO list, with pending and completed tasks, as well as serving to prioritise the development activity.
- Labels can be added to the issues by the repository administrator to inform at a first glance what the issue is about. Typical labels are “bug”, “enhancement”, “low priority” or “good first issue”, for example.
- Issues also can have one or more people assigned to them who will take care of sorting them out and closing them when complete or if no longer relevant.
- By default, any GitHub user can create an issue in a public repository.
The following figure shows some of the issues open in a certain repository. The labels tell us there are a couple of bug reports, a couple of issues related to the performance of the software and several ones that are simple enough to be tackled by novice people. Most of them have some discussion going on.
Creating GitHub issues
It is possible to create issues in a number of ways (see the GitHub docs page on creating issues to learn about the various methods). One of the most common and simple methods is to create an issue from a repository. This can be done by navigating to the main page of the repository on GitHub and clicking on Issues, then clicking on the New Issue button.
Create two new issues
Follow the information above to create two new issues in your example repository. It doesn’t matter what the issue title or contents are at this stage while we are learning how to create issues, but generally this is important, and you should always refer to a repository’s contributing guidelines for how to construct your issue. For now, some simple example titles such as “Fix bug” and “Create Feature” will suffice. Try adding some labels to your issues and assigning yourself.
Solution
You have created two new issues and your issues page should look something like this: Notice the labels have been added next to the issue title, and the assignees’ profile pictures are visible too.
Mentioning other issues
All issues receive a tag number starting at 1 and preceded by # e.g. #40 or #110. If you want to refer to an issue in any comment anywhere in GitHub, just use its tag number and these will be automatically linked from the comment. It is slightly different if you want to mention an issue from another repository; simply paste the URL of that issue and GitHub will usually format this nicely for you. You can always check out the “Preview” tab at the top of the edit box when writing a comment or issue to preview how the formatting will look.
Keeping track of your work with task lists
GitHub provides functionality with issues which allow you to easily organise and keep track of the work going on in a project and to stay up to date with developments.
A particularly useful way of organising issues is to track them as part of a larger issue by using task lists. Task lists are sets of tasks which GitHub renders one per line, each with a clickable checkbox. Task lists can be added in any comment on GitHub using Markdown, but if a task list is added to the body of an issue, extra functionality is provided which can be a powerful way to track the progress of segments of work in a project. Read more about using task lists in the GitHub Docs.
Create a task list
Try creating a new issue which contains a task list. To create a task list in the body of an issue, preface list items with a hyphen (or an asterisk) and a space, followed by `[ ]`. To mark a task as complete, use `[x]`. Create two items in your task list, one for each of the issues you created earlier. Try referencing the two previously created issues by using a hash followed by the number of the issue, e.g. `#2`. (GitHub may prompt you to select an issue as soon as you type a hash.)
Solution
You should now have a third issue which contains a task list which references the two issues you created earlier.
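For reference, the body of such an issue might look like the text written by the sketch below. It assumes the two earlier issues were numbered #1 and #2, which may not match your repository, and the file name `issue_body.md` is hypothetical: it is only a convenient way to show the Markdown, which you would normally type or paste into the issue body on GitHub.

```shell
# Write an example issue body containing a task list to a local file.
# The issue numbers (#1, #2) are assumptions; use the numbers of your own issues.
cat > issue_body.md <<'EOF'
This issue tracks the two issues created earlier.

- [ ] #1
- [x] #2
EOF

# Show the Markdown that would go into the issue body.
cat issue_body.md
```

Here the first task is still open (`[ ]`) and the second is marked as complete (`[x]`).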
Good-to-know task list functionality
- If a task references another issue and someone closes that issue, the task’s checkbox will automatically be marked as complete.
- If a task requires further tracking or discussion, you can convert the task to an issue by hovering over the task and clicking the circle icon in the upper-right corner of the task.
- Any issues referenced in the task list will specify that they are tracked in the referencing issue.
Claim issues
There are some restrictions on who can be assigned to an issue. If you do not have write access to the repository (which is often the case) and you are not a member of the organisation that owns the repository, the only way of being assigned to an issue is by commenting on it. This also serves to let others know that you are volunteering to work on it. A “Hey, I can tackle this.” is often enough.
Closing issues
Let’s see what happens when you close one of the first two issues you created earlier. Open one of the issues you created earlier. Let’s say you have addressed the content of the issue and are satisfied that it can now be considered dealt with. Find and click the “Close issue” button at the bottom of the issue. Now return to your task list issue and see if you can spot what is different.
Solution
In your task list issue you should find that one of the referenced issues now has a tick next to it, indicating that it is closed.
Communication is key
Opening issues and tagging them appropriately - as a question, bug or feature request, for example - is a much more useful and productive way of contributing to a repository than, say, sending the developer an email. Not only does it make use of the functionality of GitHub and the productivity benefits that entails, but it also means that the issue is more likely to be found by someone in the future who has a similar query.
Templates
As a repository maintainer, there are ways you can encourage and help contributors to make meaningful issues. One way is to make use of templates. Templates provide a means to standardise the way in which information is provided in an issue. When a contributor wishes to create an issue on your repository, they can then select the appropriate template for their issue which provides them with guidance about what information should be present and how it should be presented. Read more about using templates to encourage useful issues in the GitHub Docs.
Mentions
@mention-ing collaborators can be a convenient way to draw their attention to a comment. Anyone you @ must first have access to your repository. Similarly, we saw above how we can use `#` to link related issues in a comment. Combining these simple yet powerful formatting syntaxes can greatly improve communication efficiency - certainly in line with GitHub best practices.
Staying up to date
You can keep track of recent comments in an issue by subscribing to an issue so that you receive notifications about latest comments and developments in that issue. Notifications and links to issues you’re subscribed to can be found on your GitHub dashboard.
Key Points
Issues are a feature of GitHub which let you track work in a repository.
GitHub provides functionality for referencing issues in comments
Task lists can be created to keep track of a list of issues
Formatting syntaxes, templates, and subscribing to issues help with communication
Using GUIs and IDEs
Overview
Teaching: 20 min
Exercises: 5 min
Questions
What other ways can I interact with Git other than at the command line?
What tools are available for using Git?
When is it better to use each method of interacting with Git?
Objectives
Use GitKraken to examine your commit history
Add a commit with GitKraken
Use GitKraken to carry out other tasks, such as reverting and pushing to GitHub
Background
As with any aspect of your development environment, the choice of when to use a GUI frontend to Git – if at all – is a matter of personal taste. Like text editors, there are a plethora of options out there and people often have strong opinions about the superiority of one over the other. In short, there’s no single “right” way to do it.
Accordingly, this section, more so than previous ones, is really just an illustration of one way to use Git with one particular GUI program; the hope is that you will gain an understanding of the kinds of tasks that GUIs can make easier, so that you can make your own decision about whether it’s worth the hassle.
There are many GUIs for Git to choose from; the Git website has a decent list. For the purposes of this tutorial though, we will stick to using GitKraken as it is free, cross-platform and has (by our own subjective measure) one of the prettier and more intuitive interfaces. Once you have completed this section, you may want to return to the list of GUIs and see if there are any that better suit your own taste. Another good choice is GitHub Desktop, GitHub’s own attempt at building a GUI: it is polished and well integrated with GitHub. (There’s no Linux client, however.) In addition, many IDEs and editors nowadays (including Visual Studio Code and Sublime Text) have built-in integration with Git.
Whether you decide to use a GUI or not, however, we suggest it is important that you also understand how to use Git from the command line: firstly, it teaches you about the underlying concepts and, secondly, it is a useful fallback for when your shiny GUI falls over.
Why would I want to use a GUI?
Or: Surely all the cool kids just use Git from the command line?
While programs can convey a lot of information via a simple command-line interface, there are limits to what can be conveyed easily. Notably, in the case of Git, your repository’s history is really a graph (specifically, a directed acyclic graph) and a text interface isn’t a particularly good way of showing a graph.
So far, your `recipe` repository is fairly simple, but it is worth noting that repositories can become complex, e.g.:
You wouldn’t want to try to visualise this in your terminal, any more than you would a Tube map.
(If you’re wondering how a repository’s history can end up looking so complicated, it’s because of a feature we aren’t covering in this course in detail called branching. For those who are interested, it is covered in the intermediate-level Git course.)
Getting started with GitKraken
Firstly, you will need to download and install GitKraken, which can be obtained here.
Next you need to create a GitKraken account, the main purpose of which is to link the GitKraken app with your GitHub account, so that you can make use of its GitHub integration. While this is technically optional, we strongly recommend it. So, when you start GitKraken, click “Sign up with GitHub”:
This should open a browser window in which you will have to sign in with GitHub. Once this is complete GitKraken should log in automatically. If not, you may have to copy the OAuth token manually into GitKraken (like I did).
Next, GitKraken will ask you to supply information about who you are, as you did when you first started with Git. Enter your name and email address, then click “Create Profile”. Here’s what it looks like for me:
You’re now ready to roll!
Examining your existing repository
Let’s start by examining the `recipe` repository you’ve been working on. Open GitKraken and click “File”, then “Open Repo”.
You should see something like the following:
(If your window doesn’t show all of these features, you may need to click on tabs etc. to make them visible.)
For the most part, based on your understanding of Git, you should be able to understand most of the features in this window, namely:
- In the middle of the screen you have your commit history
- On the left-hand side you can see your local `main` branch and the remote one too (note the GitHub logo to indicate that it’s a remote)
- On the left-hand side, you also have a button to display your GitHub issues (I currently don’t have any for this repository)
- On the right-hand side you have details about the current commit (the description, author, which files were changed etc.)
(Don’t worry about the other buttons on the left-hand side for now. They aren’t covered in this course, but are in the intermediate-level Git course.)
Now click on “instructions.md” in GitKraken (under where it says “1 modified”). You should now be able to see what changes were made to the file in the last commit:
Green lines indicate new text and red ones indicate removed text. In this case, as just the “enjoy” instruction at the end of the file was removed, Git has interpreted this as the line being removed and replaced with an empty one.
Click the cross in the central pane to close this view and return to a view of the repository’s history.
Exercise: Examine some more of your repository’s history
Try examining some more of the history to familiarise yourself with the interface. Click on a commit to view details about it. Then click on one of the modified files to remind yourself what changes you made.
Modifying your repository
In this next section, you will make some more changes to your repository.
Return to your text editor/IDE and make the following changes:
- `ingredients.md`: Add “1 clove of garlic, chopped” to the list
- `instructions.md`: Change the instruction “and mix well” to “and mix thoroughly”
Now return to GitKraken and you should see that there is a yellow pencil icon with a “2” next to it:
This indicates that two files have been modified since the last commit and that they are not staged.
Click on this line with the yellow pencil and GitKraken will show you which files were changed:
If you click on each of the files you can see how they were changed.
Note that there is a button to stage each file individually and one to stage them both at once. If you accidentally stage a file, there is a corresponding “unstage” button for individual files.
As we have made two changes that are essentially unrelated, let’s make two separate commits. Stage one of the files, add a commit message and click “Commit changes”:
Exercise: Add the other commit by yourself
Stage the file, enter an appropriate message and commit it.
GitKraken allows you to add a one-line summary as well as a longer description, so add one here. How does it appear when you view the commit with `git log`?
Now upload your changes to GitHub by pressing the “Push” button.
Going further
Exercise: Explore GitKraken yourself
This final exercise is a chance for you to play around and familiarise yourself with the interface. You could try repeating some of the tasks you completed previously using the command line (such as reverting a commit). There is a table below showing how you carry out these operations using GitKraken.
You could also try creating a new repository on GitHub using GitKraken.
There is also a useful series of tutorials in the GitKraken user guide.
Git command | GitKraken equivalent |
---|---|
git stage | Click on pencil icon by changes, then “Stage” or “Stage all changes” |
git commit | After staging, enter a commit message then click “Commit changes” |
git push | Click the “Push” button in the toolbar |
git revert | Right-click on commit and click “Revert commit” |
git init | Click “File” then “Init Repo” (choose GitHub.com to create remote) |
git status | Look at the top of the commit list and you can see if there are uncommitted files etc. |
Key Points
Knowing how to use Git from the command-line is useful for understanding concepts and as a fallback
Text editors and IDEs often have built-in Git support
GUIs are particularly useful for viewing the graph-like structure of a repository
It is worth taking time to explore other tools for Git so you can find a workflow that suits you