Welcome back to our tour of the Git object model. In this video, we'll go beyond the base objects and look at more of the structure with tags, branches, and remotes, as well as reviewing how the various Git commands act on this collection of objects.
Note: If you haven't watched the first part of this review of the Git object model, we highly recommend going back and doing that now, as this video builds directly on that foundation.
Before adding more to our growing picture of the Git object model, let's quickly review the base objects we covered in the first video: blobs, trees, and commits.
Returning to our peek into the .git directory, we can first review the layout:
$ tree .git -L 1
.git
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
└── refs/
In the previous video, we focused primarily on the objects directory, which acted as a database of the blob, tree, and commit objects we created as we worked with our repo.
In this video, we'll instead focus on the refs directory. Peeking inside, we'll see:
$ tree .git/refs
.git/refs
├── heads/
│   └── master
└── tags/
The first directory we'll encounter within the refs directory is heads. These are our local branches. The directory is called heads because our local branches are the collection of things that HEAD can point at. HEAD is the ultimate ref, defining what we currently have checked out.
Currently, our heads directory contains only a single file, master. We call master a "file", rather than some more complex Git object, because that is what it is. We can test this by printing it with cat:
$ cat .git/refs/heads/master
f95b2fe3b64c6351e7eec4011921b4469098b9ba
Here we can see that the file contains a string which looks very much like a Git object hash. We can then turn around and ask Git about the object:
$ git cat-file -t f95b2fe3b64c6351e7eec4011921b4469098b9ba
commit
$ git cat-file -p f95b2fe3b64c6351e7eec4011921b4469098b9ba
tree 0cae7dc167b255c0123c7c396fc48ce40fc35cfa
parent ef34a153025fffb8a498fff540f7c93963937291
author Chris Toomey <chris@ctoomey.com> 1441311544 -0400
committer Chris Toomey <chris@ctoomey.com> 1441311544 -0400
Another file in app dir
Now we have a full picture of what exactly our master branch is: a file, stored in .git/refs/heads. Its contents are the hash of a single commit. We know that commits contain a pointer to a tree (the snapshot of our working directory), as well as to their parent commits, and now we can add branches to the list of pointers in our view of the Git world.
Branches are just pointers; nothing more!
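To check the claim that a branch is nothing but a file holding a commit hash, a minimal sketch along these lines can be run in a throwaway repository. The temporary directory, the README file, and the "Example User" identity are placeholders introduced for illustration. The sketch also assumes refs are stored as loose files; Git can pack them into .git/packed-refs, in which case the cat would fail while git rev-parse still works.

```shell
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in these notes
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "First commit"

branch_file=$(cat .git/refs/heads/master)      # raw contents of the branch file
resolved=$(git rev-parse master)               # what Git resolves the branch name to
object_type=$(git cat-file -t "$branch_file")  # type of the object the hash names

echo "file:     $branch_file"
echo "resolved: $resolved"
echo "type:     $object_type"
```

The first two lines printed should show the same hash, and the type should be commit.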
Now that we have an understanding of branches, we can shift our focus to tags. We'll create a tag by running git tag v0.1, and then we can take another look at our .git/refs directory to see what we have:
$ tree .git/refs
.git/refs
├── heads/
│   └── master
└── tags/
    └── v0.1
Now we have a new file, v0.1, stored in the tags directory. Similar to the master head file, we can cat out the tag file directly to see what it contains:
$ cat .git/refs/tags/v0.1
f95b2fe3b64c6351e7eec4011921b4469098b9ba
Just like our master file, the v0.1 file contains nothing more than the hash of a commit. Unlike branches, tags can grow a bit more complex by adding things like annotations, PGP signatures, and other metadata. In that case, the tag is stored as a tag object in the .git/objects directory, and the tag file simply contains the hash of that tag object (which in turn contains the hash of the commit that was tagged). This just adds one additional step, so we can still think of tags as simple pointers to commits.
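The extra hop for annotated tags can be seen with a small sketch like the following. The tag name v0.2, its message, and the throwaway repository are assumptions made for illustration, and the sketch again assumes loose ref files.

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

git tag v0.1                              # lightweight: ref file holds the commit hash
git tag -a v0.2 -m "Annotated release"    # annotated: ref file holds a tag object hash

light_type=$(git cat-file -t "$(cat .git/refs/tags/v0.1)")
annotated_type=$(git cat-file -t "$(cat .git/refs/tags/v0.2)")
annotated_target=$(git rev-parse 'v0.2^{commit}')   # peel the tag object to its commit
head_commit=$(git rev-parse HEAD)

echo "v0.1 points at a: $light_type"
echo "v0.2 points at a: $annotated_type"
echo "v0.2 peels to:    $annotated_target"
```

The lightweight tag resolves straight to a commit, while the annotated tag resolves to a tag object that in turn names the same commit.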
While branches and tags are very similar in that they both simply contain a reference to a commit, they differ in how they behave over time: branches move as we work, while a tag is meant to stay fixed on the commit it names.
Tags exist to lock down and name ("tag", if you will) a specific version of the code. Branches exist to track the changes in our codebase over time, and will therefore update whenever we commit or merge.
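A quick way to watch this difference is to tag a commit, make another commit, and compare what each ref points at before and after. This is only a sketch in a throwaway repository; the file name and commit messages are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
git tag v0.1

tag_before=$(git rev-parse v0.1)
branch_before=$(git rev-parse master)

echo "two" > file.txt
git commit -q -am "Second commit"         # moves master, leaves v0.1 alone

tag_after=$(git rev-parse v0.1)
branch_after=$(git rev-parse master)

echo "tag:    $tag_before -> $tag_after"
echo "branch: $branch_before -> $branch_after"
```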
For the small local sample repo we've been working with so far there are no remotes, but we can hop over to the local checkout of the Upcase repo to see an example that contains remotes:
$ tree .git/refs
.git/refs
├── heads/
│   ├── deck-last-attempt
│   ├── master
│   ├── ... (truncated)
│   └── welcome-trail
├── remotes/
│   ├── origin
│   │   ├── HEAD
│   │   ├── cjt-north-star-metric
│   │   ├── master
│   │   ├── mg-button-colors
│   │   └── ... (truncated)
│   ├── production
│   │   └── master
│   └── staging
│       ├── dashboard-staging
│       ├── ... (truncated)
│       └── master
└── tags/
    └── v0.1
With the more real-world example of the Upcase Git repo, we can see that there is now a third subdirectory, remotes, alongside heads and tags in the .git/refs directory.
Within this remotes directory, there is a directory for each of our remotes, namely origin, staging, and production. This adds a bit more structure, but otherwise these refs are the same as our branches. We can confirm this by investigating the contents of one of these remote branch files:
$ cat .git/refs/remotes/origin/cjt-north-star-endpoint
3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5
$ git cat-file -t 3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5
commit
$ git cat-file -p 3891a7bc21e5e0c69e71e8153bb8b4a67b80bff5
tree 32022b6465ebf9f9e37b7e1caccb3c9e620dd465
parent 7262141ae317f56b567ed2f95505e6ca9bbe1605
author Chris Toomey <chris@ctoomey.com> 1433384047 -0400
committer Chris Toomey <chris@ctoomey.com> 1435239388 -0400
WIP analytics JSON endpoint
Again, we see more of the same. Remote branches are simply pointers to a commit. It's pointers all the way down, friends!
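The same check can be reproduced without access to the Upcase repo. The sketch below builds two throwaway repositories, one acting as the remote, and asks Git what the remote-tracking ref points at. The directory names upstream and local are assumptions for the example. It uses git rev-parse rather than cat because remote refs are often stored in .git/packed-refs instead of as individual files.

```shell
base=$(mktemp -d)

# A repository that will play the role of the remote.
git init -q "$base/upstream"
cd "$base/upstream" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

# A second repository that fetches from it.
git init -q "$base/local"
cd "$base/local" || exit 1
git remote add origin "$base/upstream"
git fetch -q origin

remote_ref=$(git rev-parse refs/remotes/origin/master)
upstream_commit=$(git -C "$base/upstream" rev-parse master)
remote_type=$(git cat-file -t "$remote_ref")

git for-each-ref refs/remotes             # lists each remote ref with its hash and type
```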
HEAD is the final object we need to be aware of to understand Git. HEAD, unlike the other objects we've discussed, is a singleton, meaning that there is only ever one HEAD.
HEAD identifies the currently checked out object. Typically, this is a branch (with that branch pointing to a commit), but it is possible to check out a commit directly, in which case HEAD would be pointing at that commit.
HEAD is a file just like our branch objects. It lives at the root of the .git/ directory and its contents are similarly simple:
$ cat .git/HEAD
ref: refs/heads/master
This is the normal mode for Git, where HEAD points to a branch, in this case the master branch. If we were to check out a commit directly, then HEAD would simply point at that commit:
$ git checkout 833c1ea
$ cat .git/HEAD
833c1ea55d76adcf48b5f7e933271fcc3e36f123
So once again we find ourselves with a pointer. HEAD points to a branch, that branch points to a commit, and that commit points to a working tree and parent commit. Pointers. All. The. Way. Down.
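Both states of HEAD can be observed side by side with a sketch like this one, run in a throwaway repository. It uses git checkout --detach to check out the current commit directly, so no specific hash has to be known in advance.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

attached=$(cat .git/HEAD)                 # "ref: refs/heads/master"
commit=$(git rev-parse HEAD)

git checkout -q --detach HEAD             # check out the commit itself, not the branch
detached=$(cat .git/HEAD)                 # now a bare commit hash

echo "on a branch: $attached"
echo "detached:    $detached"
```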
And, with the addition of HEAD, we have a complete picture of the Git object model.
Objects: blobs, trees, and commits.
Refs: branches, tags, and remote branches.
Now that we understand the objects that are used throughout Git, we're going to zoom out a bit and focus primarily on commits and refs. Nearly all operations in Git involve commits, although typically these commits are referenced through refs like branches and remotes.
Checking out a new branch is just the act of creating a ref file, specifically a "head", and populating it with the relevant commit hash.
$ git checkout -b new-branch
First, Git will follow from HEAD to the current branch to determine what commit hash that branch points at. With that info, Git creates a new file in .git/refs/heads with our new branch name as the file name, and the commit hash as the contents. Lastly, it updates HEAD to point at this new ref.
Similarly, we can use the verbose form of checkout, where we explicitly specify the base branch. For instance:
$ git checkout -b other-branch master
is largely the same as the last checkout, but instead of starting from HEAD, we start from the specified branch to determine the commit to point at, and use that to populate our new ref file.
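The three steps described for git checkout -b can be approximated by hand with plumbing commands. This is a sketch of the ref bookkeeping only, not how checkout is implemented internally: because the new branch starts at the commit already checked out, the index and working directory need no changes here.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

start=$(git rev-parse HEAD)                      # 1. follow HEAD to a commit hash
git update-ref refs/heads/new-branch "$start"    # 2. create the new ref with that hash
git symbolic-ref HEAD refs/heads/new-branch      # 3. point HEAD at the new ref

current=$(git symbolic-ref --short HEAD)
new_tip=$(git rev-parse new-branch)
echo "now on $current at $new_tip"
```

To mimic the verbose form, the start commit would come from git rev-parse master instead of HEAD.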
There's an alternative form of checkout when we check out a file by specifying a ref. Technically, we need a tree to get to a specific version of a file, but Git's pointer system also allows for something to be "tree-ish". When something is tree-ish, it will eventually lead to a single tree by dereferencing the pointers.
A commit is tree-ish because commits point at a single tree for the working directory.
Refs are tree-ish because they point at commits, which point at a tree.
Even HEAD is tree-ish by the same logic.
So if we use the following form of the checkout command:
$ git checkout master -- app/assets/javascripts/application.js
Git will begin by looking up the commit that master points at, then the working tree of that commit, and then walk down through the intermediate trees until it reaches the blob for app/assets/javascripts/application.js, and restore that version of the file.
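Each hop in that walk can be inspected with git rev-parse. The sketch below recreates the same path in a throwaway repository; the file contents are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

mkdir -p app/assets/javascripts
echo "console.log('v1');" > app/assets/javascripts/application.js
git add .
git commit -q -m "Add application.js"

commit=$(git rev-parse master)                    # ref -> commit
root_tree=$(git rev-parse 'master^{tree}')        # commit -> root tree
blob=$(git rev-parse master:app/assets/javascripts/application.js)  # tree walk -> blob
blob_type=$(git cat-file -t "$blob")
tree_type=$(git cat-file -t "$root_tree")

# Change the file locally, then restore the version stored in master.
echo "console.log('v2');" > app/assets/javascripts/application.js
git checkout master -- app/assets/javascripts/application.js
restored=$(cat app/assets/javascripts/application.js)

echo "commit $commit -> tree $root_tree -> blob $blob"
```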
Committing takes all of the staged objects and stores them as needed. This typically involves at least one new blob, and a new tree for the current version of the working directory.
It then builds a commit object that points at our new tree, as well as the commit we are currently on.
Lastly, it updates our checked out branch to point at this newly created commit.
$ git commit -m "Add new file"
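For the curious, roughly the same result can be assembled from plumbing commands, which makes each new object visible. This is a sketch of the steps described above under simplified assumptions (no hooks, no reflog message), not a description of how git commit is implemented; the file name new-file.txt is made up.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

parent=$(git rev-parse HEAD)

echo "new" > new-file.txt
git add new-file.txt                      # staging writes the blob and updates the index
tree=$(git write-tree)                    # store the index as a new tree object
commit=$(git commit-tree "$tree" -p "$parent" -m "Add new file")  # commit -> tree + parent
git update-ref refs/heads/master "$commit"   # move the checked out branch

git log --oneline
```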
A fast-forward merge is about the simplest operation we can perform. It creates no new objects, instead simply updating the current branch to reference a different commit.
$ git merge --ff-only feature
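One way to convince ourselves that no objects are created is to count the files under .git/objects before and after the merge. The sketch assumes a freshly created repository in which every object is still stored as a loose file; the branch and file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q master

objects_before=$(find .git/objects -type f | wc -l | tr -d ' ')
master_before=$(git rev-parse master)

git merge -q --ff-only feature            # only the master ref moves

objects_after=$(find .git/objects -type f | wc -l | tr -d ' ')
master_after=$(git rev-parse master)

echo "objects: $objects_before -> $objects_after"
echo "master:  $master_before -> $master_after"
```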
A traditional merge is much more interesting. We start with two diverging histories, and Git creates a new tree for us from the two existing trees.
Once it has the new tree, Git will create a new commit that points at this tree. Lastly, the branch ref will be updated to point at this new commit.
Comparing these two merge strategies, it becomes clear why we prefer the fast-forward only merges. In a fast-forward merge we are just updating a pointer, but the code is not changed. In a traditional merge, Git does its best to bring together two different versions of the code, creating a new commit and tree that we have not interacted with.
$ git merge feature
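By contrast, a traditional merge leaves behind a commit with two parents and a tree that neither branch had before. A sketch with two diverging branches, using invented file names that do not conflict:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "Master work"

master_before=$(git rev-parse master)
feature_tip=$(git rev-parse feature)

git merge -q --no-edit feature            # histories diverged, so a merge commit is created

merge_tree=$(git rev-parse 'HEAD^{tree}')
git rev-list --parents -n 1 HEAD          # prints the merge commit followed by both parents
```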
So with this comparison of traditional and fast-forward merges in mind, we can talk about our good friend rebase. Rebase can be performed when we have new commits on both our feature branch and our "upstream" branch (typically master). We want to update the commits on our branch so they include the changes on master.
When we rebase, we essentially replay our work on the current version of the upstream branch. Git does this by calculating each of the diffs for the commits unique to our branch, then applies them onto the upstream branch one by one. Each application of a diff creates a new commit, reusing the associated commit message and author details.
Note that the old commits still exist, but they are now orphaned. No refs point to them any longer and so they are essentially unreachable, although we know from the discussion of the reflog in the first video that we could easily restore them by checking the reflog.
Once all the new commits have been created, our branch is updated to point at the tip commit of our rebased group.
From here, we could now fast-forward merge our branch into master, as we are now in line with its history. The key difference between this and a traditional merge is that all of the commits here were created by us, and we get to interact with them and test them as needed before merging them into master.
$ git rebase master
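The claims about new commits and orphaned old ones can be checked with a sketch like the following, where the feature branch holds a single commit. Branch names, file names, and messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
old_tip=$(git rev-parse feature)

git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "Master work"

git checkout -q feature
git rebase -q master                      # replay "Feature work" on top of master
new_tip=$(git rev-parse feature)

old_subject=$(git log -1 --format=%s "$old_tip")
new_subject=$(git log -1 --format=%s "$new_tip")

echo "old tip: $old_tip ($old_subject)"
echo "new tip: $new_tip ($new_subject)"
git cat-file -e "$old_tip" && echo "old commit still exists"
```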
Interactive rebase is very similar. We begin with a set of commits, typically on a feature branch and ahead of master, and we perform our interactive rebase. When we squash them down, we create a new commit using the tree of our former tip commit, and compose a new commit message.
Once again, we can see that the old commits live on despite being orphaned, and we can therefore get back to them as needed.
$ git rebase --interactive master
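Since an interactive rebase needs an editor, the sketch below does not run one. Instead it builds the end result of squashing a whole branch by hand with git commit-tree: a single new commit that reuses the tree of the former tip and sits directly on top of master. This illustrates the object-level outcome only, under the assumption that every commit on the branch is squashed into one; the commit messages are invented.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "First commit"

git checkout -q -b feature
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"

old_tip=$(git rev-parse feature)
base=$(git merge-base master feature)

# One new commit: the former tip's tree, parented on the base, with a new message.
squashed=$(git commit-tree "$old_tip^{tree}" -p "$base" -m "Squashed feature work")
git update-ref refs/heads/feature "$squashed"

git log --oneline master..feature         # a single commit remains on the branch
git cat-file -e "$old_tip" && echo "old tip still exists"
```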