Fundamentals of Git

I first started using Git on professional projects about two years ago. Throughout my career, I’ve used the whole gamut of version control systems: CVS, VSS, TFS, Subversion, Perforce and Git, in that order. Of all these tools, Git had the steepest (notoriously so) learning curve. After a year or so, I had gained enough working knowledge to use Git very effectively, even if I didn’t totally understand how things worked under the hood. As I spent time guiding other newbies through their first use of it, I delved deeper into the fundamentals and also learned some of its more esoteric functionality. At this point, going back to a traditional VCS feels dramatically limiting.

For anyone who has used only traditional version control systems, Git introduces an explosion of new functionality and terms (push, pull, rebasing, remotes). It also requires that we mentally re-map previously canonical version control terminology such as checkin, checkout, sync, and commit.

Distributed Version Control Systems

Git is a distributed version control system, meaning a local copy of the repository is a fully autonomous repository with the entire commit history and capabilities of a traditional server-side repository. All day-to-day source control operations like committing changes, viewing the history of a file, and creating a new branch happen entirely in your local repository. This is often befuddling to new users of Git who are accustomed to traditional source-control systems, where the client-side and server-side halves of these operations are so tightly synchronized that they appear to be a single step.
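As a minimal sketch of what "entirely local" means, the script below creates a scratch repository, commits twice, inspects a file's history, and creates a branch without contacting any server. The file name notes.txt, the branch name experiment, and the identity values are placeholders chosen for illustration.

```shell
# Create a scratch repository; nothing below contacts a server.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identity is set per repository so the sketch also runs on a fresh machine.
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit a change.
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Commit a second change to the same file.
echo "world" >> notes.txt
git commit -q -a -m "Extend notes"

# View the history of a single file.
git log --oneline -- notes.txt

# Create a new branch and list the local branches.
git branch experiment
git branch --list
```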

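While a Git repository is self-sufficient and autonomous, programmers will obviously still need to synchronize changes with a remote repository at some point. This is achieved with the commands push, which sends your local commits to the remote, and pull, which fetches the remote's new commits and integrates them into your current branch.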
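To try push and pull without a real server, a bare repository on the local disk can stand in for the remote. This is only a sketch: the names alice and bob, the directory layout, and the file contents are invented for the example.

```shell
work=$(mktemp -d)

# A bare repository on the local disk stands in for the remote server.
git init -q --bare "$work/server.git"

# Alice clones the (still empty) server, commits, and pushes.
git clone -q "$work/server.git" "$work/alice"
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git push -q -u origin "$branch"

# Bob clones after the first push, so he starts with Alice's commit.
git clone -q "$work/server.git" "$work/bob"

# Alice publishes a second commit.
echo "v2" > file.txt
git commit -q -a -m "Second commit"
git push -q origin "$branch"

# Bob synchronizes his local repository with the server.
cd "$work/bob"
git pull -q
cat file.txt
```

Until Bob runs git pull, his repository knows nothing about Alice's second commit, even though it already sits on the server.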

Clones

Cloning is how you retrieve a full copy of the repository of interest. You can clone a repository locally as many times as you like.

git clone git@github.com:joyent/node.git

The default result of this operation is that a full copy of the repository will be created in a directory called ./node.
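The directory name is derived from the last part of the source path, and a second argument to git clone overrides it. The sketch below uses an empty local bare repository named node.git as a stand-in for the GitHub URL so that it runs offline. The name my-node is arbitrary, and Git will warn that the cloned repository is empty.

```shell
work=$(mktemp -d)
cd "$work"

# Local stand-in for the remote repository.
git init -q --bare node.git

# Default: the target directory is derived from the source ("node.git" becomes "node").
git clone -q node.git

# Explicit: a second argument names the target directory.
git clone -q node.git my-node

# Both clones are independent, complete repositories.
ls -d node my-node
```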

Remotes

Non-distributed VCSs are based on the traditional client-server model. The server is centralized and all clients sync with that server. With Git, your repository ‘tracks’ one remote server by default (the one you cloned from), but your local repository can track multiple remote servers. This sounds crazy complicated to the uninitiated, but it’s intrinsic to how a lot of open source software is developed. After cloning a repository locally, type:

git remote

This will list all of the remote repositories that your local repository is tracking. Git neophytes should simply note that the authoritative remote server is conventionally referred to as “origin”, which is the name git clone gives to the repository you cloned from. When you see “origin” in Git command output, it refers to that remote. Don’t ascribe too much significance to this, as it’s literally just a naming convention. You can easily rename “origin” to anything you like.
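A short sketch of listing, adding, and renaming remotes follows. Two local bare repositories stand in for real servers, and the names upstream and fork are just examples of how a project might label them.

```shell
work=$(mktemp -d)

# Two bare repositories stand in for two remote servers.
git init -q --bare "$work/upstream.git"
git init -q --bare "$work/fork.git"

# Cloning registers the source repository under the name "origin".
git clone -q "$work/upstream.git" "$work/local"
cd "$work/local"
git remote

# -v also shows the URL behind each remote name.
git remote -v

# Track a second remote alongside the first.
git remote add fork "$work/fork.git"

# "origin" is only a convention, so it can be renamed.
git remote rename origin upstream
git remote
```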

Branches

One of the most important features of Git is how branching works. People often claim branching is “cheap” in Git. The reasons for this are:

branches are created entirely locally (no server-side operation)
branches are NOT copies of a file system, but rather references to a commit

This second point is critical, as it results in a very fast operation. If you’ve ever branched in Subversion, you’ve probably noticed that it can take a while before the new branch is ready to work on, because a branch there is a copy of a directory tree that shows up as a new path in the repository. Similarly, in Perforce, files are copied into the new branch (the p4 populate command). The consumption of space on the file system as well as the cost of read/write operations causes most people using traditional version control systems to branch sparingly and with great caution. Git, however, supports frequent and near-instantaneous branching by simply creating a pointer to a specific commit in history. Nothing is copied! As a result, Git gives programmers the ability to experiment more easily, as well as to version changes to source code at a more granular level without incurring the latency of a client-server connection.
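The "pointer to a commit" idea can be checked directly. In this sketch, a new branch (the name feature is arbitrary) resolves to exactly the same commit hash as the branch it was created from. The final step assumes Git's default file-based ref storage, where a freshly created branch is a small text file containing that hash, so it is guarded in case the repository stores refs differently.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "a" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Creating a branch copies no files; it only records a commit hash under a name.
git branch feature

# Both names resolve to the same commit.
git rev-parse HEAD
git rev-parse feature

# Assumption: default "files" ref storage, where the branch is a one-line text file.
if [ -f .git/refs/heads/feature ]; then
  cat .git/refs/heads/feature
fi
```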

Rebasing

This is easily the most controversial aspect of Git, especially when it comes to the schism of opinion over how teams should use it. Rebasing is the act of rewriting a select portion of branch commit history. Imagine you are a developer in this most common scenario:

You create a branch to start coding a new feature.
You make several commits to the new feature branch.
Changes from another teammate have been committed to the parent branch, so you fetch the latest changes from the remote server and rebase your commits onto the updated parent branch by typing:

git rebase origin/<name_of_parent_branch>

…Deep breath. The above scenario can be pretty mind-bending for neophytes. Rebasing works like this:

Git rewinds all your local commits and puts them aside in a safe place.
Your feature branch is moved to the tip of the parent branch, so it now includes the changes that were fetched from the server.
Your previously saved local commits are replayed, one by one, on top of your updated feature branch. Each replayed commit gets a new hash, which is why rebasing counts as rewriting history.

Rebasing is an incredibly powerful feature. In its interactive form (git rebase -i), it also allows you to change the order of commits and combine multiple commits into a single commit.
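The whole scenario can be rehearsed offline. The sketch below simulates the remote server with a local bare repository and the teammate with a second clone. All directory names, file names, and commit messages are made up for the example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"

# Set up your clone with one commit on the parent branch, published to the server.
git clone -q "$work/server.git" "$work/me"
cd "$work/me"
git config user.name "Me"
git config user.email "me@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"
parent=$(git symbolic-ref --short HEAD)   # name of the parent branch
git push -q -u origin "$parent"

# You create a feature branch and commit to it.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, a teammate commits to the parent branch and pushes.
git clone -q "$work/server.git" "$work/teammate"
cd "$work/teammate"
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "teammate" > teammate.txt
git add teammate.txt
git commit -q -m "Teammate change"
git push -q origin "$parent"

# Back in your clone: fetch the new commits, then rebase your work onto them.
cd "$work/me"
git fetch -q origin
git rebase -q "origin/$parent"

# Your commit now sits on top of the teammate's commit.
git log --oneline
```

After the rebase, the log lists "Add feature" first, then "Teammate change", then "Base commit", as if the feature work had been started from the teammate's commit.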

Still confused as to why this is useful? That’s okay, because rebasing is Git’s MOST misunderstood concept, which I will address in more depth in future posts.

In Closing

For many programmers, version control systems are like a substitutable public good (e.g. water, power), but Git and other distributed version control systems, like Mercurial, add a new dimension to the software development process. It’s perfectly normal for teams and companies to continue using Git the same way that they used Subversion and TFS in the past, but it would behoove anyone to look at what Git offers beyond the traditional source control system. In future posts, I’ll talk about some of the finer points of distributed version control and the nuances of Git.