`git [what] -p` ALL the things!

A while back, I was making a lot of code changes that sprawled throughout the codebase, and when I finally completed a single chunk of work mixed in with a lot of superfluous code changes, I was left with the final task of committing only a subset of the un-staged changes in my local repository. I had several files that each contained multiple changes which either needed to be committed or deleted, and I didn’t have a way to do it other than manually.

I’m so lazy (or just allergic to reversing changes manually) that I wondered how I could selectively check out chunks of changes from the files while leaving others in place so that I could stage and subsequently commit those changes. A minimal search yielded this great Stack Overflow post explaining how to selectively check out chunks of code. Sweet! TL;DR: git checkout -p lets you interactively discard un-staged code changes from your working tree, one chunk at a time.

After finding that post, I quickly shaped up my next commit and moved on to other tasks. But that episode reminded me that another way to solve the problem would have been to do the inverse of what I was trying to achieve in the scenario above: use git add -p to selectively stage chunks of code changes for commit (as opposed to removing unwanted code from potential commits). This is useful when you are experimenting with a lot of different code changes (perhaps you like to liberally sprinkle console logging everywhere when fixing obscure bugs?) in your local repository, and it’s more efficient to try multiple approaches and not immediately clean up after yourself because you’re trying to get into the flow of understanding or working on some hard problem. Using the -p flag allows you to stage one code diff and leave another un-staged in the same file. The typical workflow is to use git add -p to selectively add ONLY the code diffs that solved your problem for a given commit. After choosing the correct code diffs, commit what you’ve staged, and then run git checkout . to get rid of the remaining cruft that didn’t work. See how that works?

You can read more about adding commits in chunks here on the Git SCM website. You might be wondering what’s so special about -p? What does p even stand for? Git repositories are sometimes described as a tree of linked lists where the nodes in the list are patches (strictly speaking, Git stores a snapshot per commit and computes the patch on demand, but it’s a handy mental model). p is an abbreviation for “patch”!

On a closing note, the nice thing about Git is that it tries (like many other UNIX commands) to keep flags consistent across operations (e.g. -p!). Try using git log -p and see what happens.
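If you would rather see it than guess, here is a small sketch using a made-up two-commit repository (greeting.txt is a hypothetical file). The -1 simply limits the output to the latest commit.

```shell
set -e
# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

echo "hello, world" > greeting.txt
git commit -q -am "Expand greeting"

# -p prints each commit's patch underneath its log entry.
log_output=$(git log -p -1)
printf '%s\n' "$log_output"
```

The log entry is followed by the diff itself, with the removed line prefixed by - and the added line by +.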

Git: Undo a commit

At some point, a new user of Git will accidentally commit some undesirable changes. This post will address a few scenarios where you might want to “undo” a commit.

First case, you’ve committed some terrible code, and you simply want to blow away the previous commit and start over. Remember, each commit is uniquely identified by a hash identifier, and you’ll need to locate the hash of the commit just previous to the unwanted commit you just created. To undo the most recent commit, you’ll type:

git reset --hard [hash_of_commit_previous_to_unwanted_commit]

WARNING: Much like ‘--hard’ implies, this will completely blow away the unwanted commit. You will not be able to retrieve the changes for that commit in a way that is…economical. What if you want to undo the commit, but you still want to further manipulate the changes that were contained in that commit? A common scenario might be that you want to improve the changes or that you simply want to recover a specific portion of the commit. Then, you simply modify the above command:

git reset --soft [hash_of_commit_previous_to_unwanted_commit]

OR

git reset [hash_of_commit_previous_to_unwanted_commit]

There is a slight difference between resetting to a previous commit using --soft versus no option at all. With the first command (reset using --soft), we’ve undone the previous commit and pulled the changes back into “staged” mode, which is essentially the state the files were in right before you originally committed the unwanted changes. The second command (reset, without specifying an option) returns the changes to “unstaged”, which means that if you want to make some additional modifications and then commit them, you’ll need to add or “stage” them before making a subsequent commit.
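To make the difference concrete, here is a sketch that undoes the same unwanted commit three times, once per mode, and prints git status --short after each. The repository and file.txt are made up for illustration. In the short status format, an M in the first column means “staged” and an M in the second column means “unstaged”.

```shell
set -e
# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Good commit"
good=$(git rev-parse HEAD)   # hash of the commit previous to the unwanted one

echo "v2" > file.txt
git commit -q -am "Unwanted commit"

# --soft: the commit is undone and its changes are left staged.
git reset -q --soft "$good"
soft_status=$(git status --short)
echo "after --soft:        '$soft_status'"

# Re-create the unwanted commit, then undo it with the default mode.
git commit -q -m "Unwanted commit"
git reset -q "$good"
mixed_status=$(git status --short)
echo "after default reset: '$mixed_status'"

# Re-create it once more, then undo it with --hard: the changes are gone.
git commit -q -am "Unwanted commit"
git reset -q --hard "$good"
hard_status=$(git status --short)
echo "after --hard:        '$hard_status'"
```

Note that the second re-commit needs -a (or a git add first), precisely because the default reset left the change unstaged.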

All of the above examples require you to seek out the specific hash of the commit to which you want to reset the current branch, and this can be tedious. You can accomplish the same general goal (using similar options for each type of reset) as the above commands, by simply typing:

git reset HEAD~1

Perhaps you’ve seen the term HEAD in command-line output with Git and have always wondered what it means? Basically HEAD is a symbolic reference to the most recent commit (or “tip”) of the branch you currently have checked out. By specifying HEAD~1, you’re basically saying, I want to go back exactly one commit from HEAD. The convention HEAD~[some-number-of-commits-back] is a common shorthand for specifying the target of other Git commands as well (such as rebase and log), so be sure to get comfortable using it. It will be very handy for you in the future when executing more advanced operations with Git.
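A quick sketch of the shorthand, assuming a made-up repository with three commits named “Commit 1” through “Commit 3”:

```shell
set -e
# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

for n in 1 2 3; do
  echo "change $n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

# HEAD~1 is one commit back from HEAD, HEAD~2 is two back.
git log -1 --format=%s HEAD~1   # prints "Commit 2"
git log -1 --format=%s HEAD~2   # prints "Commit 1"

# Undo the latest commit without looking up any hash.
git reset -q HEAD~1
git log -1 --format=%s          # prints "Commit 2"
```

Because this is the default reset mode, the contents of “Commit 3” are still sitting in file.txt as an unstaged change.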

There you go…now you can undo a commit! Git luck to you!

What is Git revert?

When using Git, one of the first problems people encounter is how to undo a commit. A quick look at the help docs would suggest that “revert” is the way to undo a commit. In Subversion, for example, “revert” actually blows away uncommitted changes to local files. But with Git, “revert” means something entirely different. This is the first in a series of posts on using Git, and I’ve set up a GitHub repository called git-examples to act as a practical reference for some of these issues.

What happens when you revert? Consider the following Git command:

 git revert <hash-of-commit-to-reverse>

This will create an additional commit that represents the inverse of the commit you want to revert. The commit message will look like this:

[Image: git-revert-commit-detail-github]

The default commit message for a reverted commit will indicate the hash (c74ba1e) of the specific commit that is being reverted.

So what is actually happening when you revert a commit? Simply put, Git creates a commit that is the exact inverse of the targeted commit. Git is NOT undoing the commit. It is literally creating a new commit that reverses the changes of the target reverted commit. The new commit will re-add any code that was deleted from the target commit, and delete any code that was added to the target commit. The original commit will continue to exist in your commit history for eternity! Newcomers to Git are very often confused when they see a brand new commit in their branch, with the original commit continuing to exist.

Take a look at the original commit:

[Image: git-revert-target-commit]

And here is the new commit created by reverting:

[Image: git-revert-inverse-commit]

Note that the code that was added in the target (original) commit was then deleted in the subsequent commit created by the revert command. For a more in-depth look at those commits, take a look at the branch that I’ve created to demonstrate the use of the revert command.
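If you want to reproduce this locally, here is a minimal sketch. It uses a made-up repository (not the git-examples one), and --no-edit simply accepts the default commit message instead of opening an editor.

```shell
set -e
# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "line one" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "line two" >> file.txt
git commit -q -am "Add line two"
target=$(git rev-parse HEAD)   # the commit we want to reverse

# Creates a NEW commit that is the inverse of the target commit.
git revert --no-edit "$target" > /dev/null

# Three commits now: the original two plus the revert.
git log --format='%h %s'
```

The history grows by one commit, the target commit is still there, and file.txt is back to containing only “line one”.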

After grasping what reverting commits is all about, you may decide that using revert is unnecessary since you only need to undo your most recent commit. If that’s the case, you’ll want to use the reset command (which I will discuss in a future post).

So why would anyone use the revert command? IMO, there are two special cases that may necessitate using revert:

  1. The commit you want to undo is far back in the commit history, and it’s too late to reset or interactively rebase (I’ll talk more about interactive rebasing in later posts). The example above is dead simple, but in real life, the commit you want to revert may encompass complicated changes across multiple files, and revert guarantees to reverse exactly those changes.
  2. Using revert is a way to document a specific code-change. It indicates to future developers/readers of the commit history, that someone very deliberately corrected changes from a previous commit.

For more in-depth explanation of how reverting works in Git, take a look at this great post.

NOTE: The site gitready.com is, hands-down, the BEST resource that I’ve found on the web for all things Git-related!

Fundamentals of Git

I first started using git on professional projects about two years ago. Throughout my career, I’ve used the whole gamut of version control systems: CVS, VSS, TFS, Subversion, Perforce and Git, in that order. Of all these tools, Git had the most difficult (notoriously so) learning curve. After a year or so, I gained enough working knowledge to use Git very effectively, even if I didn’t totally understand how things worked under the hood. As I spent some time guiding other newbies to use it for the first time, I delved deeper into the fundamentals and also learned some of its more esoteric functionality. At this point going back to traditional VCS’s feels dramatically limiting.

For anyone who has used only traditional version control systems, let’s acknowledge that Git introduces an explosion of new functionality and terms (push, pull, rebasing, remotes) and also requires that we mentally re-map previously canonical version control terminology like: `checkin`, `checkout`, `sync`, `commit`, and others.

Distributed Version Control Systems

Git is a distributed version control system, meaning a local copy of the repository is a fully autonomous repository with the entire commit history and capabilities of a traditional server-side repository. All day-to-day source control operations like committing changes, viewing history of a file, and creating a new branch happen entirely in your local repository. This is often befuddling to new users of git, who are accustomed to traditional source-control systems, where similar operations are so tightly synchronized between client and server, they appear to be singular.

While a Git repository is self-sufficient and autonomous, programmers will obviously still need to synchronize changes with a remote repository at some point. This is achieved with the commands `push` and `pull`.

Clones

Cloning is how you retrieve a full copy of the repository of interest. You can clone a repository locally as many times as you like.

`git clone git@github.com:joyent/node.git`

The default result of this operation is that a full copy of the repository will be created in a directory called `./node`.
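You can see the “full copy” part without touching the network. The sketch below clones from a local path instead of a URL; the directory names upstream and copy are made up, and cloning from GitHub behaves the same way from the clone’s point of view.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the repository of interest (hypothetical).
git init -q upstream
cd upstream
git config user.name "Example"
git config user.email "example@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -am "Second commit"
cd ..

# Clone it; the destination directory name is given explicitly here.
git clone -q upstream copy
cd copy

# The entire history is available locally, no server round-trip needed.
git log --oneline

# The place we cloned from is remembered under the name "origin".
git remote
```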

Remotes

Non-distributed VCS’s are based on the traditional client-server model. The server is centralized and all clients sync with that server. With Git your repository ‘tracks’ a remote server by default, but your local repository can track multiple remote servers. This sounds crazy complicated to the uninitiated, but it’s intrinsic to how a lot of open source software is developed. After cloning a repository locally, type:

git remote

This will list all of the remote repositories that your local repository is tracking. Git neophytes should simply note that the authoritative remote server is conventionally referred to as “origin”. When you see “origin” in Git command output, this is a reference to the remote server that is being tracked for the current branch. Don’t ascribe too much significance to this, as it’s literally just a naming convention. You can easily change “origin” to any name you desire.
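Here is a small sketch of listing, renaming and adding remotes. Everything in it is made up for illustration: the local directory server plays the role of the remote server, and backup.git is a path that does not even need to exist for the remote to be registered.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "server" stands in for the authoritative remote repository (hypothetical).
git init -q server
cd server
git config user.name "Example"
git config user.email "example@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q server local
cd local

# -v also shows the URL behind each remote name.
git remote -v

# "origin" is only a convention; rename it to anything you like.
git remote rename origin upstream

# A repository can track more than one remote.
git remote add backup "$work/backup.git"

git remote
```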

Branches

One of the most important features of Git is how branching works. People often claim branching is “cheap” in Git. The reasons for this are:

  • branches are created entirely locally (no server-side operation)
  • branches are **NOT** copies of a file system, but rather they are a reference to a commit.

This second point is critical, as it results in a very fast operation. If you’ve ever branched in Subversion, you’ve probably noticed it takes a while for the operation to complete, as the entire subset of files is being copied. Branches are actually newly generated in the file system. Similarly, in Perforce, files are copied and then directed to `populate` (P4 command). The consumption of space on the file system as well as the cost of read/write operations causes most people using traditional version control systems to branch sparingly and with great caution. Git, however, supports frequent and near-instantaneous branching by simply creating a pointer to a specific commit in history. Nothing is copied! As a result, Git gives programmers the ability to experiment more easily, as well as to version changes to source code at a more granular level without incurring the latency of a client-server connection.
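A short sketch of the “pointer, not copy” idea, using a made-up repository and a hypothetical branch name experiment:

```shell
set -e
# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

# Creating a branch only records a new name for the current commit.
git branch experiment

# Both names resolve to the same commit hash.
git rev-parse HEAD
git rev-parse experiment

# Every local branch is just a name paired with a commit hash.
git for-each-ref --format='%(refname:short) -> %(objectname)' refs/heads
```

Both branches print the same hash, which is all a new branch amounts to until you commit on it.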

Rebasing

This is easily the most controversial aspect of Git, especially when it comes to the schism of opinion over how teams should use it. Rebasing is the act of rewriting a select portion of branch commit history. Imagine you are a developer in this most common scenario:

1. You create a branch to start coding a new feature.
2. You make several commits to the new feature branch.
3. Changes from another teammate have been committed to the parent branch, so you get the latest changes from the remote server and you rebase your commits onto the latest changes from the parent branch by typing:

git rebase origin/<name_of_parent_branch>

*……Deep breath*. The above scenario can be pretty mind-bending for neophytes. Rebasing works like this:

  • Git rewinds all your local commits and puts them aside in a safe place
  • Your feature branch is moved to the tip of the parent branch, which now includes the changes that were fetched from the server
  • Your previously saved local commits are replayed, one at a time, on top of your updated feature branch.

Rebasing is an incredibly powerful feature that allows you to rewrite commit history, change the order of commits, and combine multiple commits into a single commit.
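The scenario above can be sketched locally. This is only an illustration under a few assumptions: the directory server plays the role of the remote, dev is your clone, feature is a hypothetical branch name, and the parent branch name is read from the repository because it may be main or master depending on your Git configuration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "server" stands in for the shared remote repository (hypothetical).
git init -q server
cd server
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"
parent=$(git symbolic-ref --short HEAD)   # name of the parent branch
cd ..

# 1. Clone and create a feature branch.
git clone -q server dev
cd dev
git config user.name "Example"
git config user.email "example@example.com"
git checkout -q -b feature

# 2. Commit to the feature branch.
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature commit"

# 3. Meanwhile, a teammate's change lands on the parent branch.
cd ../server
echo "teammate work" > teammate.txt
git add teammate.txt
git commit -q -m "Teammate commit"

# Fetch the new commit, then replay the feature commit on top of it.
cd ../dev
git fetch -q origin
git rebase -q "origin/$parent"

# Newest first: Feature commit, Teammate commit, Base commit.
git log --format=%s
```

The feature commit now sits directly on top of the teammate’s commit, as if the feature branch had been started after it.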

Still confused as to why this is useful? That’s okay, because rebasing is Git’s MOST misunderstood concept, which I will address in more depth in future posts.

In Closing

For many programmers, version control systems are like a substitutable public good (e.g. water, power, etc.), but Git and other distributed version control systems like Mercurial add a new dimension to the software development process. It’s perfectly normal for teams and companies to continue using Git the same way that they used Subversion and TFS in the past, but it would behoove anyone to look at what Git offers beyond the traditional source control system. In future posts, I’ll talk about some of the finer points of distributed version control and the nuances of Git.