Changing the committer

Quite often, I find myself using git for non-work related activity on my work laptop. Yeah, yeah, I know.

Normally, I remember to set my email to be my home address before starting work.

$ mymail='dom [at] happygiraffe (dot) net'
$ git config user.email "$mymail"

Of course, you’d use your proper email address, instead of that obfuscated form.

Note that we don’t use --global. This change is specific to the repository that we’re working in.
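If you want to confirm which address a repository will actually commit with, here is a minimal sketch. It runs in a scratch repository, and the address is a made-up placeholder rather than anything from my setup.

```shell
# Scratch repository so the demo does not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical address, standing in for your real one.
mymail='me@example.org'
git config user.email "$mymail"

# --local reads only this repository's .git/config, ignoring ~/.gitconfig.
local_email=$(git config --local user.email)
echo "this repository commits as: $local_email"
```

Dropping --local from the last command shows the value git will really use, whichever config file it comes from.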

Unfortunately, I usually just dive in and start working. About four or five commits down the line, I realise I’ve screwed up. What then?

git filter-branch to the rescue! We just need to change a couple of environment variables and redo each commit.

$ git filter-branch --env-filter "export GIT_AUTHOR_EMAIL='$mymail' GIT_COMMITTER_EMAIL='$mymail'" master
Rewrite 0c5299bf98bf30938bb1d0fc0211aa9f3a9ddcf8 (3/3)
Ref 'refs/heads/master' was rewritten

Like all uses of filter-branch, you should only do this on an unpublished repository, as it’s effectively altering history.

There is a reference to the original commits left behind, in case I screwed something up. When you’ve checked that everything looks OK, you can clean up.
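One way to do that check is to list the addresses on the rewritten branch and on the backup ref side by side. This is only a sketch in a scratch repository; the name and both addresses are invented for the demo.

```shell
# Scratch repository with one commit made under the "wrong" address.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'work@example.com'     # hypothetical work address
echo hello > file.txt
git add file.txt
git commit -q -m 'First commit'

# Rewrite both identities. Newer gits print a deprecation warning for
# filter-branch; this variable silences it (older gits ignore it).
mymail='home@example.org'                    # hypothetical home address
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --env-filter "export GIT_AUTHOR_EMAIL='$mymail' GIT_COMMITTER_EMAIL='$mymail'" \
    HEAD >/dev/null

# Every distinct author/committer address now on the branch.
emails=$(git log --format='%ae %ce' HEAD | sort -u)
echo "after:  $emails"

# The backup ref still shows the old address until it is deleted.
branch=$(git symbolic-ref HEAD)
old_emails=$(git log --format='%ae %ce' "refs/original/$branch" | sort -u)
echo "before: $old_emails"
```

With the addresses confirmed, the clean-up looks like this.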

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
$ git reflog expire --expire=now --all
$ git gc --prune=now
Counting objects: 9, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (9/9), done.
Total 9 (delta 1), reused 0 (delta 0)

Temporarily ignoring files in git

Quite often, you want to change a file temporarily whilst you work on something, but you know you don’t want to commit it. Right now I want to change my project’s logging from INFO to DEBUG, but I don’t want to commit that.

The git update-index command has a flag, --assume-unchanged, which tells git to stop checking the named files for changes until you say otherwise.

$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#	modified:   src/main/webapp/WEB-INF/log4j.xml
no changes added to commit (use "git add" and/or "git commit -a")
$ git update-index --assume-unchanged src/main/webapp/WEB-INF/log4j.xml
$ git status
# On branch master
nothing to commit (working directory clean)

Easy! Now, edit away.

… time passes …

And now to get everything back to normal.

$ git update-index --verbose --really-refresh
src/main/webapp/WEB-INF/log4j.xml: needs update
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#	modified:   src/main/webapp/WEB-INF/log4j.xml
no changes added to commit (use "git add" and/or "git commit -a")

I have to be honest, this is slightly hacky. It would be nice to be able to tell git “ignore this change,” in the way you can say “add this change”. But it works OK for now.
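If you lose track of which files you have hidden, git can list them, and the bit can be cleared one file at a time. A small sketch in a scratch repository, where log.conf is a made-up stand-in for the log4j.xml above:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'
echo 'level=INFO' > log.conf                 # stand-in for log4j.xml
git add log.conf
git commit -q -m 'Add logging config'

# Hide local edits to the file.
git update-index --assume-unchanged log.conf
echo 'level=DEBUG' > log.conf

# ls-files -v marks assume-unchanged entries with a lowercase letter.
hidden=$(git ls-files -v | grep '^[a-z]' | cut -c3-)
echo "hidden: $hidden"

# Clear the bit on just this file; the edit shows up in status again.
git update-index --no-assume-unchanged log.conf
visible=$(git status --porcelain)
echo "status: $visible"
```

The --no-assume-unchanged form is handy when you have hidden several files and only want one of them back.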


Solr’s Lucene Source

I’m debugging a plugin for Solr. I’ve just about got the magic voodoo set up so that I can make Eclipse talk to tomcat and stick breakpoints in and so on. But I’ve immediately run into a problem.

Even though Solr itself comes with -sources jars, the bundled copy of lucene that they’ve used doesn’t. Needless to say, this is a bit of a hindrance.

Thankfully, the apache people have set up git mirrors of their repositories, which makes this situation a lot less annoying than it could be.

First, I checked out copies of lucene & solr.

$ git clone git://
$ git clone git://

Now, I need to go into solr and figure out which version of lucene is in use. Unfortunately, it’s not a released version, it’s a snapshot of the lucene trunk at a point in time.

$ cd …/solr
$ git branch -r
  origin/HEAD -> origin/trunk
$ git whatchanged origin/tags/release-1.3.0 lib
commit 904e378b7b4fd18232f657c9daf484a3e63b272c
Author: Yonik Seeley 
Date:   Wed Sep 3 20:31:42 2008 +0000

    lucene update 2.4-dev r691741

    git-svn-id: 13f79535-47bb-0310-9956-ffa450edef68

:100644 100644 a297b74... 54442dc... M  lib/lucene-analyzers-2.4-dev.jar
:100644 100644 596625b... 5c6e003... M  lib/lucene-core-2.4-dev.jar
:100644 100644 db13718... f0f93a7... M  lib/lucene-highlighter-2.4-dev.jar
:100644 100644 50c8cb4... a599f43... M  lib/lucene-memory-2.4-dev.jar
:100644 100644 aef3fb8... 79feaef... M  lib/lucene-queries-2.4-dev.jar
:100644 100644 1c733b9... 440fa4e... M  lib/lucene-snowball-2.4-dev.jar
:100644 100644 0195fa2... b5ff08b... M  lib/lucene-spellchecker-2.4-dev.jar

So, the last change to lucene was taking a copy of r691741 of lucene’s trunk. Let’s go over there and see what that looks like.

$ cd …/lucene
$ git log --grep=691741

Except that doesn’t return anything. Because there was no lucene commit at that revision in the original repository (it was something to do with geronimo). So we need to search backwards for the commit nearest to that revision. Thankfully, git svn includes the original subversion revision numbers of each commit.

$ cd …/lucene
$ git log | perl -lne 'if (m/git-svn-id:.*@(\d+)/ && $1 <= 691741){print $1; exit}'
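The search works because git log lists commits newest first, so the first revision number at or below the target is the nearest one. Here is the same idea sketched with sed and awk over some sample lines. The URL and the two outer revision numbers are invented for the demo; only 691694 and 691741 come from the real session.

```shell
# Sample git-svn-id lines in git log order (newest first).
# The URL and UUID text are placeholders, not the real repository.
sample='    git-svn-id: https://svn.example.org/repo/trunk@691800 some-uuid
    git-svn-id: https://svn.example.org/repo/trunk@691694 some-uuid
    git-svn-id: https://svn.example.org/repo/trunk@691500 some-uuid'

target=691741

# sed pulls out the number after the @; awk prints the first one that
# is not newer than the target and stops.
nearest=$(printf '%s\n' "$sample" |
    sed -n 's/.*git-svn-id:.*@\([0-9][0-9]*\).*/\1/p' |
    awk -v t="$target" '$1 <= t { print; exit }')
echo "$nearest"
```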

So now we can go back and find the git commit id that corresponds.

$ cd …/lucene
$ git log --grep=691694
commit 71afff2cebd022fe63bdf2ec4b87aaa0cee41dc8
Author: Michael McCandless 
Date:   Wed Sep 3 17:34:29 2008 +0000

    LUCENE-1374: fix test case to close reader/writer in try/finally; add assert b!=null in RAMOutputStream.writeBytes (matches FSIndexOutput which hits NPE)

    git-svn-id: 13f79535-47bb-0310-9956-ffa450edef68

Hurrah! Now I can checkout the same version of Lucene that’s in Solr. But, probably more useful for Eclipse, is just to zip it up somewhere.

$ cd …/lucene
$ git archive --format=zip 71afff2 >/tmp/

Excellent. Now I can resume my debugging session. 🙂

NB: I could have just used subversion to check out the correct revision of Lucene. But, I find it quicker to use git to clone the repository, and I get the added benefit that I now have the whole lucene history available. So I can quickly see why something was changed.


Publishing a subdirectory to github pages

I’ve written some HTML documentation for jslint4java. It lives in jslint4java-docs/src/main/resources in typical maven fashion. I’d like to get it published on github pages.

The starting point is similar to their documentation.

git symbolic-ref HEAD refs/heads/gh-pages
rm .git/index
git clean -fdx
echo "My GitHub Page" > index.html
git add .
git commit -a -m "First pages commit"
git push origin gh-pages

That lands us with a brand spanking new branch to play with. What I’d like to do is make a new commit on that branch, but from a tree which is already in my repository. The magic word is git commit-tree. It needs two things: the id of the tree and the id of the parent commit to attach to.

The parent to attach to is easy. It’s the tip of the gh-pages branch we just made. git show-ref will let us know the id.

$ git show-ref -s refs/heads/gh-pages

The next bit is the tree we want to commit. git ls-tree is the tool for the job.

$ git ls-tree -d HEAD jslint4java-docs/src/main/resources
040000 tree 5feb5926c39b5e6af3a51feb04750c819bf08b94	jslint4java-docs/src/main/resources

git commit-tree will return the id of the new commit it just created. All that remains is to update the gh-pages branch to point at it.

Pulling it all together, we have:

parent_sha=$(git show-ref -s refs/heads/gh-pages)
doc_sha=$(git ls-tree -d HEAD jslint4java-docs/src/main/resources | awk '{print $3}')
new_commit=$(echo "Auto-update docs." | git commit-tree $doc_sha -p $parent_sha)
git update-ref refs/heads/gh-pages $new_commit

This isn’t ideal — it won’t automatically track updates to that directory. But it’s easy enough to run this once in a while to publish an update.
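To try the whole thing without risking a real branch, here is a sketch in a scratch repository. The docs directory and file names are made up. It also uses git rev-parse with the HEAD:path syntax to get the tree id, which is an alternative to the ls-tree and awk pipeline rather than what I used above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'

mkdir -p docs
echo '<h1>Docs</h1>' > docs/index.html
echo 'code' > main.txt
git add .
git commit -q -m 'Project with docs'

# An existing pages branch to attach to. Here it simply starts at HEAD;
# in the post it is the empty branch created earlier.
git branch gh-pages

parent_sha=$(git rev-parse refs/heads/gh-pages)
# "HEAD:docs" names the tree object for that directory directly.
doc_sha=$(git rev-parse HEAD:docs)
new_commit=$(echo "Auto-update docs." | git commit-tree "$doc_sha" -p "$parent_sha")
git update-ref refs/heads/gh-pages "$new_commit"

# The branch tip now holds only the contents of docs/.
published=$(git ls-tree --name-only gh-pages)
echo "$published"
```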

The end result is that my documentation is published.


Mercurial support on Google Code

Google Code Blog: Mercurial support for Project Hosting on Google Code: We are happy to announce that Project Hosting on Google Code now supports the Mercurial version control system in addition to Subversion.

This is great news. I love git, but mercurial is also widely used by many projects (Python, Mozilla, …). Bringing DVCSs to a wider audience can only be a good thing. The freedom of making everyone a “committer” is awesome.


Searching through all revisions

A colleague was asking:

I’m trying to work out a technique for searching for an occurrence of a phrase in _all_ revisions of a specific file in a subversion repository. How can I do this?

Of course, in subversion, the answer is slow and complicated. But, you can use git (and git svn in particular) to achieve the answer fairly simply (and much more quickly).

First, you just clone the subversion repository into git. This takes a while, mostly because subversion isn’t that quick.

  git svn clone -s

Then, you can wrap a little bit of shell scripting around git to get what you need.

  cd proj
  git branch -a | grep tags/ | while read tag
  do
    git --no-pager grep 'something' "$tag" -- some/file.txt
  done

So, we pull out a list of tags, and run git grep over each one.

There may well be a more effective way to do this, but hey, it took seconds to come up with. And it shows off the reason I like git — it’s so scriptable.
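If the project has no tags, or you want every revision rather than just the tagged ones, git rev-list can supply the list of commits instead. A sketch in a scratch repository, with made-up file contents:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'

mkdir some
echo 'nothing here' > some/file.txt
git add some/file.txt
git commit -q -m 'Add file'
echo 'something here' > some/file.txt
git commit -q -a -m 'Mention something'
echo 'gone again' > some/file.txt
git commit -q -a -m 'Remove it'

# rev-list prints every commit that touched the file; git grep then
# searches the file as it was in each one. grep exits non-zero when a
# revision has no match, hence the "|| true".
hits=$(git rev-list --all -- some/file.txt | while read rev
do
    git --no-pager grep 'something' "$rev" -- some/file.txt || true
done)
echo "$hits"

# How many revisions of the file contained the phrase.
count=$(printf '%s\n' "$hits" | grep -c 'something')
echo "$count"
```

Each hit is prefixed with the commit id, so you can go straight to git show for the one you care about.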


git tree

Just a quick one based on today’s git ready: text-based graph. That describes how to get nice ascii-charts of your git repo. But the command is too long to type. So:

  git config --global alias.tree 'log --graph --pretty=oneline'

Now, I can just do git tree. Lovely.
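If full commit ids make the lines too wide, --abbrev-commit shortens them. A quick sketch, using a local alias in a scratch repository so it leaves your own config alone:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'
echo x > f
git add f
git commit -q -m 'First'

# No --global here: the alias only exists in this scratch repository.
git config alias.tree 'log --graph --pretty=oneline --abbrev-commit'

# One line per commit: graph marker, short id, subject.
out=$(git --no-pager tree)
echo "$out"
```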


Github Pages with Maven

Github recently introduced their pages feature, for serving static content. This sounds like an ideal match for a maven site, so I thought I’d give it a go. I managed to get up and running in fairly short order, but I’m not totally happy with the end result. And that’s not just the fault of the default maven site.

First things first. We need to set up the gh-pages branch as devoid of content. I nicked most of this from drnic’s sake-tasks, even though the same instructions are in github’s own pages documentation.

git symbolic-ref HEAD refs/heads/gh-pages
rm .git/index
git clean -fdx
echo "Coming soon!" >index.html
git add index.html
git commit -a -m 'Initial page'
git push origin gh-pages

Soon, you should be able to see the “Coming soon!” message on your project’s github pages URL.

Now that’s there, we need to set it up so that maven can deploy the site to it. First, let’s add the branch as a submodule.

repo=`git config remote.origin.url`
git submodule add -b gh-pages $repo site

That should create a directory “site” which contains the index.html you committed a moment ago. Now you have to ask Maven to deploy the site there. Add this to your POM.


Now you can say mvn site-deploy and it will fill up the site directory with lots of lovely HTML. Then you have to commit it (separately, since it’s a submodule) and push it back to github.

  mvn site-deploy
  cd site
  git add .
  git commit -a -m 'Generate site.'
  git push origin gh-pages

And shortly thereafter it shows up on the web site. Magic.

I like this a lot in principle. But I have a few issues with it as well. Mostly, this is down to git submodules being quite a blunt instrument:

  • They don’t auto-update when there’s a new commit in the submodule — they point at a fixed commit. You have to ask for it to be updated.
  • On top of that, you can’t point the submodule at a branch. It’s always pointing at a specific commit.
  • The submodule location is recorded in .gitmodules. The main issue I have with this is I can’t find a way to say “my current repository”. Why is this a problem? Well, if somebody forks the codebase (and they will, this is git) then they want their site to point at their gh-pages branch, not mine.
  • The submodule location is my private push URL, not the public one. Which is all well and good, but when somebody checks out the project, they’re going to have an error when they try to check it out. As I found out when setting up the build in hudson.
  • When somebody else checks out the repository they have an extra step to go through to fetch the submodule contents (git submodule init, then git submodule update).

I think that next time I try this, I’ll skip submodules and just checkout a second copy of the repo in ../project-site.

I still think it’s the right way to go. Not just for the site, but you can also use github pages to serve up a maven repository in a very similar fashion. Github and maven do seem to go together rather well.


Commit Messages

I’ve been using git for most of this year. It’s been a really excellent ride, and the things it enables are wonderful. But one of the things I’ve found most handy is a convention.

The commit message is treated like an email: one line of subject and then a few paragraphs of explanation. Keep the subject line short so you don’t have scroll bars when they’re displayed.

This works really well for scanning the history of a project: you can get an at-a-glance overview of what’s changed and then click on a single commit for more detail. I’ve tried to carry this over to subversion projects as well.
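Git itself leans on the same split: the first line is the subject and the rest is the body. A small sketch in a scratch repository, with an invented commit message, shows the two halves being pulled apart:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'
echo 'x' > file.txt
git add file.txt

# Each -m becomes its own paragraph: subject first, then the body.
git commit -q -m 'Add placeholder file' \
    -m 'Nothing depends on it yet; it exists so later commits have something to build on.'

# %s is just the subject line, which is what one-line views show.
subject=$(git log -1 --format=%s)
# %b is everything after the blank line.
body=$(git log -1 --format=%b)
echo "subject: $subject"
echo "body:    $body"
```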


The best subversion client

The best subversion client I’ve used to date? git. It’s so script-friendly! This morning somebody asked me for a complete history of a project in CSV format. Using my nicely cloned repository, it was a simple matter of giving the correct format to git log.

  ⇒ git log --pretty='format:%h,%ai,%an,%s' | head -5
  f584913,2008-09-26 21:58:02 +0000,Dominic Mitchell,Pull out a base class for OptionInstances.
  803a32a,2008-09-26 21:57:38 +0000,Dominic Mitchell,Organise imports.
  1cb0132,2008-09-26 21:57:17 +0000,Dominic Mitchell,Switch from a Set of OptionInstance to a Map from Option to OptionInstance.
  a0c1efd,2008-09-26 21:56:56 +0000,Dominic Mitchell,Introduce OptionInstance.
  a211ae3,2008-09-26 21:55:59 +0000,Dominic Mitchell,Use standard idiom for emptying a Set

I love how git embodies the Unix toolkit approach.
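One caveat: a comma in a commit subject will split that field in a plain CSV. If whoever is asking can take tab-separated output instead, %x09 in the format string emits a tab. A sketch in a scratch repository, with a made-up commit:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'        # hypothetical identity
git config user.email 'me@example.org'
echo 'x' > file.txt
git add file.txt
git commit -q -m 'Fix parsing, again'

# %x09 emits a tab, so a comma in the subject cannot split a field.
line=$(git log -1 --pretty='format:%h%x09%an%x09%s')
echo "$line"

# Field 3 is the whole subject, comma included (cut splits on tabs).
subject=$(printf '%s\n' "$line" | cut -f3)
echo "$subject"
```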