Jabbering Giraffe

Solr's Lucene Source

I’m debugging a plugin for Solr. I’ve just about got the magic voodoo set up so that Eclipse can talk to Tomcat and I can set breakpoints and so on. But I’ve immediately run into a problem.

Even though Solr itself comes with -sources jars, the bundled copy of Lucene doesn’t. Needless to say, this is a bit of a hindrance.

Thankfully, the Apache people have set up git.apache.org, which makes this situation a lot less annoying than it could be.

First, I cloned copies of Solr and Lucene.

$ git clone git://git.apache.org/solr.git
$ git clone git://git.apache.org/lucene.git

Now I need to go into Solr and figure out which version of Lucene is in use. Unfortunately, it’s not a released version; it’s a snapshot of the Lucene trunk at a point in time.

$ cd …/solr
$ git branch -r
  origin/HEAD -> origin/trunk
  origin/branch-1.1
  origin/branch-1.2
  origin/branch-1.3
  origin/sandbox
  origin/solr-ruby-refactoring
  origin/tags/release-1.1.0
  origin/tags/release-1.2.0
  origin/tags/release-1.3.0
  origin/trunk
$ git whatchanged origin/tags/release-1.3.0 lib
…
commit 904e378b7b4fd18232f657c9daf484a3e63b272c
Author: Yonik Seeley 
Date:   Wed Sep 3 20:31:42 2008 +0000

    lucene update 2.4-dev r691741

    git-svn-id: https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3@691758 13f79535-47bb-0310-9956-ffa450edef68

:100644 100644 a297b74... 54442dc... M  lib/lucene-analyzers-2.4-dev.jar
:100644 100644 596625b... 5c6e003... M  lib/lucene-core-2.4-dev.jar
:100644 100644 db13718... f0f93a7... M  lib/lucene-highlighter-2.4-dev.jar
:100644 100644 50c8cb4... a599f43... M  lib/lucene-memory-2.4-dev.jar
:100644 100644 aef3fb8... 79feaef... M  lib/lucene-queries-2.4-dev.jar
:100644 100644 1c733b9... 440fa4e... M  lib/lucene-snowball-2.4-dev.jar
:100644 100644 0195fa2... b5ff08b... M  lib/lucene-spellchecker-2.4-dev.jar
…

So the last change to the Lucene jars was a copy of r691741 of Lucene’s trunk. Let’s go over there and see what that looks like.

$ cd …/lucene
$ git log --grep=691741

Except that doesn’t return anything, because there was no Lucene commit at that revision in the original repository (r691741 was something to do with Geronimo). So we need to search backwards for the Lucene commit nearest to that revision. Thankfully, git svn records the original Subversion revision number in each commit message.

$ cd …/lucene
$ git log | perl -lne 'if (m/git-svn-id:.*@(\d+)/ && $1 <= 691741) { print $1; exit }'
691694
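
The same backwards search can be sketched without Perl. This is only an illustration: nearest_svn_commit is a made-up helper name, and it assumes every commit message carries a git-svn-id line like the ones shown in this post. Because git log lists the newest commits first, the first revision at or below the target is the nearest one. It prints the commit id next to the revision, which saves a second lookup.

```shell
# Hypothetical helper: read `git log` output on stdin and print the commit id
# and Subversion revision of the newest commit at or below the target revision.
nearest_svn_commit() {
  target="$1"
  awk -v target="$target" '
    /^commit / { hash = $2 }          # remember the id of the commit being read
    /git-svn-id:/ {
      rev = $0
      sub(/.*@/, "", rev)             # drop everything up to the "@"
      sub(/ .*/, "", rev)             # drop the trailing repository UUID
      if (rev + 0 <= target + 0) {    # numeric comparison
        print hash, rev
        exit                          # log is newest-first, so stop at the first hit
      }
    }
  '
}

# Usage, from inside the lucene clone:
#   git log | nearest_svn_commit 691741
```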

So now we can go back and find the git commit id that corresponds.

$ cd …/lucene
$ git log --grep=691694
commit 71afff2cebd022fe63bdf2ec4b87aaa0cee41dc8
Author: Michael McCandless 
Date:   Wed Sep 3 17:34:29 2008 +0000

    LUCENE-1374: fix test case to close reader/writer in try/finally; add assert b!=null in RAMOutputStream.writeBytes (matches FSIndexOutput which hits NPE)

    git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@691694 13f79535-47bb-0310-9956-ffa450edef68

Hurrah! Now I can check out the same version of Lucene that’s in Solr. But what’s probably more useful for Eclipse is just to zip it up somewhere.

$ cd …/lucene
$ git archive --format=zip 71afff2 >/tmp/lucene-2.4-r691741.zip
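
One optional refinement, if you'd rather the zip unpacked into its own directory: git archive takes a --prefix option that is prepended to every path in the archive, and -o to name the output file. The wrapper below is a sketch only. archive_commit and its arguments are names invented here, and the output path should be absolute because git -C changes directory first.

```shell
# Hypothetical wrapper: zip up one commit so that every entry sits under a
# top-level directory named by the third argument.
archive_commit() {
  repo="$1"     # path to the git clone
  commit="$2"   # commit id, tag or branch to archive
  prefix="$3"   # directory name to put in front of every archived path
  out="$4"      # absolute path of the zip file to write
  git -C "$repo" archive --format=zip --prefix="$prefix/" -o "$out" "$commit"
}

# Usage (the directory name lucene-2.4-dev is just an example):
#   archive_commit …/lucene 71afff2 lucene-2.4-dev /tmp/lucene-2.4-r691741.zip
```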

Excellent. Now I can resume my debugging session. 🙂

NB: I could have just used Subversion to check out the correct revision of Lucene. But I find it quicker to use git to clone the repository, and I get the added benefit of having the whole Lucene history available, so I can quickly see why something was changed.