OSGI Intro

On Tuesday, I attended the OSGI: Let’s Get Started session with Simon Maple and Zoë Slattery, courtesy of SkillsMatter and LJC. I decided it was time to work out what I’m supposed to be doing with it. :)

For the last release I enabled OSGI headers for jslint4java. I was hoping that this session would show me how well I’d done.

First, what is OSGI? At the most basic, it’s a way of providing some order and structure to the traditional Java classpath. OSGI achieves this by using bundles.

A bundle is a regular jar file with additional metadata in META-INF/MANIFEST.MF: details like the bundle’s name, version and dependencies. The dependencies are the interesting part. A bundle can depend directly on other bundles, but that’s discouraged. A better approach is to declare that you depend on Java packages. That way you don’t tie yourself to a particular provider of a package.
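As an illustration, here’s a minimal sketch that parses a made-up bundle manifest with the JDK’s own java.util.jar.Manifest class. The header names (Bundle-SymbolicName, Bundle-Version, Export-Package, Import-Package) are standard OSGI headers; the bundle name, packages and version range are hypothetical. Note that the dependency is expressed as an imported package, not as a particular bundle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class BundleManifestDemo {
    // A hypothetical bundle manifest. The dependency is declared on a *package*
    // (Import-Package), not on whichever bundle happens to provide it.
    static final String MANIFEST =
        "Manifest-Version: 1.0\n"
        + "Bundle-ManifestVersion: 2\n"
        + "Bundle-SymbolicName: com.example.greeter\n"
        + "Bundle-Version: 1.0.0\n"
        + "Export-Package: com.example.greeter.api;version=\"1.0.0\"\n"
        + "Import-Package: com.example.logging;version=\"[1.0,2.0)\"\n";

    // Parse manifest text with the JDK's parser and return the main section.
    static Attributes parse(String text) {
        try {
            Manifest mf = new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return mf.getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Attributes attrs = parse(MANIFEST);
        for (String header : new String[] {
                "Bundle-SymbolicName", "Bundle-Version",
                "Export-Package", "Import-Package"}) {
            System.out.println(header + ": " + attrs.getValue(header));
        }
    }
}
```

In practice you wouldn’t maintain these headers by hand; that’s the job of a tool like bnd.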

When OSGI loads a bundle, it gives that bundle its own ClassLoader. This means that:

  • You can have multiple versions of bundles loaded simultaneously. You don’t have to force everything to the same version.

  • Each bundle can only see classes that have been explicitly exported by its dependencies, not the whole transitive closure. This is very good for keeping your code clean.

    This also leads to a pattern I’ve seen before in the maven world: separate artifacts for APIs and implementations. Pulling out interfaces is generally a good idea, but by putting them in a separate OSGI bundle, you guarantee that your implementation stays invisible to its consumers. Even the “hello OSGI world” demo was structured this way.

On top of this metadata, OSGI provides a runtime for loading and unloading bundles. The runtime also supports the concept of services, where you can ask the runtime for various services. This looks cool, but its dynamic nature can be hard to deal with: the service you got from the runtime can disappear at any point. There was a demo of something called blueprint, which aims to help, but it looked almost exactly like “more Spring XML” to me. If I were doing this, I’d look at peaberry instead.

How do you go about getting started with OSGI? Well, you could manage the bundle metadata yourself, but it’s much easier to use a tool to do it for you. One such tool was demo’d: bnd. The maven-bundle-plugin that I used for jslint4java builds on bnd.

If you need a runtime for your app, there are two in common use: Equinox and Felix. Equinox is the runtime used by Eclipse.

For follow-up detail, they recommended checking out anything by Neil Bartlett. It’s a shame he couldn’t make it.

Overall I was pretty impressed. It made me realise that I got the basics right, and I know where I need to go when I need more. Thanks, guys!

Having written all this, I’ve just realised that the Wikipedia page on OSGI covers nearly all of it, with examples.


Java Platform Encoding

This came up at $WORK recently. We had a java program that was given input through command line arguments. Unfortunately, it went wrong when passed UTF-8 characters such as U+00A9 COPYRIGHT SIGN (©). Printing out the command line arguments from inside Java showed that we had double-encoded Unicode.
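To see what “double-encoded” looks like at the byte level, here’s a small sketch. It assumes the UTF-8 bytes were mistakenly decoded as ISO-8859-1 somewhere along the way (the post doesn’t say which single-byte charset was involved) and then encoded as UTF-8 again. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Render bytes as space-separated lowercase hex, e.g. "c2 a9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode UTF-8 bytes with the wrong charset (assumed ISO-8859-1 here).
    static String misdecode(byte[] utf8) {
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String copyright = "\u00a9";  // U+00A9 COPYRIGHT SIGN
        byte[] once = copyright.getBytes(StandardCharsets.UTF_8);
        String wrong = misdecode(once);  // two chars: U+00C2 U+00A9
        byte[] twice = wrong.getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 once:  " + hex(once));   // c2 a9
        System.out.println("UTF-8 twice: " + hex(twice));  // c3 82 c2 a9
    }
}
```

One character has turned into two, and its two UTF-8 bytes into four.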

Initially, we just slapped -Dfile.encoding=UTF-8 on the command line. But that failed when the site that called this code went through an automatic restart. So we investigated the issue further.

We quickly found that the presence or absence of the LANG environment variable had a bearing on the matter.

NB: ShowSystemProperties.jar is very simple and just lists all system properties in sorted order.
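The post doesn’t include the source of ShowSystemProperties.jar. A minimal sketch of something equivalent might look like this; it’s a hypothetical re-creation, and you’d still need to package it as a jar with a Main-Class entry to run it with -jar.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ShowSystemProperties {
    // Copy the system properties into a TreeMap so they come out sorted by key.
    static TreeMap<String, String> sortedProperties() {
        Properties props = System.getProperties();
        TreeMap<String, String> sorted = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            sorted.put(name, props.getProperty(name));
        }
        return sorted;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : sortedProperties().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```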

$ java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)
$ echo $LANG
$ java -jar ShowSystemProperties.jar | grep encoding
$ LANG= java -jar ShowSystemProperties.jar | grep encoding

So, setting file.encoding works, but there’s also an internal property, sun.jnu.encoding.

Next, let’s see what happens when we add the explicit override.

$ LANG= java -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding

Hey! sun.jnu.encoding isn’t changing!

Now, as far as I can see, sun.jnu.encoding isn’t actually documented anywhere. So you have to go into the Java source code (OpenJDK’s jdk6-b16 in this case) to figure out what’s up.

Let’s start in main(), which is in java.c. Actually, it’s JavaMain() that we’re really interested in. In there you can see:

JavaMain(void * _args)
{
  /* … */
  jobjectArray mainArgs;
  /* … */

  /* Build argument array */
  mainArgs = NewPlatformStringArray(env, argv, argc);
  if (mainArgs == NULL) {
      /* … */
      goto leave;
  }
  /* … */
}

NewPlatformStringArray() is defined in java.c and calls NewPlatformString() repeatedly with each command line argument. In turn, that calls new String(byte[], encoding). It gets the encoding from getPlatformEncoding(). That essentially calls System.getProperty("sun.jnu.encoding").
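To make that call chain concrete, here’s a rough Java model of what NewPlatformStringArray() and NewPlatformString() do. It is not the JDK’s code (that’s C calling into Java through JNI), and the class and method names are hypothetical. It shows that the same argv bytes decode to different strings depending on the encoding found in sun.jnu.encoding.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class PlatformStrings {
    // Model of NewPlatformString(): decode one raw argument with the given
    // encoding, via new String(byte[], String) as the launcher does.
    static String newPlatformString(byte[] raw, String encoding) {
        try {
            return new String(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Model of NewPlatformStringArray(): one call per command line argument.
    static String[] newPlatformStringArray(byte[][] argv, String encoding) {
        String[] result = new String[argv.length];
        for (int i = 0; i < argv.length; i++) {
            result[i] = newPlatformString(argv[i], encoding);
        }
        return result;
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 terminal would pass for the argument "©".
        byte[][] argv = { "\u00a9".getBytes(StandardCharsets.UTF_8) };
        String enc = System.getProperty("sun.jnu.encoding", "UTF-8");
        String decoded = newPlatformStringArray(argv, enc)[0];
        System.out.println("sun.jnu.encoding=" + enc
            + " gives " + decoded.length() + " char(s)");
    }
}
```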

So where does that property get set? If you look in System.c, Java_java_lang_System_initProperties() calls:

    PUTPROP(props, "sun.jnu.encoding", sprops->sun_jnu_encoding);

sprops appears to get set in GetJavaProperties() in java_props_md.c. This interprets various environment variables, including the ones that control the locale. To get sun_jnu_encoding, it appears to take everything after the period in the LANG environment variable as the encoding.
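Here’s a simplified Java model of that LANG handling, going only by the description above. It is a sketch of the behaviour, not a port of java_props_md.c. The fallback parameter stands in for whatever the real code does when LANG carries no encoding, and dropping an “@modifier” suffix is my own assumption.

```java
public class LangEncoding {
    // Take everything after the first '.' in LANG as the encoding.
    // Assumption: a trailing "@modifier" (e.g. "@euro") is not part of it.
    static String encodingFromLang(String lang, String fallback) {
        if (lang == null || lang.isEmpty()) {
            return fallback;  // e.g. "LANG= java ..." as in the examples above
        }
        int dot = lang.indexOf('.');
        if (dot < 0 || dot == lang.length() - 1) {
            return fallback;  // no encoding part, e.g. "C" or "en_GB"
        }
        String enc = lang.substring(dot + 1);
        int at = enc.indexOf('@');
        return at < 0 ? enc : enc.substring(0, at);
    }

    public static void main(String[] args) {
        String lang = System.getenv("LANG");
        System.out.println("LANG=" + lang + " -> "
            + encodingFromLang(lang, "(locale default)"));
    }
}
```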

Phew. So we now know that there is a special property which gets used for interpreting “platform” strings like:

* Command line arguments
* Main class name
* Environment variables

And it can be overridden:

$ LANG= java -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding

The importance of Central

One of the selling points of maven is its dependency mechanism. You say what code you need, and maven makes sure it’s there for you. The magic behind this is the central repository, usually just called central. It’s a phenomenal collection of software, much like Perl’s CPAN.

This is useful enough that other projects, like Apache Ivy, have sprung up that also use central.

Central also has source code jars (most of the time). This means you can jump seamlessly from your own code into any third-party code that’s available on central. I can’t imagine getting up to speed with other people’s code anywhere near as quickly without this sort of facility. Sure, the source is available, but is it a single click away from your code? Not likely. m2eclipse and NetBeans 6.7 make this trivial.

So, having your open source project on central is a really good idea. It lets your users get at it easily and automatically. You really want to do this.

However, getting your projects on to central isn’t simple (see Guide to uploading artifacts to the Central Repository). It boils down to two choices:

  • Package up your code using mvn repository:bundle-create and file a Jira issue for it. This can take four weeks or more.
  • Set up and host an rsyncable repository somewhere. This can be automatically pulled into central.

Both of these options are fairly painful. That’s why, for the latest release of jslint4java, I was pleased to see that Sonatype are offering an alternative: a repository that you deploy to using the usual maven deployment mechanism (or presumably the ivy equivalent). This is much simpler. A short while later, that repository syncs up with central.

However, the Sonatype guys are (rightly) concerned with the quality of artifacts on central, so you have to jump through extra hoops to go down this road with your open source projects. The main one that tripped me up was the requirement that all artifacts be GPG-signed, which meant learning GPG and the maven-gpg-plugin. This took me some time, but subsequent releases to central are much easier now that I’ve been through all the hoops.

  • If you want to get your open source project on central, I’d definitely recommend checking out Sonatype’s offering.
  • If you want to contribute to an open source project, offering to help get it on to central could be very useful. You’ll not just be helping that project, you’ll also be helping all users of central. The bigger it gets the more useful it is.

Using JavaRebel with Cocoon

Normally, the cocoon-maven-plugin includes a reloading classloader, so that changes to class files are automatically picked up when you run mvn jetty:run. Just hit refresh and your changes appear. It’s just like working in PHP. 🙂

This is OK, but it’s not foolproof. This morning, I saw a few errors of the form “expected class SearchManager, but got class SearchManager”. This happens when the same class has been loaded by two different ClassLoaders: the JVM treats the two copies as distinct classes. Annoyingly, I can no longer reproduce this.

There’s a commercial product, JavaRebel, that aims to do class reloading much better. So, I thought I’d give it a try.

Using it comes down to two things:

  • Include the javarebel jar as an agent.
  • Stop jetty from auto-reloading.

Of course, this being cocoon, we also have to stop the cocoon-maven-plugin from using its reloading classloader.

The javarebel documentation is quite clear on how to configure maven and jetty. But it makes no mention of cocoon (understandably).

Thankfully, it’s all fairly simple to configure with a maven profile. This makes it easy to call from the command line.


With that in place, all that remains is a teeny-tiny shell script to augment the normal call to maven.

MAVEN_OPTS="$MAVEN_OPTS -noverify -javaagent:$javarebel_jar" mvn -Pjavarebel "$@"

With this, you can immediately see that javarebel is enabled, as it spits out a big message at startup time. But more importantly, as soon as I change a spring bean (and reload the page that uses it), I get this on the console:

JavaRebel: Reloading class 'com.example.Spigot'.
JavaRebel-Spring: Reconfiguring bean 'spigot' [com.example.Spigot]

Hurrah — no errors! It all seems to work rather well. I should probably purchase a licence. 🙂

Update: I’ve seen the error again:

Caused by: org.springframework.beans.PropertyBatchUpdateException; nested PropertyAccessExceptions (1) are:
PropertyAccessException 1: org.springframework.beans.TypeMismatchException: Failed to convert property value of type [com.example.MyService] to required type [com.example.MyService] for property ‘myService’; nested exception is java.lang.IllegalArgumentException: Cannot convert value of type [com.example.MyService] to required type [com.example.MyService] for property ‘myService’: no matching editors or conversion strategy found
at org.springframework.beans.AbstractPropertyAccessor.setPropertyValues(
at org.springframework.beans.AbstractPropertyAccessor.setPropertyValues(
… 101 more


Solr’s Lucene Source

I’m debugging a plugin for Solr. I’ve just about got the magic voodoo set up so that I can make Eclipse talk to tomcat and stick breakpoints in and so on. But I’ve immediately run into a problem.

Even though Solr itself comes with -sources jars, the bundled copy of lucene that they’ve used doesn’t. Needless to say, this is a bit of a hindrance.

Thankfully, the Apache people have set up git mirrors of their subversion repositories, which makes this situation a lot less annoying than it could be.

First, I checked out copies of lucene & solr.

$ git clone git://
$ git clone git://

Now, I need to go into solr and figure out which version of lucene is in use. Unfortunately, it’s not a released version, it’s a snapshot of the lucene trunk at a point in time.

$ cd …/solr
$ git branch -r
  origin/HEAD -> origin/trunk
$ git whatchanged origin/tags/release-1.3.0 lib
commit 904e378b7b4fd18232f657c9daf484a3e63b272c
Author: Yonik Seeley 
Date:   Wed Sep 3 20:31:42 2008 +0000

    lucene update 2.4-dev r691741

    git-svn-id: 13f79535-47bb-0310-9956-ffa450edef68

:100644 100644 a297b74... 54442dc... M  lib/lucene-analyzers-2.4-dev.jar
:100644 100644 596625b... 5c6e003... M  lib/lucene-core-2.4-dev.jar
:100644 100644 db13718... f0f93a7... M  lib/lucene-highlighter-2.4-dev.jar
:100644 100644 50c8cb4... a599f43... M  lib/lucene-memory-2.4-dev.jar
:100644 100644 aef3fb8... 79feaef... M  lib/lucene-queries-2.4-dev.jar
:100644 100644 1c733b9... 440fa4e... M  lib/lucene-snowball-2.4-dev.jar
:100644 100644 0195fa2... b5ff08b... M  lib/lucene-spellchecker-2.4-dev.jar

So the last change to lucene was taking a copy of r691741 of lucene’s trunk. Let’s go over there and see what that looks like.

$ cd …/lucene
$ git log --grep=691741

Except that doesn’t return anything, because r691741 wasn’t a lucene commit at all. Apache keeps all its projects in a single subversion repository, and that revision was something to do with geronimo. So we need to search backwards for the nearest lucene commit at or before that revision. Thankfully, git svn records the original subversion revision number of each commit.

$ cd …/lucene
$ git log | perl -lne 'if (m/git-svn-id:.*@(\d+)/ && $1 <= 691741){print $1; exit}'
691694
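For anyone who’d rather not read perl, here’s the same search as a small Java sketch: scan git log output for git-svn-id lines and stop at the first revision at or below the target. Since git log lists the newest commits first, that’s the nearest earlier revision. The class name is hypothetical, and it expects the log on standard input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NearestSvnRevision {
    // Same pattern as the perl one-liner: the digits after '@' on a git-svn-id line.
    private static final Pattern SVN_ID = Pattern.compile("git-svn-id:.*@(\\d+)");

    // Return the first revision <= target; git log is newest-first,
    // so this is the nearest revision at or before the target.
    static OptionalLong nearestAtOrBefore(Iterable<String> logLines, long target) {
        for (String line : logLines) {
            Matcher m = SVN_ID.matcher(line);
            if (m.find()) {
                long rev = Long.parseLong(m.group(1));
                if (rev <= target) {
                    return OptionalLong.of(rev);
                }
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: git log | java NearestSvnRevision 691741
        long target = Long.parseLong(args[0]);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Iterable<String> lines = in.lines()::iterator;
        OptionalLong rev = nearestAtOrBefore(lines, target);
        System.out.println(rev.isPresent() ? String.valueOf(rev.getAsLong()) : "not found");
    }
}
```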

So now we can go back and find the git commit id that corresponds.

$ cd …/lucene
$ git log --grep=691694
commit 71afff2cebd022fe63bdf2ec4b87aaa0cee41dc8
Author: Michael McCandless 
Date:   Wed Sep 3 17:34:29 2008 +0000

    LUCENE-1374: fix test case to close reader/writer in try/finally; add assert b!=null in RAMOutputStream.writeBytes (matches FSIndexOutput which hits NPE)

    git-svn-id: 13f79535-47bb-0310-9956-ffa450edef68

Hurrah! Now I can check out the same version of Lucene that’s in Solr. But for Eclipse, it’s probably more useful just to zip it up somewhere.

$ cd …/lucene
$ git archive --format=zip 71afff2 >/tmp/

Excellent. Now I can resume my debugging session. 🙂

NB: I could have just used subversion to check out the correct revision of Lucene. But, I find it quicker to use git to clone the repository, and I get the added benefit that I now have the whole lucene history available. So I can quickly see why something was changed.


The Maven Ecosystem

Last night I went to see Jason van Zyl of Sonatype talking about various bits of the maven ecosystem, and where they’re going. The main bit for me was what’s coming up in maven 3.0. There was a great deal of talk about OSGI-related issues, but it reinforced my belief that whilst there’s some good technology in there, it’s still quite complicated to use and manage. Steps are being taken to address this (better tooling support), but they’re not there yet. Also, for the kind of things I do (simple, content-driven, somewhat static webapps), it doesn’t seem to be necessary anyway.

So what’s coming up in maven 3.0? Fundamentally, there won’t be that many new user-visible features (wait for 3.1!). Internally, there have been huge refactorings by the sound of things (along with integration tests to ensure no user-visible regressions). They’re switching away from plexus and towards guice + peaberry. But that’s internal detail. And in theory, it shouldn’t matter even if you’re a plugin author.

What sounded really nice was the focus on making life much easier for users of embedded maven, which primarily means IDE authors. Things like plugin and lifecycle extension points, and incremental build support, should allow m2eclipse to be much, much more intelligent about the work it does. Jason mentioned that a version of m2eclipse which builds on the trunk of maven 3.0 can now build the trunk of maven in seconds rather than minutes. Why? Because it’s not duplicating work that’s already been done by Eclipse.

The main change is to the artifact resolution system, which has been one of the main sources of bugs in maven 2.0. It’s been completely junked in 3.0 and replaced with mercury, which handles both transport and resolution of artifacts. It should be better tested, and should handle things like version ranges much more like OSGI does.

One (minor) change is that the error messages should be much better. That’s a welcome relief.

There are other tidbits that I think are scheduled for 3.1 that should be really nice:

  • everybody’s favourite: versionless parent elements
  • attributes in the POM — hooray, that should make POMs vastly smaller.
  • mixin POMs — should allow much more flexibility in constructing dependencies on both groups of artifacts and groups of plugins.

There were further talks about Hudson and Nexus, but I’m fairly familiar with these, so there wasn’t much that was news to me.

My thanks go to Peter Pilgrim for organising and EMC/Comchango for hosting.


dependency complexity

I love the google-collections library. It’s got some really nice features. But, it’s not stable yet. They’ve explicitly stated that until they hit 1.0 it’s not going to be a stable API. So there are changes each release. Nothing major, but changes.

As an example, in the jump from 0.9 to 1.0rc1, the static methods on the Join class became the fluent API on the Joiner class.

(as an aside, could we have some tags, please?)

Following this change is simple.

@@ -310,7 +310,7 @@
         } catch (KeyStoreException e) {
             throw new RuntimeException(e);
-        return Join.join(" | ", principals);
+        return Joiner.on(" | ").join(principals);


But the knock-on effect comes when you start getting lots of things which have google-collections dependencies. At $WORK, I’ve got a project whose dependencies look like this.


I wanted to extract a part of DC2 into its own library, commslib. This was pretty easy as the code was self contained. Naturally, I wanted it to use the latest version of everything, so I upgraded google-collections to 1.0rc1. Again, fairly simple.

This is what I ended up with.


Except that now there’s a problem.

  • commslib uses Joiner, so it’ll blow up unless I upgrade DC2’s google-collections to 1.0rc1.
  • GSK uses Join, so it’ll blow up if I upgrade DC2’s google-collections to 1.0rc1.

And thus have I painted myself into a corner. 🙂

As it happens, DC2 had a dependencyManagement section forcing everything to use google-collections 0.8. → Instant BOOM.

The solution is to upgrade all my dependencies to use google-collections 1.0rc1. But this turns out to be a much larger change than I had originally envisaged, as now I have to create releases for two dependent projects. This isn’t too much of a hassle in this case (yay for the maven-release-plugin), but it could be a large undertaking if either of those projects is not presently in a releasable state.

I’m not trying to pick on google-collections (I still love it). I’m just marvelling at how quickly complexity can blossom from something so simple.


java -jar blats your classpath

This tripped up a colleague today. When he mentioned it, about three other people (myself included) piped up with “oh yeah, that caught me out a while back”.

  java -cp 'a.jar:b.jar' -jar foo.jar

This completely ignores the classpath, and it’s not blindingly obvious that this is going on. If you manage to wind your way down to java(1), you do see:


Execute a program encapsulated in a JAR file. The first argument is the name of a JAR file instead of a startup class name. In order for this option to work, the manifest of the JAR file must contain a line of the form Main-Class: classname. Here, classname identifies the class having the public static void main(String[] args) method that serves as your application’s starting point. See the Jar tool reference page and the Jar trail of the Java Tutorial for information about working with Jar files and Jar-file manifests.

When you use this option, the JAR file is the source of all user classes, and other user class path settings are ignored.

Note that JAR files that can be run with the “java -jar” option can have their execute permissions set so they can be run without using “java -jar”. Refer to Java Archive (JAR) Files.

(underlining mine)

This is fairly clear, but unless you’re in the habit of reading man pages, you might well not find it. In particular, some mention of it in the java -help output would be a boon.
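One quick way to confirm what’s happening is to have the program print its own java.class.path. This is a hypothetical diagnostic class, not something from the original discussion. When launched with java -jar, that property contains just the jar itself, whatever -cp said.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ShowClassPath {
    // Split the java.class.path property into its individual entries.
    static List<String> classPathEntries() {
        String cp = System.getProperty("java.class.path", "");
        return Arrays.asList(cp.split(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Under "java -cp 'a.jar:b.jar' -jar foo.jar" this prints only foo.jar.
        for (String entry : classPathEntries()) {
            System.out.println(entry);
        }
    }
}
```

If you do need the extra jars, either name the main class explicitly and put the jar on the classpath too (e.g. java -cp 'foo.jar:a.jar:b.jar' com.example.Main, where com.example.Main is a placeholder), or list the dependencies in the Class-Path attribute of foo.jar’s manifest.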


More Java Memory Analysis

If you found Heap Dump Analysis interesting, you might also be interested in Identifying ThreadLocal Memory Leaks in JavaEE Web Apps by Igor Minar. I saw it a day or so after my original post. He uses the Eclipse Memory Analyzer (MAT) to figure out what’s going on.

In his case, he couldn’t use visualvm as the heap dump was too large.

I downloaded MAT for my problem and had a look at it. It seems to do much more than visualvm (there are many “pre-canned” queries available, plus an “object query language”), but that makes it correspondingly harder to use. I spent a while going through the tutorials to get the hang of it, whereas visualvm seemed to come together quickly for me. I suspect that if I ploughed more time into it, I’d get a lot more out of it. It seems to be something that I should learn before I next need it. 🙂


Heap Dump Analysis

As part of the investigation into cocoon memory usage, I had to try and understand what was going on inside the JVM. The best way to do that is a heap dump. The general idea is to dump out the contents of the JVM’s memory into a file, then analyse it using a tool.

First, getting a heap dump. This used to be something of a pain, but Java 6 now includes the jmap -dump command. This allows you to (relatively) easily extract a heap dump from a running JVM.

  % pid=$(cat /some/where/pid.txt)
  % jmap=$JAVA_HOME/bin/jmap
  % sudo $jmap -F -dump:live,format=b,file=mla-post-cache-clear.hprof $pid

That takes a little while (a minute or two) to run, and in my case produced a file of over 500MB. It’s best to bring it back to your workstation to look through it. Thankfully it compresses quite nicely.

  % time rsync -avz server:mla-post-cache-clear.hprof .
  receiving file list ... done

  sent 42 bytes  received 72988157 bytes  3105880.81 bytes/sec
  total size is 580115453  speedup is 7.95
  rsync -avz server:mla-post-cache-clear.hprof .  8.42s user 2.85s system 47% cpu 23.643 total
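As an aside, jmap isn’t the only route to a heap dump. On HotSpot JVMs, a program can dump its own heap through com.sun.management.HotSpotDiagnosticMXBean. This is a HotSpot-specific API rather than standard Java SE, the sketch below needs Java 7 or later (so not the Java 6 used here), and it’s an alternative, not what I used. The class name is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class InProcessHeapDump {
    // Dump this JVM's heap in hprof format. live=true dumps only objects that
    // are still reachable, similar to the "live" option to jmap -dump.
    static void dumpHeap(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        // dumpHeap refuses to overwrite an existing file, so use a fresh
        // directory; newer JDKs also require the ".hprof" suffix.
        Path out = Files.createTempDirectory("heapdump").resolve("app.hprof");
        dumpHeap(out, true);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```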

Now things get interesting. We can fire up visualvm. Inside visualvm, do File → Load… and point it at the heap dump file we just obtained. You should end up with something like this:

VisualVM 1.1 heap dump summary

Now this is interesting, but not that useful. If you click on the classes button, you see this:

VisualVM 1.1 classes in heap dump

What’s really interesting about that is the byte[] line. Those instances are only 2% of objects, but are taking up 67% of memory! If you double click on that line, you get taken through to the instances view.

VisualVM 1.1 instances in heap dump

There are three panels here:

  1. The left hand view is a list of all the instances, ordered by size. In this case the top instances are the same size, which is quite suspicious.
  2. The top right hand view is a breakdown of the fields in the object. Not that useful for a byte[].
  3. The bottom right hand view is the useful one: it lists all references to this instance. Here it’s showing that this byte[] is linked to a ByteArrayOutputStream.

Now, you can expand the references to that byte[] to find out where it’s being referenced. But there’s a really nifty trick. If you right-click, you get an option to “Show Nearest GC Root”. And this is what it shows (after a little while).

VisualVM 1.1 nearest GC root

This isn’t the simplest thing to interpret, but it does show that the HttpServletResponseBufferingWrapper is holding the byte[] itself, and that the servlet response is in turn held by a TraxTransformer. That sounds like a good place to begin investigating. Why is TraxTransformer holding on to ServletResponses?