If you’re happy and you know it…

Sometimes I come home from work in a bad mood. Then I realise I have a roof over my head, I’m well fed and I have a beautiful wife and daughter. It’s easy to forget the simple things.


SAX EntityResolver

I was trying to resolve entities (&weirdChar;) in an XML file. Easy enough, use a validating parser. But here’s the tricky bit: get the entity definitions from the classpath. This should still be easy, as SAX provides an EntityResolver.

Unfortunately, the interactions between JAXP and SAX make life complicated. I found that you have to ignore the SAXParser (from JAXP) and instead focus on the XMLReader interface (part of plain old SAX).

This is what I came up with. First, a small driver.

  public void parseIt() throws ParserConfigurationException, SAXException, IOException {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    // Ask the JAXP parser for the underlying SAX XMLReader and work with that.
    XMLReader reader = spf.newSAXParser().getXMLReader();
    reader.setEntityResolver(new MyResolver());
    // Look for test.xml on the classpath.
    InputStream testXmlStream = App.class.getClassLoader().getResourceAsStream("test.xml");
    reader.parse(new InputSource(testXmlStream));
  }

That references the EntityResolver implementation I wrote:

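  class MyResolver implements EntityResolver2 {
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId)
      throws SAXException, IOException {
      // Treat the system ID as a classpath resource name.
      InputStream stream = getClass().getClassLoader().getResourceAsStream(systemId);
      return new InputSource(stream);
    }
    // The interface's other methods, getExternalSubset() and the
    // two-argument resolveEntity(), are not shown here.
  }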

Actually, I had to use EntityResolver2 for reasons I don’t entirely understand.

On top of this, I found that I had to include Xerces 2.8 explicitly as a dependency. The version bundled with Java 1.5 is Xerces 2.6.2, which has a bug: it passes the entity resolver an absolutized systemId, which makes it very difficult to resolve any further. What a pain in the arse.

But it does now work, and I can successfully resolve entities off the classpath.
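For anyone wanting to try this without setting up a classpath layout, here is a minimal, self-contained sketch of the same idea. It is not the code from my project: the class name ResolverSketch, the RESOURCES map (an in-memory stand-in for the classpath) and the entities.dtd name are all made up for illustration, and it assumes the parser bundled with your JDK honours EntityResolver2.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ResolverSketch {
  // Hypothetical stand-in for the classpath: system ID -> DTD text.
  static final Map<String, String> RESOURCES =
      Map.of("entities.dtd", "<!ENTITY weirdChar \"*\">");

  // DefaultHandler2 implements EntityResolver2 as well as ContentHandler.
  static class CollectingHandler extends DefaultHandler2 {
    final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId)
        throws SAXException, IOException {
      String dtd = RESOURCES.get(systemId);
      // Returning null lets the parser fall back to its default behaviour.
      return dtd == null ? null : new InputSource(new StringReader(dtd));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      text.append(ch, start, length);
    }
  }

  // Parses the XML and returns its character content with entities expanded.
  static String parse(String xml) throws Exception {
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    CollectingHandler handler = new CollectingHandler();
    reader.setEntityResolver(handler);
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler.text.toString();
  }
}
```

Feeding it a document whose DOCTYPE points at entities.dtd should give back the text with &weirdChar; replaced by whatever the map defines.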


JavaScript Scope

This is an issue that popped up with a colleague yesterday. Roughly, there was some code like this.

  function setUpStuff() {
    var items = cssQuery("#stuff p");
    for (var i = 0; i < items.length; i++) {
      var item = items[i];
      item.onclick = function() {
        alert("item is " + item);
      };
    }
  }

Unfortunately, even if cssQuery() returns a list of three items, every call to alert() will show the last item. Why?

It’s all down to the fact that JavaScript doesn’t have block-scoped variables. It only has function-scope.

In the context of the code above, it means that item exists from the point of entry to setUpStuff(). It’s just undefined until we get into the loop. The consequence of this is that each time we assign onclick, we’re referring to the same variable.

The solution is to create another function scope, of course.

  function setUpStuff() {
    var items = cssQuery("#stuff p");
    for (var i = 0; i < items.length; i++) {
      setUpAStuff(items[i]);
    }
  }

  function setUpAStuff(item) {
    item.onclick = function() {
      alert("item is " + item);
    };
  }

This now works as expected, because inside setUpAStuff(), we’re referring to a different variable each time (you could also use an anonymous function). This is definitely one of the more confusing areas of JavaScript (though JSLint does pick up on this).

* Block-Structured JavaScript
* Core JavaScript 1.5 Guide:Block Statement (on devmo)


Logging in Cocoon 2.2

I’ve had to try and understand logging in Cocoon 2.2 for a project at work recently. It’s been “interesting,” so I thought I’d blog the process in case anybody else needs to do this…

Normally, logging in Java is quite simple: you add log4j to your classpath, then create a configuration file to say what gets logged. If you’re running as part of a webapp, you use something like Spring’s Log4jConfigListener to ensure that the configuration gets applied as soon as the webapp is active.

Cocoon is different. By default (in development) you run it using a combination of the cocoon-maven-plugin and mvn jetty:run. This is quite cunning (as Grzegorz explained in a comment a while back): it lets you edit all sorts of stuff and have it dynamically reloaded. In order to make the Cocoon “block” work with Jetty, the Maven plugin creates things like web.xml for you automatically, so there’s no chance to edit things. Drat.

Now, if you follow the documentation for logging in Cocoon, it advises:

The usual Cocoon web application sets up Log4j through the Cocoon Spring Configurator.

Lovely advice, except that by the time Spring has started up, read your logging configuration and applied it, a great many interesting events have already occurred. You really need to enable logging as early as possible using a ServletContextListener.

Thankfully, it’s possible to do so, even when using mvn jetty:run.

First, you need to create a log4j configuration that’s suitable. I think it has to be XML, which is a shame as it’s more complicated than the properties file. I wanted to change the defaults to get my FlowScript calls to log to the console. This is what I ended up with in etc/log4j.xml.


Note that we shut everything up (WARN) by default, and then explicitly enable messages for things we want to see (INFO). I’ve found that even in development, this helps to tell the wood from the trees.

With that in place, you have to edit pom.xml in order to tell the cocoon-maven-plugin to use that instead of its default. The default pom should have a build/plugins/plugin section, to which this stanza needs adding.


Finally, you need to arrange for the auto-generated web.xml to be patched with a reference to Log4jConfigListener. This is done through Cocoon’s slightly arcane mechanism, xpatch. Create a file src/main/resources/META-INF/cocoon/xpatch/log4j.xweb which looks like this.


Now if you run mvn jetty:run in your block and inspect the generated web.xml, you should see the above patched into place. Also, you should be able to generate messages on the console from within FlowScript by doing: "hello world");

The procedure above is a hassle. But the benefit of being able to see logging messages coming out on the console in front of you is significant.

One final point to note. When you do run mvn jetty:run, you’ll see a few log4j errors, i.e.

log4j:WARN No appenders could be found for logger (org.apache.commons.configuration.ConfigurationUtils).
log4j:WARN Please initialize the log4j system properly.

log4j:WARN No appenders could be found for logger (org.apache.commons.jci.stores.MemoryResourceStore).
log4j:WARN Please initialize the log4j system properly.

As far as I can tell these are completely ignorable, just very annoying. They appear to happen before jetty itself starts up, and are irrelevant to the web app (as far as I can see).


git branches with subversion

I like using git, particularly in combination with git svn. It makes it really easy to work with version control offline. But there’s a problem: branches.

Now git is really good at using branches. Unfortunately, git svn can’t cope very well with pushing one of git’s merges back into subversion. It gets really confused. Trust me.

Thankfully, I’ve found an easy way to do it: patches.

In the git format-patch man page, there’s a useful example:

  % git format-patch -k --stdout R1..R2 | git am -3 -k

That is, make a mailbox full of all the changes between R1 and R2, then apply them to the current checked out branch. Essentially, “copy the changes on that branch to this one”.

I just needed to do this with a project I’ve been playing with. I’d been working on a branch proper-xml-generation. In order to merge it, I had to:

  1. Rebase that branch on the master, so I know that there won’t be any conflicts.
    • git checkout proper-xml-generation; git rebase master
  2. Switch back to the master branch.
    • git checkout master
  3. Copy the patches.
    • git format-patch -k --stdout master..proper-xml-generation | git am -3 -k
  4. Pump the results back into subversion.
    • git svn dcommit

And hey presto, the branch is back in subversion. It looks a bit weird having 14 commits in a few seconds though.

The main disadvantage of this is that it’s pretty much a one-time push back into subversion. You don’t get all the nice usual features of git, where you can make more changes on the branch and merge them. But it’s been sufficient for me for a little while now, so I thought I’d share it.


All Change!

So I got fed up with typo, for one minor reason and another (mostly my fault, I confess). Frankly, I’m just more used to wordpress, as I use it at work (blogs make great notepads). So, I decided to switch. It’s been an interesting exercise…

The bulk of the work is building a converter. Thankfully, typo is built on rails, so it was relatively easy to hack on. I looked through wordpress and reckoned that the easiest way to get data in was through the wordpress backup mechanism, WXR (WordPress eXtended RSS). There’s a dearth of information about this format, but after taking a quick look, I figured it wouldn’t be too hard to get typo to spit out something similar. It was basically RSS with a few extra elements.

The initial work of producing the WXR feed from typo went quite quickly. However, when I tried to import it into a test wordpress installation, the problems started. It turns out that wordpress doesn’t use an XML parser for reading in WXR. It uses regexes.

As any fule no, parsing XML with regexes is bound to lead to pain. And indeed it is. Getting builder.rb to emit non-indented tags (as wordpress desires) when it’s in the middle of emitting indented stuff is yucky.

  ! << '<< 'CDATA['! << html(item)! << ']]'! << '>'

Anyway, I thought I’d cracked the problem, once I got the export working. I immediately saw that all the XML in several posts was coming out double encoded. Doh. One quick wordpress plugin fixed the issue quite neatly (wp-correct-syntax-xml.php).

PHP is very amenable to cut’n’paste.

Then I remembered to look through my access logs. And I realised quite how many RSS and Atom feeds I publish. Much time later, I now have 15 hairy RedirectMatch permanent beckoning feed readers into the correct place.

Then I had a final thought. The IDs of the articles are going to be different in the feeds. In order to correct that, I had to go back to the export and insert the correct guid elements. This led me to a slight wild goose chase in typo. If you look in app/views/xml/_atom10_item_article.atom.builder, you see: "urn:uuid:#{item.guid}"

And the IDs that I was seeing coming out in the feed were Tag URIs. It took me a while to track down that the default feed for typo takes an entirely different route going through articles_controller.rb and into app/views/articles/_atom_feed.atom.builder, which in turn makes use of Rails’ AtomFeedBuilder::entry. That’s what spits out the Tag URI."tag:#{},#{@feed_options[:schema_date]}:#{record.class}/#{}")

Then I realised that the RSS feed was spitting out UUIDs instead of Tag URIs. Sigh. But I looked at the stats and I’m serving more than twice as much Atom as RSS, so I decided to stick with Tag URIs.

Finally, I pulled the big switch, and hit refresh in NetNewsWire. And all of my posts are marked as “new”. Bloody hell, I have no idea what’s caused that. It’s too late and I give up for now. Apart from that, things mostly seem to be working.

In case anybody else is interested, here’s my patch against typo to produce WXR feeds. It’s very simplistic, has no tests and is strictly “works well enough for me”.

Update: I don’t know how, but I completely managed to miss the typo-wordpress converter.  This would have been much simpler.  Although as I was migrating off of textile at the time, it made sense for me to export HTML.  Ah well, it was an interesting learning experience.


Bootstrapping Spring

Recently, I’ve been converting a project to use Spring. My main method looks something like this.

  ApplicationContext context = new ClassPathXmlApplicationContext("context.xml");
  Main main = (Main) context.getBean("main");

There’s a problem though. Several of my beans need to be configurable. Normally, I’d resolve this by using a PropertyPlaceholderConfigurer (or more likely the context:property-placeholder element). This lets me have values like ${baseDirectory} in my spring configuration.

But PropertyPlaceholderConfigurer takes a fixed location. I want that location to be specified on the command line.

This is something of a conundrum. But then I noticed the method addBeanFactoryPostProcessor() on ConfigurableApplicationContext. Now, I can make things work. This is what the code looks like now.

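  PropertyPlaceholderConfigurer ppc = new PropertyPlaceholderConfigurer();
  ppc.setLocation(new FileSystemResource(args[0]));

  // Create the context unrefreshed, so the configurer can be registered first.
  ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext();
  context.addBeanFactoryPostProcessor(ppc);
  context.setConfigLocation("context.xml");
  context.refresh();

  Main main = (Main) context.getBean("main");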

Whilst it’s not as simple as before, it’s now hugely more flexible.


Exceptional Origins

I’ve just noticed something rather nice in Eclipse. The “Mark Occurrences” feature will show you where an exception is thrown. For example, here I’ve clicked on the IOException in the method definition.

Eclipse highlights where exceptions are thrown

You can clearly see that read(), write(), flush() and close() are points at which the IOException can be thrown.

Similarly, you can highlight the return type of a method to see all the exit points of a method.

Eclipse highlights all return sites in a method

One final tip about Mark Occurrences: You can optionally select occurrences in the “Next / Previous” toolbar buttons. i.e.

Enabling “next occurrence” in eclipse

Doing this allows the Next key (Command-. on my Mac, likely Ctrl-. on Windows) to jump between occurrences. So you can click on a method name and cycle through all mentions of that method in a file. Very handy. You’ll notice that it works for compile errors too.

I have to confess that I used to find Mark Occurrences quite irritating. Now I know what it can do for me, I’m a much happier punter.


Dates & Time Zones in Java

I’ve just found an annoying bug in a product at work. I was formatting a date (as part of a test) and expecting to get back a known value. The easiest way to do this is:

  Date when = new Date(0); // 1970-01-01 00:00:00
  SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd't'HH:mm:ss");
  String whenStr = fmt.format(when);

Sadly, this produced a date which was an hour out.

Thankfully, the fix is simple (when you know how).

  Date when = new Date(0); // 1970-01-01 00:00:00 GMT
  SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd't'HH:mm:ss");
  fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // format in GMT, not the JVM's default zone
  String whenStr = fmt.format(when);

It’s usually a good idea to work in GMT. Or UTC. Or Zulu time.
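To see the effect in isolation, here’s a small sketch that formats the same instant in two zones. The class name EpochFormat and the zoneId parameter are mine, purely for illustration, and I’ve pinned the locale to Locale.ROOT so the digits don’t depend on the machine it runs on.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochFormat {
  // Formats the instant in the given zone instead of the JVM's default zone.
  static String format(Date when, String zoneId) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd't'HH:mm:ss", Locale.ROOT);
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
    return fmt.format(when);
  }

  public static void main(String[] args) {
    Date epoch = new Date(0);
    System.out.println(format(epoch, "GMT"));       // 1970-01-01t00:00:00
    System.out.println(format(epoch, "GMT+01:00")); // 1970-01-01t01:00:00
  }
}
```

The Date is the same in both calls; only the formatter’s zone changes. That’s why a test which relies on the default zone can pass on one machine and be an hour out on another.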

On a slightly related note, I really hope that JSR 310 gets into Java 7 (though don’t get me started on the opaqueness of the JSR process). It’s a fresh DateTime API based upon Joda-Time. Anything is better than Date and Calendar; let’s hope it does what DateTime did for Perl.


Exceptional Eclipse Tip

By default, when Eclipse creates a try/catch block for you, you end up with something like this:

  try {
    // ... code that throws EvilException ...
  } catch (EvilException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  }

This is worse than useless, as it (effectively) covers up the exception [1]. A far better default choice is to wrap the checked exception in a RuntimeException if you don’t know what to do with it.
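In plain Java, the wrapping looks something like the sketch below. The names WrapSketch, risky() and causeOf() are invented for the example; the only important line is the throw inside the catch block.

```java
import java.io.IOException;

public class WrapSketch {
  // Hypothetical method that fails with a checked exception.
  static void risky() throws IOException {
    throw new IOException("disk on fire");
  }

  // No sensible recovery here, so rethrow unchecked, keeping the original as the cause.
  static void caller() {
    try {
      risky();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  // Runs the task and returns the cause of any RuntimeException, or null if none was thrown.
  static Throwable causeOf(Runnable task) {
    try {
      task.run();
      return null;
    } catch (RuntimeException e) {
      return e.getCause();
    }
  }
}
```

Because the original exception is passed in as the cause, nothing is lost: the stack trace of the RuntimeException still shows where the IOException came from.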

Thankfully, it’s fairly easy to arrange this in Eclipse. Go to Preferences → Java → Code Style → Code Templates and edit the “catch block body” fragment. It should look like this:


I’m going to try and spread this around the office a bit. It should make for some slightly more robust code…

1 Please don’t remind me that it comes out on the console—people are very good at ignoring that.