JavaScript URI Objects

I started looking at a Dashcode project the other day. Within a few minutes, I realised that I needed something to resolve URIs into an absolute form. I had a quick look around and there didn’t seem to be any code out there that does this.

So, I had a quick look at RFC 3986 and noticed that, amongst other things, it contains pseudo-code for parsing and resolving URIs. I coded it up first in Ruby, which let me take the pseudo-code almost word for word, and wrote some tests for it. Then I translated it into JavaScript and figured out how to use Test.Simple.

The end result is js-uri, a small URI object. It does what I need at the moment.

  // Parsing.
  var some_uri = new URI("");
  alert(some_uri.authority); //
  alert(some_uri);           //

  // Resolving.
  var blah      = new URI("blah");
  var blah_full = blah.resolve(some_uri);
  alert(blah_full);         //
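For anyone curious about what the RFC’s pseudo-code looks like once translated, here’s a minimal JavaScript sketch of reference resolution (RFC 3986 sections 5.2.2 to 5.3). It’s an illustration, not js-uri’s actual code: parse, removeDotSegments, merge, recompose and resolve are hypothetical names, and parsing uses the regular expression from Appendix B of the RFC.

```javascript
// Sketch of RFC 3986 reference resolution (sections 5.2.2 to 5.3).
// All function names are hypothetical; this is not js-uri's code.

// Appendix B regex; components that are absent come back as undefined.
const URI_REGEX = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parse(str) {
    const m = URI_REGEX.exec(str); // this regex matches any string
    return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Section 5.2.4: strip "." and ".." segments from a path.
function removeDotSegments(path) {
    let input = path;
    let output = "";
    // Drops the last segment (and the "/" before it) from the output.
    const popSegment = () => output.slice(0, Math.max(output.lastIndexOf("/"), 0));
    while (input.length > 0) {
        if (input.startsWith("../")) {
            input = input.slice(3);
        } else if (input.startsWith("./")) {
            input = input.slice(2);
        } else if (input.startsWith("/./")) {
            input = input.slice(2);            // "/./g" becomes "/g"
        } else if (input === "/.") {
            input = "/";
        } else if (input.startsWith("/../")) {
            input = input.slice(3);            // "/../g" becomes "/g"
            output = popSegment();
        } else if (input === "/..") {
            input = "/";
            output = popSegment();
        } else if (input === "." || input === "..") {
            input = "";
        } else {
            // Move the first segment (with its leading "/", if any) to the output.
            const next = input.indexOf("/", 1);
            const end = next === -1 ? input.length : next;
            output += input.slice(0, end);
            input = input.slice(end);
        }
    }
    return output;
}

// Section 5.2.3: merge a relative path with the base URI's path.
function merge(base, relPath) {
    if (base.authority !== undefined && base.path === "") return "/" + relPath;
    return base.path.slice(0, base.path.lastIndexOf("/") + 1) + relPath;
}

// Section 5.3: put the components back together.
function recompose(t) {
    let s = "";
    if (t.scheme !== undefined) s += t.scheme + ":";
    if (t.authority !== undefined) s += "//" + t.authority;
    s += t.path;
    if (t.query !== undefined) s += "?" + t.query;
    if (t.fragment !== undefined) s += "#" + t.fragment;
    return s;
}

// Section 5.2.2, "strict" variant: a reference with a scheme is absolute.
function resolve(refStr, baseStr) {
    const r = parse(refStr);
    const base = parse(baseStr);
    const t = { scheme: base.scheme, authority: base.authority, fragment: r.fragment };
    if (r.scheme !== undefined) {
        t.scheme = r.scheme;
        t.authority = r.authority;
        t.path = removeDotSegments(r.path);
        t.query = r.query;
    } else if (r.authority !== undefined) {
        t.authority = r.authority;
        t.path = removeDotSegments(r.path);
        t.query = r.query;
    } else if (r.path === "") {
        t.path = base.path;
        t.query = r.query !== undefined ? r.query : base.query;
    } else {
        t.path = removeDotSegments(r.path.startsWith("/") ? r.path : merge(base, r.path));
        t.query = r.query;
    }
    return recompose(t);
}

// The base URI used by the examples in RFC 3986 section 5.4.
const rfcBase = "http://a/b/c/d;p?q";
console.log(resolve("../g", rfcBase));  // http://a/b/g
console.log(resolve("g?y#s", rfcBase)); // http://a/b/c/g?y#s
console.log(resolve("//g", rfcBase));   // http://g
```

The expected results in the comments are taken from the examples in section 5.4.1 of the RFC, which make a handy ready-made test suite.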

There’s quite a bit it doesn’t do yet.

  • %-escaping and unescaping.
  • Parsing the URL in more detail. It would be useful to pull out the query parameters and host / port for instance.
  • For some reason, the tests only work in Firefox. Not sure why.
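Two of those gaps are small enough to sketch. The following JavaScript is a hedged illustration rather than part of js-uri: splitAuthority and queryParams are hypothetical helpers, the authority grammar follows RFC 3986 section 3.2 (userinfo, host, port), and treating "+" as a space is an HTML forms convention rather than something RFC 3986 specifies.

```javascript
// Hypothetical helpers for two of the missing features; not js-uri's API.

// RFC 3986 section 3.2: authority = [ userinfo "@" ] host [ ":" port ]
// IPv6 literals such as "[::1]" are kept whole as the host.
function splitAuthority(authority) {
    const m = /^(?:([^@]*)@)?(\[[^\]]*\]|[^:]*)(?::(\d*))?$/.exec(authority);
    if (m === null) return null;          // e.g. a non-numeric port
    return { userinfo: m[1], host: m[2], port: m[3] };
}

// Split a query string into %-decoded name/value pairs.
// Later duplicates overwrite earlier ones; decodeURIComponent throws a
// URIError on malformed escapes such as "%E0%A4%A".
function queryParams(query) {
    const params = {};
    // "+" as a space is the HTML form convention, not part of RFC 3986.
    const decode = (s) => decodeURIComponent(s.replace(/\+/g, " "));
    for (const pair of query.split("&")) {
        if (pair === "") continue;
        const eq = pair.indexOf("=");
        const name = eq === -1 ? pair : pair.slice(0, eq);
        const value = eq === -1 ? "" : pair.slice(eq + 1);
        params[decode(name)] = decode(value);
    }
    return params;
}

console.log(splitAuthority("dom@example.com:8080"));
// { userinfo: 'dom', host: 'example.com', port: '8080' }
console.log(queryParams("q=caf%C3%A9&lang=en+GB"));
// { q: 'café', lang: 'en GB' }
```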

But hopefully it’s still useful.

In the process of developing the JavaScript, I spent ages tracking down one particular problem. I was trying to introduce a new scope to define a couple of private helper functions:

  (function () {
      function merge(base, rel_path) { ... }

      function remove_dot_segments(path) { ... }

      URI.prototype.resolve = function (base) {
          var target = new URI();
          // ...
          return target;
      }
      // ...

Looks perfectly legit, right? It caused syntax errors in every browser I tried. Eventually, JSLint managed to tell me the problem: a missing semicolon after the URI.prototype.resolve definition. I hate JavaScript for trying to guess where the semicolon should go, because it can get it so wrong like this.
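The trap is easiest to see in its best-known form: when the line after a function expression starts with "(", JavaScript doesn’t insert a semicolon at all, and the function gets called. Here’s a minimal sketch (helpers is a hypothetical object; note that this particular variant fails at run time with a TypeError rather than at parse time):

```javascript
// A well-known automatic-semicolon-insertion trap. `helpers` is a
// hypothetical object used only for this illustration.
var helpers = {};
var caught = null;

try {
    helpers.double = function (x) {
        return x * 2;
    }   // <- no semicolon, and the next line starts with "(" ...
    (function () {
        helpers.ready = true;
    })();
} catch (e) {
    caught = e;
}

// ...so the lines above were parsed as a single expression:
//   helpers.double = function (x) { ... }(function () { ... })();
// The function expression is called with the second function as `x`
// (giving NaN), then NaN is called, which throws before anything is assigned.
console.log(typeof helpers.double);        // undefined
console.log(caught instanceof TypeError);  // true
console.log(helpers.ready);                // undefined
```

Putting a semicolon after the closing brace of the function expression makes the two statements behave as intended.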


Dynamic JUnit

Recently, I wanted to do something slightly unusual with JUnit. I’m working on a Cocoon project, so there are squillions of little XML files floating around. These all need to be well-formed, so I wanted a test that parses each one. Then CruiseControl can let us know when they get broken.

First, I gave the task to a colleague. He came up with something that checked each file and returned a list of the bad ones. It then asserted that the nonWellFormed list had a zero length. That’s great and all, but it didn’t tell you which file was broken, or why.

What I really wanted to do was have a single test per file, so it could display the errors correctly. This seemed like an easy thing to do… Until I tried it. This is what I eventually arrived at:

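  import java.io.File;
  import java.util.regex.Pattern;

  import junit.framework.Test;
  import junit.framework.TestCase;
  import junit.framework.TestSuite;

  public class WellFormedTest extends TestCase {
    public static Test suite() {
        final TestSuite suite = new TestSuite(WellFormedTest.class.getCanonicalName());
        // Stupid bloody Java regexes have to match from the beginning of the
        // string.
        Pattern p = Pattern.compile(".*\\.(xml|xslt|xconf|xmap)$");
        FindFiles ff = new FindFiles(p) {
            // Add one named test per matching file.
            protected void processFile(final File file) {
                suite.addTest(new WellFormedTest(file));
            }
        };
        // ... then walk the project tree with ff (FindFiles is our own helper).
        return suite;
    }

    private File file;

    public WellFormedTest(File file) {
      // The name passed to super() is what the test runner reports.
      super("Well-Formed? " + file.toString());
      this.file = file;
    }

    // Overridden because a custom test name defeats JUnit's usual
    // reflection-based lookup of the method to run.
    protected void runTest() throws Throwable {
      XmlValidator validator = new XmlValidator();
      String result = validator.isWellFormed(file); // null means well-formed
      assertEquals(file.toString(), null, result);
    }
  }
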
  • FindFiles is a utility class to walk a directory tree. Tell me again why Java doesn’t have something this basic in its vast class libraries?
  • You have to call super("blah") in your constructor to name each test sensibly.
  • But if you do this, you have to override runTest() in order for things to actually work. The usual mechanism for determining which tests to run doesn’t work if you supply a custom name. This took forever to work out and required delving into the JUnit source. Hallelujah for Open Source.
    • As part of prodding around in the debugger, I noticed that JUnit creates a new TestCase object for each test in the class. So it’s OK to just do one thing in runTest(), as that’s all that’s going to happen anyway.
  • XmlValidator is another custom helper class. It just parses the file and returns a String containing the error (or null).
  • Yes, this is JUnit 3.8. I know I need to migrate to JUnit 4. That’s a battle for another day, dependent on upgrading Ant first.

Originally, I tried to get the test done inside a nested anonymous subclass of TestCase, but anonymous classes can’t declare constructors, so that doesn’t work too well. Plus it bumps the ugliness of the source up another level.

The end result works quite well and provides a useful example for doing dynamic tests with JUnit.


Smalltalk @ FP

I’ve just had a group introduction to Smalltalk. Piers was visiting Brighton and decided to teach us all about the roots of OO programming. In lieu of the usual five-minutes-at-the-keyboard sessions, Piers gave us a walkthrough of developing kata 4 in Smalltalk.

This was a superb way to learn the language. There was more heckling than usual, but that was an important and necessary part of the process in this case. Watching Smalltalk in action is absolutely necessary; it’s just such a visual experience (compared to regular programming languages, anyway). I now completely understand the Smalltalk credo I’d read about: just type in what you want to happen, hit “run test” and, when it breaks, write the code in the debugger window that pops up.

Just before half time, Piers started pairing properly. That was when the language differences really started to hit home. Smalltalk is so not an ALGOL-influenced language. In particular, I adore the cascade:

  fromRow: aString
    ^ self basicNew
      setRow: aString;
      yourself

Each semicolon means “keep calling methods (sorry, passing messages) on the same object as the previous message”. The statement is only terminated with a period. Conceptually, it feels similar to Perl’s $_ (or “it”).
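JavaScript has no cascade, but a rough emulation helps show what the semicolon does. This is only a sketch: cascade, Row, setRow and trim are hypothetical names, and the helper simply sends every message to the same receiver and then returns that receiver, much as ending a cascade with yourself does.

```javascript
// Rough emulation of a Smalltalk cascade. `cascade`, `Row`, `setRow` and
// `trim` are hypothetical names for this sketch.
function cascade(receiver, ...messages) {
    for (const [selector, ...args] of messages) {
        receiver[selector](...args);   // every message goes to the same receiver
    }
    return receiver;                   // like finishing the cascade with `yourself`
}

class Row {
    setRow(aString) {
        this.cells = aString.split(",");
    }
    trim() {
        this.cells = this.cells.map((cell) => cell.trim());
    }
}

// Roughly: ^ Row basicNew setRow: 'a, b ,c'; trim; yourself
const row = cascade(new Row(), ["setRow", "a, b ,c"], ["trim"]);
console.log(row.cells); // [ 'a', 'b', 'c' ]
```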

As usual, we didn’t get particularly far in the task, but we did learn a huge amount. A big thanks to Piers for visiting us and also to Devi for cat-herding in Joh’s absence (and doing an excellent job of keeping us on-topic).

Now, I’m off to download Squeak.



After being a little bit ranty about Cocoon yesterday, I thought I’d better take a closer look at Cocoon 2.2. But that means getting to know Maven first…

The premise of Maven is simple. Instead of having a scriptable build system (as with Ant), the build system knows how to do things, and it does them in a standard fashion. You might call this convention over configuration. I like this idea; it simplifies things a lot.

To use Maven, you create a pom.xml (Project Object Model) file and then run mvn package. You should end up with a jar file for your project. This is a sample POM file:

    <description>My first Maven project???</description>

Ah, but how did I get this far? As it turns out, Maven has an equivalent of the rails command for generating new projects. Maven calls it an archetype. Here’s the tutorial’s example of generating a new project from an archetype:

  mvn archetype:create 

Piece of cake, huh? And guess what? mvn --help gives no clue about any of that.

The second annoyance is quite how verbose Maven is. I know I’m an old-school Unix guy (silence is golden), but this is taking the piss:

  % mvn package
  [INFO] Scanning for projects...
  [INFO] ----------------------------------------------------------------------------
  [INFO] Building my-app
  [INFO]    task-segment: [package]
  [INFO] ----------------------------------------------------------------------------
  [INFO] [resources:resources]
  [INFO] Using default encoding to copy filtered resources.
  [INFO] [compiler:compile]
  [INFO] Compiling 1 source file to /Users/dom/work/my-app/target/classes
  [INFO] [resources:testResources]
  [INFO] Using default encoding to copy filtered resources.
  [INFO] [compiler:testCompile]
  [INFO] Compiling 1 source file to /Users/dom/work/my-app/target/test-classes
  [INFO] [surefire:test]
  [INFO] Surefire report directory: /Users/dom/work/my-app/target/surefire-reports

   T E S T S
  Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.164 sec

  Results :

  Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

  [INFO] [jar:jar]
  [INFO] Building jar: /Users/dom/work/my-app/target/my-app-1.0-SNAPSHOT.jar
  [INFO] ------------------------------------------------------------------------
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 9 seconds
  [INFO] Finished at: Tue Apr 17 08:51:40 BST 2007
  [INFO] Final Memory: 5M/14M
  [INFO] ------------------------------------------------------------------------

What’s worse are the error messages. Check this out:

  % mvn pakage
  [INFO] Scanning for projects...
  [INFO] ------------------------------------------------------------------------
  [INFO] ------------------------------------------------------------------------
  [INFO] Invalid task 'pakage': you must specify a valid lifecycle phase, or a goal in the format plugin:goal or pluginGroupId:pluginArtifactId:pluginVersion:goal
  [INFO] ------------------------------------------------------------------------
  [INFO] For more information, run Maven with the -e switch
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: < 1 second
  [INFO] Finished at: Tue Apr 17 08:53:08 BST 2007
  [INFO] Final Memory: 1M/2M
  [INFO] ------------------------------------------------------------------------

So the problem (“pakage” instead of “package”) is buried somewhere in there, but it’s only reported at [INFO] level, not even as an error. Pretty unfriendly behaviour.

The third problem shows up the first time you run a Maven command. I didn’t show it above because I’d already been playing with Maven. Most of Maven’s behaviour is implemented as plugins, and these plugins are downloaded on first use. The Maven download itself is only about 1 MB.

Now, I don’t have a problem with plugins, or them being downloaded. But, for a fresh install, I would really rather be able to unpack and go. I don’t always have Internet access. On top of this, I have no idea about the provenance of the code that’s being downloaded. And the source code isn’t automatically brought along with it (unlike CPAN or rubygems). So you really have no idea what’s going on and you just have to trust that it’s doing the right thing. Trust is earned, though. I don’t trust Maven to do the right thing yet.

With all those complaints, I still think Maven is useful. The dependency management (the ability to pull in dependent jars) is fantastic. The standardization of project layout is a boon. The ability to produce code quality reports easily is kind of handy. The Maven plugin for Eclipse is pretty reasonable. Indeed, Maven’s own Eclipse support (mvn eclipse:eclipse, which creates the .project and .classpath files for Eclipse) is great.

But if Maven wants to see wider adoption, it really, really needs to spend some time working on the UI issues. They’re not easily dismissible just because this is a developer tool.

Anyway, after working my way through the tutorial, I’m comfortable enough to start playing with Cocoon 2.2 now.



Surprising as it may seem if you only read this blog, I don’t do much Perl or Ruby or Rails. I try to in my spare time, but it’s not what I’m doing at $WORK. That’s mostly concerned with pushing around XML using Java. Right now, I’m trying to learn Cocoon.

Cocoon is a framework (in much the same way that Rails is), but it’s oriented to pushing around XML[1]. The basics of Cocoon are pretty simple. There’s a “pipeline” for processing XML:

  • A generator produces XML. Usually, this is just reading a file. At $WORK, it’s pulled from an XML database.
  • Zero or more transformers munge the XML in various ways. Normally, this is XSLT.
  • Finally, it gets output through a serializer. Mostly this will be HTML.

There’s a little bit more to it, but that’s the basics. And for serving up XML in a read-only fashion, it actually works really well.
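To make the shape of the pipeline concrete, here’s a toy model in JavaScript. It’s a sketch, not Cocoon’s API: a plain object stands in for the XML document, and generate, upcaseTitle, toHtml and runPipeline are hypothetical names.

```javascript
// Toy model of a Cocoon-style pipeline. Not Cocoon's API: a plain object
// stands in for the XML document, and every name here is hypothetical.

// Generator: produces the document (Cocoon usually reads an XML file).
function generate() {
    return { title: "hello", paras: ["first", "second"] };
}

// Transformer: takes a document and returns a new one (usually XSLT in Cocoon).
function upcaseTitle(doc) {
    return { ...doc, title: doc.title.toUpperCase() };
}

// Serializer: turns the final document into output (usually HTML).
// No escaping is done here because the sample data is fixed.
function toHtml(doc) {
    const paras = doc.paras.map((p) => `<p>${p}</p>`).join("");
    return `<h1>${doc.title}</h1>${paras}`;
}

// One generator, zero or more transformers, one serializer.
function runPipeline(generator, transformers, serializer) {
    const doc = transformers.reduce((current, transform) => transform(current), generator());
    return serializer(doc);
}

console.log(runPipeline(generate, [upcaseTitle], toHtml));
// <h1>HELLO</h1><p>first</p><p>second</p>
```

The point of the design is that the generator and serializer don’t care how many transformers sit between them; passing an empty list is just as valid.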

The problems start when you want to get a little bit more interactive. It seems that Cocoon has evolved a number of different approaches over the years, but the current favourite appears to be FlowScript.

FlowScript is server-side JavaScript[2]. When a URL is matched, a little bit of JavaScript gets run in order to determine what to do. It can interact with Java objects and, once it’s figured out what to do, run the appropriate pipeline, passing in parameters. It’s effectively an MVC architecture, with the controller being JavaScript.

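But what’s really neat about FlowScript is captured in a single call, cocoon.sendPageAndWait(). Here’s a cut-down calculator:

  function calculator() {
    var a, b;

    // getA.html and getB.html are placeholder page names.
    cocoon.sendPageAndWait("getA.html");
    // Request parameters arrive as strings, so convert before adding.
    a = parseFloat(cocoon.request.get("a"));

    cocoon.sendPageAndWait("getB.html");
    b = parseFloat(cocoon.request.get("b"));

    cocoon.sendPage("result.html", {result: a + b});
  }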

cocoon.sendPageAndWait() uses a continuation to pause the execution of the JavaScript and return a page to the browser. When the user submits the form, the FlowScript carries on executing from just after the call to cocoon.sendPageAndWait(). Neat stuff.
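JavaScript generators give a rough feel for the same pause-and-resume behaviour. This is an analogy under stated assumptions, not how Cocoon works (FlowScript relies on continuations provided by its Rhino JavaScript engine): yield stands in for cocoon.sendPageAndWait(), the value passed to next() stands in for the submitted form, and the page names are placeholders.

```javascript
// An analogy only: a generator pauses at each `yield` much as
// sendPageAndWait() pauses the flow. Page names are placeholders.
function* calculator() {
    const a = parseFloat(yield "getA.html");  // pause until the first form comes back
    const b = parseFloat(yield "getB.html");  // pause until the second form comes back
    return { page: "result.html", result: a + b };
}

// Stand-in for the server: each "request" resumes the suspended flow.
const flow = calculator();
console.log(flow.next().value);     // getA.html
console.log(flow.next("2").value);  // getB.html
console.log(flow.next("3").value);  // { page: 'result.html', result: 5 }
```

One difference matters for the web: a generator can only be resumed once from each pause, whereas a saved continuation can be resumed again, which is what lets continuation-based frameworks cope with the back button.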

Continuations are currently the hot thing because of Seaside, a web framework for Smalltalk. But Cocoon’s had them for a couple of years.

Building on FlowScript is a framework for form handling called CForms. The idea is that you define a model for your form, which then gets rendered into HTML. I’m playing with this for a very complex form at the moment, and I’m not totally sold on the concept. Plus the generated result is some pretty yucky markup.

In fact, there are quite a few things about Cocoon that make me feel uncomfortable.

  • It’s huge. The download is 50 MB, and you get a lot in that. The problem is twofold: firstly, you don’t need most of it most of the time. Secondly, figuring out what you do actually need is bloody hard work. For example, I still haven’t figured out what the hell the “apples” block is.
  • It gets complicated very quickly when you step outside the core competencies. If you follow the CForms link, you’ll see what I mean.
  • Debugging is hard. Partially, this is down to the nature of XML (and in particular XML Namespaces), but in general, you’re not working with Java, so it’s difficult to get the level of debugging one would be used to. The error messages that do appear are somewhat vague.
  • Cocoon 2.2. The current version, 2.1, is a bit old now. I’ve been trying to find out more about Cocoon 2.2 by poking around in the dev list. It appears that Cocoon has been converted to a Maven project and switched to using Spring internally. It’s Maven that I have a big issue with. It basically means that there isn’t a download any more. Instead, you just tell Maven “make me a new Cocoon 2.2 project” and it goes and downloads it. From somewhere you may or may not trust. That may or may not be compiled correctly. Oh, and they’ve completely reorganised how you integrate with a standard servlet container. And the docs aren’t updated yet. All this, combined with the fact that Maven blew up when I tried it, means I’m not happy with the future direction of the project. Maybe with better docs, I’d be happier. We’ll see; the proper release should be “soon”.

Overall, I’m left with mixed feelings about Cocoon. For its core purpose, I like it. Beyond that, I’m less certain. The trouble is that pretty much any web site you create these days falls into that “beyond” bit quite quickly, even the large, static ones like we create at $WORK. I kind of wish it had some competition, but there doesn’t appear to be much out there that comes close to dealing with XML as well as Cocoon does.

I’m going on a training course in a couple of weeks. We’ll have to see if that reassures me any that Cocoon is the correct choice.

[1] XML often gets a lot of stick, but for its intended purpose (documents, as opposed to data), it’s a pretty reasonable solution.

[2] Server-side JavaScript appears to be coming back into fashion, what with things like Project Phobos and Zimki. It does go back a long way, though, to the Netscape web server (see Server Side JavaScript).


Unread Books

I’ve been buying more books than I should have recently, with the result that a number are piling up behind me. Plus there are a few that I’ve not finished for one reason or another.

Now that I’m going to have to minimize my monthly outgoings, I should revisit these instead of purchasing new fare…



I’m having a bit of a ranty evening, obviously. But when I see code like this, I give up on the whole article.

     #include <stdio.h>
     int main(int argc, char *argv[])
     {
     int i,j,k;
     unsigned long acc=0;
     printf("acc = %lu\n",acc);
     return 0;
     }

Really, if you’re preparing your code for publication, take the time to clean it up so it’s readable (hint: at a minimum, try pressing the space bar a bit more). An editor won’t let bad spelling through into the article, so why does bad code get through with such impunity?


Cross Site Scripting

I’ve just been listening to the Security Now episode about Cross-Site Scripting. It makes my blood boil. No, not all the ads and the endless, aimless waffling. It’s the talk about Cross-Site Scripting (aka XSS) being a problem because code and data can be intermingled in the page.

No, it’s not.

XSS is a problem because we have dumb programmers using even dumber tools[1].

I’ve railed before about the fact that if you’re outputting HTML, then your tool should do HTML escaping for you by default.

It’s kind of understandable in systems like Template Toolkit, which are not specifically aimed at the web, but it’s completely inexcusable in PHP. It’s designed to create web pages. You’d think it’d be able to do it safely and easily. No chance.

But lest you think I’m just ranting at PHP, most other systems I’ve seen (in Java, Perl and Ruby) make it nearly as hard to do correctly. Let me rephrase:

If you have to think about where to apply escaping, then your tool is letting you down.
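Here’s a minimal sketch of what “escape by default” could look like in JavaScript. It assumes a hypothetical tagged-template helper called html (not any real library): every interpolated value is HTML-escaped unless it has been explicitly wrapped with raw(), so forgetting to think about escaping leaves you safe rather than vulnerable.

```javascript
// Sketch of "escape by default". `html`, `raw` and `SafeHtml` are
// hypothetical names, not a real library.
class SafeHtml {
    constructor(text) { this.text = text; }
    toString() { return this.text; }
}

function escapeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")     // must come first
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Opting out of escaping has to be deliberate and visible in the code.
function raw(text) {
    return new SafeHtml(text);
}

// Tagged template: literal parts pass through, interpolated values are
// escaped unless already SafeHtml (so nested html`` isn't double-escaped).
function html(strings, ...values) {
    let out = strings[0];
    values.forEach((value, i) => {
        out += value instanceof SafeHtml ? value.text : escapeHtml(value);
        out += strings[i + 1];
    });
    return new SafeHtml(out);
}

const comment = '<script>alert("pwned")</script>';
console.log(String(html`<p>${comment}</p>`));
// <p>&lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;</p>
console.log(String(html`<div>${raw("<b>trusted</b>")}</div>`));
// <div><b>trusted</b></div>
```

This escaping only suits element content and quoted attribute values. Values dropped into script blocks, URLs or unquoted attributes need different treatment, which is exactly why the tool, rather than the programmer, should be tracking the output context.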

This isn’t to say XSS is the only problem. There are plenty of other problems to be aware of. CSRF appears to be garnering lots of attention these days.

One really useful link did come out of the Security Now show: OWASP (Open Web Application Security Project). It’s really worth checking out the OWASP Guide in order to educate yourself about security on the web.

[1] Before you start to feel offended about being called a “dumb programmer”, I most certainly include myself in this category too.


Skillswap: Intro to Rails

Last night I presented a skillswap, “Introduction to Rails”. This was meant to be a fairly quick overview for people who’ve done some web development before, but are completely new to Rails (and Ruby). The event was presented in two parts. First, a set of slides about what Rails is, why it works and a brief overview of Ruby. Then, a practical session.

For the practical, I installed Locomotive and we ran through a quick session of getting started with a Rails application, building a model and putting up some scaffolding on top of that. There were only five Macs, so people had to work together, which probably helped. I have to issue a huge thanks to Lighthouse for the opportunity to use their fantastic venue.

I did actually have further slides and handouts, which progressed the practical, but it was already getting on for 20:30, so it seemed wiser to halt whilst things were still going well.

As with all live events, not everything went to plan. The main annoyance was the fact that Locomotive-generated projects (well, Rails really) default to using MySQL. A quick switch to SQLite made things work a lot better. Database servers can be a pain when you’re trying to get things running.

The slides and handout are available for download.

Also many thanks to Jane for the kind words. 🙂


65 years of debugging

I’ve recently been plowing through a lot of old Asimov books I had lying around. One that stands out in particular is an anthology of his robot stories. Imagining what 2005 would be like is so much easier with hindsight!

But one group of stories is particularly enthralling to me: Powell & Donovan. Why? Because they’re debuggers. They’re faced with a weird situation and have to figure out not only what’s gone wrong, but why, so that they can fix it. They know about the three laws, and are exploring their unexpected implications and interactions in the real world. This feels exactly like what I do with computers.

Thankfully, the robots they deal with don’t come with a reset switch, which appears to be the limit of many people’s debugging these days. That would have made for a very short story indeed.