Jasypt

Posted by Dominic Mitchell Tue, 22 Jul 2008 15:30:40 GMT

One more little library that I’ve come to love: jasypt. It’s a simplified veneer over the top of the gargantuan Java security apparatus. All I wanted to do was encrypt a String before putting it in a Cookie.

  BasicTextEncryptor encryptor = new BasicTextEncryptor();
  encryptor.setPassword(key);
  String cipherText = encryptor.encrypt(clearText);

It nicely base64 encodes the result, which is ideal for Cookie stuffing.

The reverse operation is just as simple.

  BasicTextEncryptor encryptor = new BasicTextEncryptor();
  encryptor.setPassword(key);
  String recoveredText = encryptor.decrypt(cipherText);
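
For a sense of what that veneer is hiding, here is a rough sketch of password-based text encryption written directly against the JCE. It is not what jasypt does internally: the class name PasswordTextCrypto, the choice of PBKDF2 with AES-GCM and the iteration count are all assumptions made for illustration. Like the jasypt version, it hands back base64 text (the URL-safe alphabet here), so the result can go straight into a cookie.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of jasypt: password-based text encryption
// written by hand against the JCE.
class PasswordTextCrypto {
    private static final int SALT_BYTES = 16;
    private static final int IV_BYTES = 12;
    private static final int ITERATIONS = 10_000; // assumed work factor
    private static final int KEY_BITS = 128;
    private static final int TAG_BITS = 128;

    // Stretch the password into an AES key using PBKDF2.
    private static SecretKeySpec deriveKey(String password, byte[] salt)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, ITERATIONS, KEY_BITS);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    // Returns URL-safe base64 of salt || iv || ciphertext.
    static String encrypt(String password, String clearText) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] salt = new byte[SALT_BYTES];
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(salt);
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(clearText.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[salt.length + iv.length + body.length];
            System.arraycopy(salt, 0, out, 0, salt.length);
            System.arraycopy(iv, 0, out, salt.length, iv.length);
            System.arraycopy(body, 0, out, salt.length + iv.length, body.length);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(); a wrong password or tampered text raises an exception.
    static String decrypt(String password, String cipherText) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(cipherText);
            byte[] salt = Arrays.copyOfRange(in, 0, SALT_BYTES);
            byte[] iv = Arrays.copyOfRange(in, SALT_BYTES, SALT_BYTES + IV_BYTES);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            int offset = SALT_BYTES + IV_BYTES;
            byte[] clear = cipher.doFinal(in, offset, in.length - offset);
            return new String(clear, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Calling PasswordTextCrypto.encrypt(key, clearText) and then decrypt(key, cipherText) should give back the original string. That is an awful lot of ceremony compared with the three lines above, which is rather the point.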

Google Collections to the rescue

Posted by Dominic Mitchell Tue, 22 Jul 2008 14:36:02 GMT

A few days ago, I was writing a piece of code that turned a line at a time into an Object. And it was using iterators. I had a RecordStream, which wrapped a LineStream (just a thin veneer over LineNumberReader).

Then I discovered that there was a terminating record at the end of each file. And it was in a completely different format to all the other lines. Bother.

Ok, I know, I’ll insert another iterator in the middle, which specifically ignores that record. Well, easier said than done as it turns out. I spent the best part of a day trying to create an Iterator which reads the next value and pretends that it’s not there. It turns out to have an awful lot of state.

Eventually I managed the task, and it worked. But boy, was it ugly. And it was long—about two pages of code.

Then the light bulb went on. I remembered that Google Collections had some tools for dealing with Iterators. In particular, there’s a function filter(), which takes a Predicate. And look! The Predicates class contains some handy builtins!

After about 5 minutes work, my two pages of code boiled down to three lines of code.

    import static com.google.common.base.Predicates.*;

    private static final String END_RECORD = "END RECORD,END RECORD,END RECORD";

    public Iterator<T> iterator() {
        // Produce an iterator that returns one line at a time.
        Iterator<String> lines = new LineStream(reader).iterator();
        // A predicate to return all records which are not the end record.
        Predicate<String> notEndRecord = not(isEqualTo(END_RECORD));
        // Apply the predicate to the iterator.
        final Iterator<String> it = Iterators.filter(lines, notEndRecord);
        return new Iterator<T>() { … };
    }
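
To show where all that state comes from, here is a minimal sketch of a filtering iterator in plain Java, with no library involved. It is not my original two pages and it is not how Iterators.filter() is implemented; the class name SkippingIterator is made up, and it uses java.util.function.Predicate rather than the Google Collections one. The trick is that hasNext() has to read ahead and remember what it found.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical hand-rolled equivalent of a filtering iterator.
class SkippingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> keep;
    private T lookahead;          // the next element that passed the predicate
    private boolean hasLookahead; // separate flag, so null elements still work

    SkippingIterator(Iterator<T> source, Predicate<? super T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Read ahead until something passes the predicate or the source runs dry.
        while (!hasLookahead && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                lookahead = candidate;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null;
        hasLookahead = false;
        return result;
    }
}
```

Wrapping the line iterator as new SkippingIterator<String>(lines, line -> !line.equals(END_RECORD)) would drop the terminating record. Even this small version has two fields of bookkeeping, and it is easy to get the hasNext()/next() interplay wrong.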

Marvellous and powerful stuff. It’s seriously worth checking out in case you haven’t played with it before. My favourites are the static factory methods, e.g.

  // Before
  Map<String, String> myMap = new HashMap<String,String>();

  // After
  Map<String, String> myMap = Maps.newHashMap();

Isn’t it lovely how the compiler just figures it all out for you? Anything that can save space like that has to be a Good Thing™.
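
There is no magic behind this: a generic static method lets the compiler infer the type parameters from the variable being assigned to. A minimal sketch of the idea follows; the class name MyMaps is made up and this is not the library's source.

```java
import java.util.HashMap;

// Hypothetical stand-in showing how a static factory gets its type arguments.
class MyMaps {
    // K and V are inferred at the call site from the assignment target.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }
}
```

With that in place, Map<String, Integer> counts = MyMaps.newHashMap(); compiles without repeating the type arguments. From Java 7 onwards, the diamond form new HashMap<>() gets the same saving without a helper.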

There are a whole bunch of other useful things in there.

  • Preconditions.checkNotNull() is a compact way of validity checking your arguments.
  • Join.join()—I don’t know how many times I’ve written this by hand (usually badly). Much better to have somebody else do it for me.
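
For comparison, this is roughly what the hand-written versions of those two look like. It is a sketch only: the class name Handy and both method signatures are my own, not the library's.

```java
import java.util.Iterator;

// Hypothetical hand-written helpers, for comparison with the library versions.
class Handy {
    // Fail fast on a null argument, otherwise hand the reference back.
    static <T> T checkNotNull(T reference, String name) {
        if (reference == null) {
            throw new NullPointerException(name + " must not be null");
        }
        return reference;
    }

    // Join the parts with a separator, without a trailing separator.
    static String join(String separator, Iterable<?> parts) {
        checkNotNull(separator, "separator");
        checkNotNull(parts, "parts");
        StringBuilder joined = new StringBuilder();
        Iterator<?> it = parts.iterator();
        while (it.hasNext()) {
            joined.append(it.next());
            if (it.hasNext()) {
                joined.append(separator);
            }
        }
        return joined.toString();
    }
}
```

The usual way to write join badly is to append the separator unconditionally and then trim it off the end, which goes wrong for an empty list.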

Do yourself a favour and go check them out. You won’t regret it.

Busy Times

Posted by Dominic Mitchell Tue, 22 Jul 2008 14:13:47 GMT

Blimey, it’s been a while since I posted. Well, it’s been busy times.

I’ve mostly been working on products at $WORK, mostly in Java. I’ve got a whole series of posts from my internal blog that need to be reposted here. Suffice to say I’ve been having fun.

I’ve been on holiday a bit. We went to Cornwall for my daughter’s first birthday (superb weather and if you’re near Looe, you must visit trawlers). We’ve been to Sweden for midsummer celebrations with more family (superb mygga...).

My grandmother visited us from France a few weeks ago. With the rest of that side of the family we attended the annual Battle of Britain memorial service in order to remember my Grandfather (Wing Commander Henry Maynard Mitchell).

It looks like I won’t be getting any less busy either. Small children like attention, so we’re finding that family visits are a very regular occurrence. But this is a welcome distraction.

That Looks Somewhat Iffy

Posted by Dominic Mitchell Sun, 13 Apr 2008 09:32:00 GMT

Oh, go on, then.

  % history 1 | awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
  320 git
  112 ll
  80 cd
  48 ls
  24 less
  20 man
  19 happygiraffe.net
  17 sudo
  15 rm
  14 xattr

NB:

  1. This is at home, not work. At work, there’s significantly more git, ant, mvn and mate.
  2. I turn all my ssh known_hosts entries into command line aliases, hence “happygiraffe.net”.
  3. I’ve only just found xattr, so I’ve been playing with it.
  4. zsh only lists 25 history items by default. I have to ask it to start from the beginning.
  5. I obviously do too much as root.

As always, I can’t resist tinkering with the shell. I’d have written it as history 1 | awk '{print $2}' | sort | uniq -c | sort -rn | head.

m2eclipse tip

Posted by Dominic Mitchell Wed, 09 Apr 2008 22:41:00 GMT

I was wondering why one of my computers seemed to have an older version of m2eclipse installed. It turns out that they moved the update site from m2eclipse.codehaus.org/update/ to m2eclipse.sonatype.org/update/. Doh. It’d be nice if they mentioned that on the front page.

Nostalgia (not by Veidt)

Posted by Dominic Mitchell Wed, 02 Apr 2008 22:47:31 GMT

I was thinking about what I needed to do tomorrow. One of the tasks involved writing a DOS batch file for a colleague. That got me thinking. When did I learn to write batch files? Scarily, I realised it was 20 years ago. I must have been 14 at the time. I had an Amstrad PC 1512 and I wanted to learn everything about it (and play Ultima V a lot).

So, I spent ages reading help, playing with commands to see what they did. I even managed to get a book or two (if the books seem expensive now, they’re even more so to a 14 year old with practically no income).

Learning batch files was pretty much mandatory — you had to configure AUTOEXEC.BAT somehow. But learning why you needed to prefix “echo off” with an @ was fascinating[1].

Somehow this information has stayed relevant a lot longer than I thought it would. Certainly longer than the DOS assembly coding I did (a PSP was a Program Segment Prefix long before it was a Play Station Portable — but who cares these days?).

I had little realisation quite how bad batch files really were until I came across sh on SunOS 4 at University. The whole Unix thing (including the culture) was pretty mind-blowing.

I guess I’ve just officially joined the “old farts” club. :-)

[1] “echo off” stops outputting commands to the console. But the “echo off” itself has already been output to the console by that point. The @ in front stops that. Looking back now, it seems remarkably similar to the syntax used by make(1).

for() $DEITY's sake, why?

Posted by Dominic Mitchell Wed, 02 Apr 2008 10:17:00 GMT

I used to think I had a reasonable grasp of Perl. Yesterday, I realised I didn’t even understand a basic foreach loop.

  my $val;
  my @values = qw( a b c );
  foreach $val (@values) {
    print $val, "\n";
  }
  print "[end] $val\n";

I reckoned that this should print:

  a
  b
  c
  [end] c

Instead, it prints:

  a
  b
  c
  Use of uninitialized value in concatenation (.) or string at foo.pl line 11.
  [end] 

This confused me no end. But it’s actually documented behaviour. From perlsyn:

The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it’s still localized to the loop. This implicit localisation occurs only in a foreach loop.

Wow. You really do learn something new every day. I suspect that this is implementation behaviour that was documented post-fact, rather than designed that way.

Expanding Outline Views in Cocoa

Posted by Dominic Mitchell Sun, 30 Mar 2008 21:42:00 GMT

Thanks to a lengthy commute last week, I’ve been making a toy in Cocoa. It reads a URL, parses some XML and displays it in an NSOutlineView. Simple stuff, but it will hopefully make my life better at work, where I need to do this a fair bit.

By default, when I load the XML into the NSOutlineView, everything is closed up. So all you’re presented with is the root element. I’d like to expand that so it automatically includes all children of the root element. Nice and simple—there’s an expandItem: method.

Except that when I call it from the action that puts the XML into the NSOutlineView, it doesn’t work. Bugger.

After instrumenting my XML data source, I can see that nothing is really happening until after my action. My suspicion is that the NSOutlineView isn’t realising that it has any data until after the first call to display.

I tested this by hooking up the call to expandItem into a secondary action (on another button). And it works great.

So, I need a way to say “call this code back in the next idle period”. And this is where I start to get upset that Objective-C doesn’t have closures.

How to execute code soon isn’t easily determined from the Apple docs. My guess is that you use an NSTimer with a very small NSTimeInterval. Let’s try that…

  // scheduledTimerWithTimeInterval:… already adds the timer to the current
  // run loop in the default mode, so no separate addTimer:forMode: is needed.
  [NSTimer scheduledTimerWithTimeInterval:0.0
                                   target:self
                                 selector:@selector(expandRoot)
                                 userInfo:nil
                                  repeats:NO];

Well, it works. But! Further investigation finally reveals Deferring the Execution of Operations. This suggests that I should use an NSNotification instead, but posted with an NSPostWhenIdle flag. This means getting involved with the Cocoa notifications system...

The code now ends up looking like this:

  - (void)awakeFromNib
  {
    // Listen out for notifications we've posted to ourselves.
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(expandRoot:)
                                                 name:@"expandRoot" 
                                               object:nil];
  }

  -(IBAction)fetch: (id)sender
  {
    // …
    NSNotification* todo = [NSNotification notificationWithName:@"expandRoot" 
                                                         object:self];
    [[NSNotificationQueue defaultQueue] enqueueNotification:todo
                                               postingStyle:NSPostWhenIdle];
  }

  - (void)expandRoot:(NSNotification *)notification
  {
    [outlineView expandItem:[[outlineViewDataSource doc] rootElement]];
  }

Which is quite a bit more code. But it feels more robust doing it this way.

The big take away from all this is how difficult it is to use a non-Open-Source framework. If I had the source to Cocoa, I’d be able to look inside and see what I needed to do simply and quickly. Instead, it took me three train journeys. But there’s still enough to like in Cocoa that I intend to carry on.

q4e

Posted by Dominic Mitchell Sat, 01 Mar 2008 10:12:00 GMT

I hate maven. The UI sucks so badly, it’s incredibly painful to use. Anything that can take the edge off this has to be a good thing. In the past, I’ve experimented with m2eclipse, but to be quite frank, it’s not much improvement over the command line. And the tools are why we use Java instead of Perl/Python/Ruby, right?

So now I’m trying out q4e, which is being promoted as the “official” maven integration for Eclipse. At some point. Hopefully.

After installing (0.4.0), first impressions are good. There’s a “new maven project” wizard, which knows about archetypes. Creating archetypes manually is incredibly annoying (I can never remember how). Now, the wizard just lists them for you:

The q4e archetypes wizard

Afterwards, it nicely prompts you for details like groupId and artifactId. Plus a description, which I’d never have thought of otherwise. The more metadata the better! (Actually, the description appears to vanish once the project has been created).

Once created, this is what you end up with.

A new maven project created by q4e

Nice and simple so far. Most of the maven commands are on the right-click menu:

q4e's main menu

When you run something, you get a log viewer, which is a little nicer than the m2eclipse console. At least it’s timestamped.

The q4e log view

Unfortunately, the dependency management appears to be broken. I can’t search for dependencies in the way I could in m2eclipse. That’s not good, as there are lots of them available and the computer should be telling me what the heck they are.

Annoyingly, it seemed to “lose” my src/main/java folder. I had to recreate it in order to get it to work. Very odd.

It will graph dependencies, which is probably more useful on a larger project than my test app.

The q4e dependency viewer

There’s no help yet, which is annoying, though not critical (I don’t know anyone who uses Eclipse help much anyway. Developers—who’d believe it, eh?)

The one thing that’s been really bugging me with maven is the complete inability to get at the source code of dependencies. Oh sure, it’s available for download, and there are features to ask for the source to be downloaded. But I have no idea how on earth to make it work. Not having source code is a real problem when it comes to understanding software.

It didn’t offer much help in the way of creating tests (A default src/test/java would have been nice).

Overall, it’s pretty clear that this is a very early stage of development. But it still looks more promising than m2eclipse, and anything that can make maven easier to use is most welcome.

That was all written last night. This morning, I’ve updated to the development build (0.5.0). It’s managed to grow a “Fetch Source JARs” command, which is great. Unfortunately, I’ve just bumped into issue 153—the compiler settings aren’t synced between maven and eclipse. Oh well. I stand by the previous paragraph. This has potential, but it’s not there yet.

Vendor branches in git

Posted by Dominic Mitchell Thu, 07 Feb 2008 21:00:00 GMT

Recently, I’ve been playing with git. It’s been pretty good so far. There are two things that have really impressed me:

  1. The integration with subversion via git-svn is superb. I’m busy coding in git and I can just push all my changes back into subversion with nobody knowing I’m doing anything different.
  2. The visualisation provided by gitk is wonderful. One of our projects at work has about 10 open branches. gitk allowed me to see this quickly and easily. Most importantly, it immediately alerted me to changes on branches that had not been merged back into the trunk.

There are downsides of course. Support for Windows and Eclipse still isn’t wonderful (though it is progressing). But for my personal use, it’s damned handy.

There’s one thing I make use of in subversion that I didn’t immediately understand how to do in git: vendor branches. If you haven’t come across them before, they’re a way of tracking third party software and incorporating your changes (where you either don’t have access to the repository, or don’t want that level of granularity). At work, I have a wordpress blog, and I keep it up to date (including all my customizations) using a vendor branch. It’s worked really well.

It took a bit of searching, but eventually I found a mail message describing tracking changes when the upstream uses snapshots, which is the same thing as “vendor branches”. The process is really quite elegant and simple, and makes use of git’s superb branching.

To start with, you just unpack the software and set it up as a new git repository.

  $ unzip wordpress-2.3.zip
  $ cd wordpress
  $ git init
  $ git add .
  $ git commit -m "Import wordpress 2.3" 
  $ git tag v2.3
  $ git branch upstream

The clever bit is the last line: create an “upstream” branch, which you can use to store the newer releases.

With that, you can go away and start making changes: wp-config.php, add a few themes and plugins, that sort of thing.

Now, when a new release comes along, there’s a simple process (easily wrapped up in a small shell script) to incorporate it.

  $ cd wordpress
  $ git checkout upstream
  $ rm -r *
  $ (cd .. && unzip wordpress-2.3.1.zip)
  $ git add .
  $ git commit -a -m 'Import wordpress 2.3.1.'
  $ git tag v2.3.1
  $ git checkout master
  $ git merge upstream

There are a few tricks in there.

  • I know that wordpress always unzips into a wordpress directory, so I can unzip it on top of the current code.
  • git add . catches all the modified and added files.
  • git commit -a catches the removed files.
  • The magic is in the last two lines—switch back to your own lineage, and pull in all the changes in the latest version of the code. If there are any conflicts, git will point them out to you and let you deal with them.

After a few tries, you end up with something like this.

Wordpress on a vendor branch in git

That is, you have a wordpress 2.3.3 install with all your customizations applied.

I confess, this is all probably overkill for something like wordpress. But it’s still a useful technique to know about.

Update: I’ve just realised that the above solution isn’t perfect, particularly for things like wordpress and mediawiki, where there’s a tendency to upload files into the repository area. Frequently these aren’t in version control so the rm -r * above could blow the uploads away. There are two choices:

  1. Configure the software to not modify anything in the repository (e.g. in wordpress, change Options → Miscellaneous → Store uploads in this folder).
  2. Have a cron job to regularly add & commit uploads into the repository. Depending on the size and frequency, this may or may not be preferable.

Code Examples Should Work

Posted by Dominic Mitchell Wed, 30 Jan 2008 22:48:00 GMT

I’m looking at Wicket this evening. There’s a wicket tutorial on theserverside.com. However, the first code example fails!

Initially I tried to get things working by downloading wicket. However, the article recommends maven (as does the quickstart), so I grit my teeth and went ahead.

Once again, I get 50 lines of gibberish, including an error. But everything seems to work. Clench teeth some more, and move on.

The first code example is the definition of the WicketApplication class, which Maven has helpfully created for us in outline. The first method to add is:

  private ApplicationContext getContext() {
    return WebApplicationContextUtils
      .getRequiredWebApplicationContext(getServletContext());
  }

As usual, I get compile errors. This is normal: just hit quick-fix in Eclipse to import the missing classes, such as ApplicationContext. Except that there isn’t one.

Strike one.

I take a “lucky guess” and add a dependency on spring-web to my pom.xml. That makes both ApplicationContext and WebApplicationContextUtils appear on my classpath. Yay.

The next problematic method is this one:

  public static WicketApplication get() {
      return (WicketApplication) WebApplication.get();
  }

According to the tutorial:

The third overrides the Application#get() method to allow for covariant return types, in this case the WicketApplication class. This example uses the WicketApplication as a form of Service Locator.

Great! Except for the minor problem that it doesn’t compile:

  The return type is incompatible with Application.get()

Strike two.

For now, I’m going to comment it out and carry on playing until I get bitten by that. But it’s not what I’d call a great start to a tutorial when your first code example is this buggy.

Update: I spoke to the author of the article, Nick Heudecker, and he pointed out:

  • I should have downloaded the example code, that works.
    • Well, great, but it would have been nice to have that mentioned.
  • Covariant return types are supported by Java 5.
    • Which I certainly thought I was using. I don’t know what went wrong here. But I didn’t know that Java 5 could do this, so it’s good to know.
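
For the record, here is a minimal sketch of the feature in question: a static method in a subclass hiding the parent's static method with a narrower return type, which compiles at Java 5 source level and above. BaseApp and MyApp are made-up stand-ins, not Wicket's classes. One possible explanation for the error I saw would be the compiler compliance level being set below 5, but that is only a guess.

```java
// Hypothetical stand-ins for Application and WicketApplication.
class BaseApp {
    private static BaseApp current;

    static void set(BaseApp app) {
        current = app;
    }

    static BaseApp get() {
        return current;
    }
}

class MyApp extends BaseApp {
    // Hides BaseApp.get() and narrows the return type (covariant return).
    static MyApp get() {
        return (MyApp) BaseApp.get();
    }

    String name() {
        return "my app";
    }
}
```

After BaseApp.set(new MyApp()), callers can write MyApp.get().name() without casting, which is the service locator convenience the tutorial was after.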

Trusting your tools

Posted by Dominic Mitchell Sun, 27 Jan 2008 22:40:00 GMT

After Grzegorz’s piping up, I’m giving cocoon 2.2 another try. Here are some selected errors.

  javax.servlet.ServletException: No block for /favicon.ico
          at org.apache.cocoon.servletservice.DispatcherServlet.service(DispatcherServlet.java:84)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServlet.service(ReloadingServlet.java:89)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1093)
          at org.apache.cocoon.servlet.multipart.MultipartFilter.doFilter(MultipartFilter.java:119)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.apache.cocoon.servlet.DebugFilter.doFilter(DebugFilter.java:169)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingSpringFilter.doFilter(ReloadingSpringFilter.java:69)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
          at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
          at org.mortbay.jetty.Server.handle(Server.java:313)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:506)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:830)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:396)
          at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

How fabulous! 30 lines to tell me about a 404 I couldn’t care less about! (this is from mvn jetty:run). And in the process, obliterating any messages I did care about.

  [ERROR] VM #displayTree: error : too few arguments to macro. Wanted 2 got 0
  [ERROR] VM #menuItem: error : too few arguments to macro. Wanted 1 got 0
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESSFUL
  [INFO] ------------------------------------------------------------------------

There’s an error, but the build was successful. That makes sense. Not. (This is from mvn site.)

  Caused by: org.codehaus.plexus.util.xml.pull.XmlPullParserException: TEXT must be immediately followed by END_TAG and not START_TAG (position: START_TAG seen ...<reports>\n            <report>... @118:21) 
          at org.codehaus.plexus.util.xml.pull.MXParser.nextText(MXParser.java:1063)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseReportPlugin(MavenXpp3Reader.java:3572)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseReporting(MavenXpp3Reader.java:3709)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseModel(MavenXpp3Reader.java:2347)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read(MavenXpp3Reader.java:4422)
          at org.apache.maven.project.DefaultMavenProjectBuilder.readModel(DefaultMavenProjectBuilder.java:1412)
          ... 17 more
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 2 seconds
  [INFO] Finished at: Sun Jan 27 22:25:35 GMT 2008
  [INFO] Final Memory: 1M/2M
  [INFO] ------------------------------------------------------------------------

XML parser exception (and in only 40+ lines!). Complaining about unbalanced tags? Must be non-well-formed XML, right? Wrong. This is down to copying an example from “Better builds with maven”. But the example’s wrong—I’m missing a tag. Can you guess what’s missing from this XML?

  <reportSets>
    <reports>
      <report>dependencies</report>
    </reports>
  </reportSets>

You mean you didn’t spot the missing reportSet element? I’m shocked, I tell you, shocked. Plus the lack of indication that an error actually occurred. The stacktrace is a good indication, but an actual “ERROR” or “BUILD FAILED” message would be nice (there is an error line, but it zoomed past three screens ago. I blinked and missed it).

So that’s two strikes to maven and one to cocoon. My trust in them is basically non-existent at this point. But at least the RCL worked as documented.

Argumentative

Posted by Dominic Mitchell Fri, 25 Jan 2008 07:29:00 GMT

I spent a little while looking at mootools yesterday for a colleague. It’s yet another JavaScript library. My colleague was wondering how to restrict the Accordion effect so it applied once to each area of content on the page, rather than once for the whole page (each content area has multiple bits of content where only one should be visible).

As usual, I just headed straight for the source code (Accordion.js). Inside, I found the best abuse of JavaScript I’ve seen in a while.

  initialize: function(){
    var options, togglers, elements, container;
    $each(arguments, function(argument, i){
            switch($type(argument)){
                    case 'object': options = argument; break;
                    case 'element': container = $(argument); break;
                    default:
                            var temp = $$(argument);
                            if (!togglers) togglers = temp;
                            else elements = temp;
            }
    });
    // …
  }

Now what exactly does this do? It’s pretty different from the usual JavaScript function usage:

  var initialize = function (options, togglers, elements, container) { … }

The first thing of interest is the first parameter to $each: arguments. A little known feature of JavaScript is that every function has an array-like object of all its arguments available. You can also get a reference to the function that you’re in (via arguments.callee), which is sometimes useful.

Here, it’s being used to accept an arbitrary number of arguments, in any order. To be quite frank, it’s a bit of a mess. This is my understanding of the above code:

  • You can pass in up to four arguments.
    • You can pass in more, but we only get 4 parameters out of it.
  • The last JavaScript object will get treated as a set of options.
  • The last element passed in will be used as “container”.
  • The first non object|element will be passed to $$ and treated as the set of “togglers”.
  • The last non object|element arguments will be passed through $$ and treated as the elements to be shown/hidden.

Is that right? I’ve been looking at it for a bit and still not 100% sure.
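
One way to check that reading is to transcribe the dispatch into something typed and poke at it. The sketch below does that in Java. It is an approximation under stated assumptions: AccordionArgs and Element are made-up names, a Map stands in for a plain JavaScript object, and the $$ selector lookup is skipped, so the raw argument is stored as-is.

```java
import java.util.Map;

// Hypothetical transcription of the argument sniffing in Accordion's initialize.
class AccordionArgs {
    // Stand-in for a DOM element.
    static final class Element {
        final String id;

        Element(String id) {
            this.id = id;
        }
    }

    Map<?, ?> options;  // last "object" argument
    Element container;  // last element argument
    Object togglers;    // first argument that is neither
    Object elements;    // last remaining argument that is neither

    static AccordionArgs parse(Object... arguments) {
        AccordionArgs parsed = new AccordionArgs();
        for (Object argument : arguments) {
            if (argument instanceof Map) {
                parsed.options = (Map<?, ?>) argument;
            } else if (argument instanceof Element) {
                parsed.container = (Element) argument;
            } else if (parsed.togglers == null) {
                parsed.togglers = argument;
            } else {
                parsed.elements = argument;
            }
        }
        return parsed;
    }
}
```

Calling AccordionArgs.parse("a", "b", "c") leaves togglers as "a" and elements as "c", with "b" silently dropped. That agrees with the list above, at least for this simplified model.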

To me, this is a fine example of taking the flexibility of JavaScript just a little too far.

A Year in Cocoon

Posted by Dominic Mitchell Mon, 21 Jan 2008 20:01:00 GMT

The other large part of the project at $WORK I’ve just finished was Cocoon. Cocoon is a Java web framework. It’s got some really neat ideas in it, and its main purpose in life is transforming XML. It is (or should be) a perfect match for XML databases.

I described Cocoon about a year ago, towards the start of this project. But how do I feel about it now? Looking back, was it the right choice?

To start with, I’m still very impressed with the core Cocoon technologies. The pipelines are perfect for dealing with XML. FlowScript still impresses the heck out of me.

But there’s a lot that leaves a sour taste. My original complaint was about the size. Well, 50MB isn’t so big. But the fact of the matter is that there’s an awful lot of stuff in there, quite a bit of which you don’t want to touch with a bargepole. We wasted a lot of time looking at things like XSP, Actions and implementing our own custom Generators. I wish I’d been made more aware of what FlowScript was up-front, and what it could do for me. I wish I’d realised that it’s basically the “C” in MVC.

Which dovetails straight into another complaint: documentation. There is quite a bit of documentation for Cocoon. But it’s still inadequate given the gargantuan size of the project. And the coverage is extremely spotty. Normally, I’d jump straight to the published literature, but the most recent Cocoon book I could find was hideously out of date. In fact, that’s what caused me to go down several of the rabbit-holes mentioned above.

When I’ve really needed to figure out what’s going on, I’ve invariably had to turn to the cocoon source code. Which, due to its dependence on the weird-yet-not-wonderful avalon framework, was less than simple to understand.

My complaint about debugging still holds, although less severely. You get used to the seemingly-intractable error messages. You spot the patterns that are causing trouble. Like most things, logging goes a long way.

And then there’s Cocoon 2.2. My site was developed entirely on Cocoon 2.1. This certainly had its flaws: figuring out how to deploy it as a war file sensibly was a pain. But Cocoon 2.2 has Maven.

I’ve pointed out my dislike for Maven before. As have other people. Recently, other people in my office have been using it and I’ve witnessed the project overruns thanks to trying to figure out what Maven is doing. Nice idea, bad implementation.

Cocoon 2.2 uses Maven because it’s “modularized”. What this means is you can’t have a single project with everything in it any more. You have a “webapp” project and a “block” project. And when something changes you have to build the block, install it, build the webapp, mount the installed block in the webapp and fire up jetty. It doesn’t make for a good development environment.

Now I could be completely wrong and missing the obvious way to do seamless Cocoon 2.2 web development from within Eclipse. I’d love to be corrected. But for now, Cocoon 2.2 has shot itself in the foot as far as I’m concerned.

So I’m not happy with the future direction of Cocoon. I need to look again at why I chose it in the first place. Initially, it was because all the other web frameworks for Java (Struts, Tapestry, Wicket) all seemed totally focussed on form-based CRUD-style web apps. Cocoon focussed on documents and URLs instead. So it’s time to start working my way through the tutorials of the many java web frameworks in order to find a more suitable one. I may not find a good replacement for Cocoon. But I certainly need to try.

Update: In the comments, Grzegorz Kossakowski points out a screencast about the RCL which slightly lessens the pain of interactive development with Cocoon 2.2.

A Year in XQuery

Posted by Dominic Mitchell Sun, 20 Jan 2008 22:45:00 GMT

About a year ago, I wrote The year of XQuery?. I’ve just finished my involvement with the large project at $WORK that was using XQuery. So it’s time to reflect on it a little.

First, a bit of background. The site I was developing is essentially a more-or-less static view of 112,149[1] XML documents in 4 collections[2]. The main interface is a search page, plus there’s a browse-by-title if you really fancy getting lost. It sounds simple, but as with any large collection of data, there is lots of variation in the data, leading to unexpected complications. Plus there are several different views of each document, just to make life more fun.

The site is based on Cocoon talking to an XML database (MarkLogic Server in this instance). We get XML back from the database, and render it through an XSLT pipeline into HTML[3]. Plus there’s some nice jQuery on top to smooth the ride.

Looking inside the codebase, we’re presently running at 5200 lines of XQuery code across 39 files (admittedly, that includes a 1000-line “lookup table”). But that doesn’t include any of the search code, which is dynamically generated using Java.

But in many senses, the size of the codebase is not the remarkable thing about the XQuery. One of its most important aspects has been the ability to query the existing data. That may not sound like much for a query language, but for a collection of XML this size, being able to run ad-hoc queries that understand the document structure is truly valuable.

For example, several months ago, I received a bug report stating “the tables are not rendering in article 12345.” I was able to look at the article, see that there were no tables, examine the source markup in the database and discover a TAB element that I’d never seen before[4]. But how widespread is this problem? Three seconds later, I have:

  distinct-values(
    for $tab in //TAB
    return base-uri($tab)
  )

Which tells me the 99 affected articles. Now I know that I only need to reload those from the source data instead of all 112,149.

Looking back over my original criticisms, how do they stand up to my experience?

  • The development environment.
    • Well, it’s slightly better than I originally thought. But not good enough when placed next to Java IDEs like Eclipse.
    • I’ve been using my TextMate XQuery Bundle with reasonable success (although it still needs a great deal of improvement). There are modes for Emacs and Vim.
    • I’ve managed to get the OxygenXML debugger talking to MarkLogic, but it was less useful than it initially appeared. The XQuery editor turned out to be worse than useless, because MarkLogic uses an outdated version of XQuery, leading to a lack of syntax colouring and a plethora of error reports.
    • MarkLogic has an add-on, CQ, which is a browser-based interactive query tool. It’s pretty useful.
    • Fundamentally, sharing a database between developers (the stored procedure model) doesn’t work well when you have multiple people updating it.
      • We solved this expediently by kicking everybody else off the project. :)
  • The verbosity.
    • Like most things, you kind of get used to it. Although I confess that when I started to understand what I was doing, I found that I could write code in a significantly less verbose way.
  • The SQL / functional nature.
    • This is another one of those things that you just get used to. And in this case, start to enjoy.
  • Not a standard.
    • Fixed! XQuery is now a standard, although MarkLogic has not caught up with it, as mentioned above.
  • XML Namespaces bite.
    • And continue to do so. Let’s blame Tim Bray. :)
    • Seriously, this continues to be a problem, even now a year later. I lost a whole morning two weeks ago due to my inability to query the correct namespace. My main advice now is to never use a default namespace—prefix everything.
  • The type system.
    • Over time, now that I have come to understand it, I can begin to use the type system to my advantage. In fact, it’s one of the things that I usually have to reinforce in developers just starting on the project.
    • Thankfully, I’ve never needed to dabble in XML Schema.
  • Implementation defined areas.
    • It’s a concern, I’ll admit. MarkLogic is profligate in this area, and to get decent performance, you need to use the extensions.
  • Smiley comments.
    • They’re just about getting to the point of being ignorable.
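
The namespace trap mentioned above is easy to reproduce outside of XQuery. The following is only a sketch, using the JDK's built-in XPath 1.0 engine rather than MarkLogic, and the namespace URI, the `a` prefix and the sample document are all invented for illustration. In XPath 1.0 an unprefixed name only matches elements in no namespace; XQuery resolves unprefixed names against its declared default element namespace instead, so the exact failure differs, but the cure (bind a prefix and use it everywhere) is the same.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Hypothetical namespace URI and document, made up for this example.
    static final String NS = "http://example.com/articles";
    static final String XML =
        "<article xmlns='" + NS + "'><p>one</p><p>two</p></article>";

    /** Counts the nodes selected by an XPath expression over the sample document. */
    public static int count(String expression, boolean bindPrefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the factory is not namespace aware by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            if (bindPrefix) {
                // Bind the prefix "a" to the document's namespace.
                xpath.setNamespaceContext(new NamespaceContext() {
                    @Override
                    public String getNamespaceURI(String prefix) {
                        return "a".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                    }
                    @Override
                    public String getPrefix(String uri) {
                        return NS.equals(uri) ? "a" : null;
                    }
                    @Override
                    public Iterator<String> getPrefixes(String uri) {
                        return NS.equals(uri)
                            ? Collections.singletonList("a").iterator()
                            : Collections.<String>emptyIterator();
                    }
                });
            }
            Double n = (Double) xpath.evaluate(
                "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//p", false));  // 0: unprefixed name misses the namespaced elements
        System.out.println(count("//a:p", true)); // 2: prefixed name finds both paragraphs
    }
}
```

The silent zero is what costs the morning: nothing fails, the query just returns nothing.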

But what new things have I learned?

  • I seriously underestimated the utility of function libraries. You can have two kinds of query in XQuery: an inline query, or a “module”. The advantage of a module is that it’s a lot easier to reach in and test an individual function by hand when needed.
  • Speaking of testing, I haven’t come up with a good solution for unit testing. This pains me greatly. I realise that it’s similar to the RDBMS unit testing problem and basically I need a known test database. MarkLogic doesn’t make automating this easy (there are no management APIs).
  • Performance is very unpredictable. Not unpredictable as in “varies a lot”, but in that it is difficult to tell the performance of a given statement by visual inspection. MarkLogic comes with profiling APIs, which help somewhat. But compared to EXPLAIN in SQL, it still feels a bit primitive.
    • For example, my XSLT experience told me to avoid things like //p to examine all paragraphs. But in XQuery, everything is indexed up the wazoo, so it’s likely to be faster than an XPath expression with an explicit path.
  • Thinking in a functional style is an art. I’ve had a few problems that cry out for an accumulator of some sort. My whiteboard and I have had some long, intimate moments.
  • Having regexes available is a godsend, after XSLT 1.0.
  • I still really need a decent XQuery pretty printer, à la perltidy.
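
On the accumulator point: the usual functional answer is to make the running value a parameter of a recursive function instead of a variable that gets updated. Here is a minimal sketch of that shape, written in Java rather than XQuery so that it can be run anywhere. The class name, method name and sample titles are all made up for illustration and are not taken from the project.

```java
import java.util.Arrays;
import java.util.List;

public class RunningTotal {
    /**
     * Sums the lengths of the strings without mutating anything:
     * the accumulator is handed on to the next call as an argument.
     */
    public static int sumLengths(List<String> items, int acc) {
        if (items.isEmpty()) {
            return acc; // nothing left to process, so the accumulator is the answer
        }
        String head = items.get(0);
        List<String> tail = items.subList(1, items.size());
        return sumLengths(tail, acc + head.length());
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(sumLengths(titles, 0)); // 5 + 4 + 5 = 14
    }
}
```

An XQuery version would have the same structure: a declared function that takes the remaining sequence and the accumulated value, and calls itself with the tail of the sequence.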

Overall, I have to ask myself: would I do it the same again? And I probably would. For this particular project, I would try to place more emphasis on the XQuery than the XSLT (the imbalance was down to our inexperience; you should always try to work as close to the data store as possible). Despite the initial steep learning curve, the XQuery itself was rarely the main problem. But that’s leading into a whole new post…

In short: if you have a bunch of XML data lying around, XQuery is an excellent way to get the most use out of it[5].

[1] count(/doc)

[2] There’s also a second site, almost identical, but on a different topic which has 46,876 documents.

[3] HTML 4.01. Sadly, XHTML and browsers still interact badly.

[4] This is a rather baroque DTD unfortunately.

[5] If you’re not up to paying for a MarkLogic licence (it’s pricey), then eXist might be worth checking out.
