Categories
Uncategorized

Code Examples Should Work

I’m looking at Wicket this evening. There’s a Wicket tutorial on theserverside.com. However, the first code example fails!

Initially I tried to get things working by downloading Wicket. However, the article recommends Maven (as does the quickstart), so I gritted my teeth and went ahead.

Once again, I get 50 lines of gibberish, including an error. But everything seems to work. Clench teeth some more, and move on.

The first code example is the definition of the WicketApplication class, which Maven has helpfully created for us in outline. The first method to add is:

  private ApplicationContext getContext() {
    return WebApplicationContextUtils
      .getRequiredWebApplicationContext(getServletContext());
  }

As usual, I get compile errors. This is normal: just hit quick-fix in Eclipse to import the ApplicationContext class. Except that there isn’t one.

Strike one.

I take a “lucky guess” and add a dependency on spring-web to my pom.xml. That makes both ApplicationContext and WebApplicationContextUtils appear on my classpath. Yay.

The next problematic method is this one:

  public static WicketApplication get() {
      return (WicketApplication) WebApplication.get();
  }

According to the tutorial:

The third overrides the Application#get() method to allow for covariant return types, in this case the WicketApplication class. This example uses the WicketApplication as a form of Service Locator.

Great! Except for the minor problem that it doesn’t compile:

  The return type is incompatible with Application.get()

Strike two.

For now, I’m going to comment it out and carry on playing until I get bitten by that. But it’s not what I’d call a great start to a tutorial when your first code example is this buggy.

Update: I spoke to the author of the article, Nick Heudecker, and he pointed out:

  • I should have downloaded the example code, that works.
    • Well, great, but it would have been nice to have that mentioned.
  • Covariant return types are supported by Java 5.
    • Which I certainly thought I was using. I don’t know what went wrong here. But I didn’t know that Java 5 could do this, so it’s good to know.
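
For reference, here is a minimal sketch of covariant return types, using made-up BaseApp and MyApp classes rather than the real Wicket ones. Strictly speaking, a static get() hides the parent's method rather than overriding it, but the same return-type rule applies to both cases, and the narrower return type is only accepted from Java 5 onwards. One plausible cause of an error like the one above (an assumption, not something confirmed here) is a compiler compliance level set to 1.4 or lower.

```java
public class CovariantReturnDemo {

    /** Hypothetical stand-in for a framework base class such as Application. */
    static class BaseApp {
        private static BaseApp current = new BaseApp();

        static BaseApp get() { return current; }
        static void set(BaseApp app) { current = app; }

        BaseApp self() { return this; }
    }

    /** Hypothetical application subclass, used as a simple service locator. */
    static class MyApp extends BaseApp {
        // Hides BaseApp.get() with a narrower return type (legal from Java 5).
        static MyApp get() { return (MyApp) BaseApp.get(); }

        // Overrides self() with a narrower return type (also Java 5+).
        @Override
        MyApp self() { return this; }

        String serviceName() { return "my-service"; }
    }

    public static void main(String[] args) {
        BaseApp.set(new MyApp());
        // No cast is needed at the call site: get() already returns MyApp.
        System.out.println(MyApp.get().serviceName());
        System.out.println(MyApp.get().self().serviceName());
    }
}
```

The cast still exists, but it lives in one place inside get() instead of at every call site.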

Trusting your tools

After Grzegorz’s piping up, I’m giving cocoon 2.2 another try. Here are some selected errors.

  javax.servlet.ServletException: No block for /favicon.ico
          at org.apache.cocoon.servletservice.DispatcherServlet.service(DispatcherServlet.java:84)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServlet.service(ReloadingServlet.java:89)
          at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1093)
          at org.apache.cocoon.servlet.multipart.MultipartFilter.doFilter(MultipartFilter.java:119)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.apache.cocoon.servlet.DebugFilter.doFilter(DebugFilter.java:169)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingSpringFilter.doFilter(ReloadingSpringFilter.java:69)
          at org.apache.cocoon.tools.rcl.wrapper.servlet.ReloadingServletFilter.doFilter(ReloadingServletFilter.java:50)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
          at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
          at org.mortbay.jetty.Server.handle(Server.java:313)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:506)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:830)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:396)
          at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

How fabulous! 30 lines to tell me about a 404 I couldn’t care less about! (this is from mvn jetty:run). And in the process, obliterating any messages I did care about.

  [ERROR] VM #displayTree: error : too few arguments to macro. Wanted 2 got 0
  [ERROR] VM #menuItem: error : too few arguments to macro. Wanted 1 got 0
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESSFUL
  [INFO] ------------------------------------------------------------------------

There’s an error, but the build was successful. That makes sense. Not. (This one is from mvn site.)

  Caused by: org.codehaus.plexus.util.xml.pull.XmlPullParserException: TEXT must be immediately followed by END_TAG and not START_TAG (position: START_TAG seen ...<reports>n            <report>... @118:21)
          at org.codehaus.plexus.util.xml.pull.MXParser.nextText(MXParser.java:1063)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseReportPlugin(MavenXpp3Reader.java:3572)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseReporting(MavenXpp3Reader.java:3709)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.parseModel(MavenXpp3Reader.java:2347)
          at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read(MavenXpp3Reader.java:4422)
          at org.apache.maven.project.DefaultMavenProjectBuilder.readModel(DefaultMavenProjectBuilder.java:1412)
          ... 17 more
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 2 seconds
  [INFO] Finished at: Sun Jan 27 22:25:35 GMT 2008
  [INFO] Final Memory: 1M/2M
  [INFO] ------------------------------------------------------------------------

XML parser exception (and in only 40+ lines!). Complaining about unbalanced tags? Must be non-well-formed XML, right? Wrong. This is down to copying an example from “Better Builds with Maven”. But the example’s wrong: I’m missing a tag. Can you guess what’s missing from this XML?

  <reportSets>
    <reports>
      <report>dependencies</report>
    </reports>
  </reportSets>

You mean you didn’t spot the missing reportSet element? I’m shocked, I tell you, shocked. Plus the lack of indication that an error actually occurred. The stacktrace is a good indication, but an actual “ERROR” or “BUILD FAILED” message would be nice (there is an error line, but it zoomed past three screens ago. I blinked and missed it).

So that’s two strikes to maven and one to cocoon. My trust in them is basically non-existent at this point. But at least the RCL worked as documented.


Argumentative

I spent a little while looking at mootools yesterday for a colleague. It’s yet another JavaScript library. My colleague was wondering how to restrict the Accordion effect so it applied once to each area of content on the page, rather than once for the whole page (each content area has multiple bits of content where only one should be visible).

As usual, I just headed straight for the source code (Accordion.js). Inside, I found the best abuse of JavaScript I’ve seen in a while.

  initialize: function(){
    var options, togglers, elements, container;
    $each(arguments, function(argument, i){
            switch($type(argument)){
                    case 'object': options = argument; break;
                    case 'element': container = $(argument); break;
                    default:
                            var temp = $$(argument);
                            if (!togglers) togglers = temp;
                            else elements = temp;
            }
    });
    // …
  }

Now what exactly does this do? It’s pretty different from the usual JavaScript function usage:

  var initialize = function (options, togglers, elements, container) { … }

The first thing of interest is the first parameter to $each: arguments. A little-known feature of JavaScript is that every function has an array-like object of all its arguments available. You can also get a reference to the function that you’re in (arguments.callee), which is sometimes useful.

Here, it’s being used to accept an arbitrary number of arguments, in any order. To be quite frank, it’s a bit of a mess. This is my understanding of the above code:

  • You can pass in up to four arguments.
    • You can pass in more, but we only get 4 parameters out of it.
  • The last JavaScript object will get treated as a set of options.
  • The last element passed in will be used as “container”.
  • The first non object|element argument will be passed to $$ and treated as the set of “togglers”.
  • Every later non object|element argument will be passed through $$ too; the last one wins and is treated as the elements to be shown/hidden.

Is that right? I’ve been looking at it for a bit and I’m still not 100% sure.
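
One way to check that reading is to replay the loop outside the browser. Below is a rough Java translation, on the assumption that a Map stands in for a plain JavaScript object and a made-up Element class stands in for a DOM element. It skips the $ and $$ lookups entirely and only records which argument ends up in which slot, so it is a sketch of the dispatch logic, not of mootools itself.

```java
import java.util.HashMap;
import java.util.Map;

public class AccordionArgs {

    /** Hypothetical stand-in for a DOM element. */
    static final class Element {
        final String id;
        Element(String id) { this.id = id; }
    }

    Map<?, ?> options;   // last Map passed in
    Element container;   // last Element passed in
    Object togglers;     // first argument that is neither Map nor Element
    Object elements;     // last remaining argument after togglers is set

    /** Mirrors the switch inside initialize(): order matters, count does not. */
    static AccordionArgs parse(Object... arguments) {
        AccordionArgs result = new AccordionArgs();
        for (Object argument : arguments) {
            if (argument instanceof Map) {
                result.options = (Map<?, ?>) argument;
            } else if (argument instanceof Element) {
                result.container = (Element) argument;
            } else if (result.togglers == null) {
                result.togglers = argument;
            } else {
                result.elements = argument;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AccordionArgs parsed = parse(".toggler", new Element("box"), ".content",
                new HashMap<String, Object>(), ".other");
        // ".content" is silently overwritten by ".other".
        System.out.println(parsed.togglers + " / " + parsed.elements + " / " + parsed.container.id);
    }
}
```

With five arguments, the extra selector silently replaces the earlier one, which matches the "last one wins" reading in the list above.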

To me, this is a fine example of taking the flexibility of JavaScript just a little too far.


A Year in Cocoon

The other large part of the project at $WORK I’ve just finished was Cocoon. Cocoon is a Java web framework. It’s got some really neat ideas in it, and its main purpose in life is transforming XML. It is (or should be) a perfect match for XML databases.

I described Cocoon about a year ago, towards the start of this project. But how do I feel about it now? Looking back, was it the right choice?

To start with, I’m still very impressed with the core Cocoon technologies. The pipelines are perfect for dealing with XML. FlowScript still impresses the heck out of me.

But there’s a lot that leaves a sour taste. My original first complaint was about the size. Well, 50MB isn’t so big. But the fact of the matter is that there’s an awful lot of stuff in there, quite a bit of which you don’t want to touch with a bargepole. We wasted a lot of time looking at things like XSP, Actions and implementing our own custom Generators. I wish I’d been made more aware of what FlowScript was up-front, and what it could do for me. I wish I’d realised that it’s basically the “C” in MVC.

Which dovetails straight into another complaint: documentation. There is quite a bit of documentation for Cocoon. But it’s still inadequate given the gargantuan size of the project. And the coverage is extremely spotty. Normally, I’d jump straight to the published literature, but the most recent Cocoon book I could find was hideously out of date. In fact, that’s what caused me to go down several of the rabbit-holes mentioned above.

When I’ve really needed to figure out what’s going on, I’ve invariably had to turn to the Cocoon source code, which its dependence on the weird-yet-not-wonderful Avalon framework made less than simple to understand.

My complaint about debugging still holds, although less severely. You get used to the seemingly intractable error messages. You spot the patterns that are causing trouble. Like most things, logging goes a long way.

And then there’s Cocoon 2.2. My site was developed entirely on Cocoon 2.1. This certainly had its flaws: figuring out how to deploy it as a war file sensibly was a pain. But Cocoon 2.2 has Maven.

I’ve pointed out my dislike for Maven before. As have other people. Recently, other people in my office have been using it and I’ve witnessed the project overruns thanks to trying to figure out what Maven is doing. Nice idea, bad implementation.

Cocoon 2.2 uses Maven because it’s “modularized”. What this means is you can’t have a single project with everything in it any more. You have a “webapp” project and a “block” project. And when something changes you have to build the block, install it, build the webapp, mount the installed block in the webapp and fire up jetty. It doesn’t make for a good development environment.

Now I could be completely wrong and missing the obvious way to do seamless Cocoon 2.2 web development from within Eclipse. I’d love to be corrected. But for now, Cocoon 2.2 has shot itself in the foot as far as I’m concerned.

So I’m not happy with the future direction of Cocoon. I need to look again at why I chose it in the first place. Initially, it was because the other web frameworks for Java (Struts, Tapestry, Wicket) all seemed totally focussed on form-based CRUD-style web apps. Cocoon focussed on documents and URLs instead. So it’s time to start working my way through the tutorials of the many Java web frameworks in order to find a more suitable one. I may not find a good replacement for Cocoon. But I certainly need to try.

Update: In the comments, Grzegorz Kossakowski points out a screencast about the RCL which slightly lessens the pain of interactive development with Cocoon 2.2.


A Year in XQuery

About a year ago, I wrote The year of XQuery?. I’ve just finished my involvement with the large project at $WORK that was using XQuery. So it’s time to reflect on it a little.

First, a bit of background. The site I was developing is essentially a more-or-less static view of 112,149[1] XML documents in 4 collections[2]. The main interface is a search page, plus there’s a browse-by-title if you really fancy getting lost. It sounds simple, but as with any large collection of data, there is lots of variation in the data, leading to unexpected complications. Plus there are several different views of each document, just to make life more fun.

The site is based on Cocoon talking to an XML database (MarkLogic Server in this instance). We get XML back from the database, and render it through an XSLT pipeline into HTML[3]. Plus there’s some nice jQuery on top to smooth the ride.

Looking inside the codebase, we’re presently running at 5200 lines of XQuery code across 39 files (admittedly, that includes a 1000 line “lookup table”). But that doesn’t include any of the search code, which is dynamically generated using Java.

But in many senses, that’s not the remarkable thing about the XQuery. One of the most important aspects has been the ability to query the existing data. This may not sound particularly remarkable for a query language. But for a collection of XML this size, the ability to do ad-hoc queries that understand the document structure is truly remarkable.

For example, several months ago, I received a bug report stating “the tables are not rendering in article 12345.” I was able to look at the article, see that there were no tables, examine the source markup in the database and discover a TAB element that I’d never seen before[4]. But how widespread is this problem? Three seconds later, I have:

  distinct-values(
    for $tab in //TAB
    return base-uri($tab)
  )

Which tells me the 99 affected articles. Now I know that I only need to reload those from the source data instead of all 112,149.
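
For comparison, here is roughly what that query computes, sketched in plain Java against a handful of in-memory documents. The class name, the map of URI to XML text and the sample data are all made up for illustration. The real query runs inside the database against its indexes, which this sketch makes no attempt to imitate: it simply parses every document and looks.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TabFinder {

    /** Returns the URIs of all documents containing at least one TAB element. */
    static Set<String> urisContainingTab(Map<String, String> xmlByUri) {
        Set<String> affected = new LinkedHashSet<String>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> entry : xmlByUri.entrySet()) {
                Document doc = builder.parse(new InputSource(new StringReader(entry.getValue())));
                NodeList tabs = (NodeList) xpath.evaluate("//TAB", doc, XPathConstants.NODESET);
                // A Set gives the same effect as distinct-values(): one entry per document.
                if (tabs.getLength() > 0) {
                    affected.add(entry.getKey());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query a document", e);
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<String, String>();
        docs.put("/articles/1.xml", "<doc><TAB/><p>one</p><TAB/></doc>");
        docs.put("/articles/2.xml", "<doc><p>two</p></doc>");
        docs.put("/articles/3.xml", "<doc><sec><TAB/></sec></doc>");
        System.out.println(urisContainingTab(docs));
    }
}
```

That is a lot of ceremony for what XQuery says in four lines, which is rather the point.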

Looking back over my original criticisms, how do they stand up to my experience?

  • The development environment.
    • Well, it’s slightly better than I originally thought. But not good enough when placed next to Java IDEs like Eclipse.
    • I’ve been using my TextMate XQuery Bundle with reasonable success (although it still needs a great deal of improvement). There are modes for Emacs and Vim.
    • I’ve managed to get the OxygenXML debugger talking to MarkLogic, but it was less useful than it initially appeared. The XQuery editor turned out to be worse than useless, because MarkLogic uses an outdated version of XQuery, leading to a lack of syntax colouring and a plethora of error reports.
    • MarkLogic has an add-on, CQ, which is a browser-based interactive query tool. It’s pretty useful.
    • Fundamentally, sharing a database between developers (the stored procedure model) doesn’t work well when you have multiple people updating it.
      • We solved this expediently by kicking everybody else off the project. 🙂
  • The verbosity.
    • Like most things, you kind of get used to it. Although I confess that when I started to understand what I was doing, I found that I could write code in a significantly less verbose way.
  • The SQL / functional nature.
    • This is another one of those things that you just get used to. And in this case, start to enjoy.
  • Not a standard.
    • Fixed!—although not MarkLogic, as mentioned above.
  • XML Namespaces bite.
    • And continue to do so. Let’s blame Tim Bray. 🙂
    • Seriously, this continues to be a problem, even now a year later. I lost a whole morning two weeks ago due to my inability to query the correct namespace. My main advice now is to never use a default namespace—prefix everything.
  • The type system.
    • Over time, now that I have come to understand it, I can begin to use the type system to my advantage. In fact, it’s one of the things that I usually have to reinforce in developers just starting in the project.
    • Thankfully, I’ve never needed to dabble in XML schema.
  • Implementation defined areas.
    • It’s a concern, I’ll admit. MarkLogic is profligate in this area, and to get decent performance, you need to use the extensions.
  • Smiley comments
    • They’re just about getting to the point of being ignorable.

But what new things have I learned?

  • I seriously underestimated the utility of function libraries. You can have two kinds of query in XQuery: an inline query, or a “module”. The advantage of a module is that it’s a lot easier to reach in and test an individual function by hand when needed.
  • Speaking of testing, I haven’t come up with a good solution for unit testing. This pains me greatly. I realise that it’s similar to the RDBMS unit testing problem and basically I need a known test database. MarkLogic doesn’t make automating this easy (there are no management APIs).
  • Performance is very unpredictable. Not unpredictable as in “varies a lot”, but it is difficult to tell the performance of a given statement by visual inspection. MarkLogic comes with profiling APIs, which help somewhat. But compared to EXPLAIN in SQL, it still feels a bit primitive.
    • For example, my XSLT experience told me to avoid things like //p to examine all paragraphs. But in XQuery, everything is indexed up the wazoo, so it’s likely to be faster than an XPath statement with an explicit path.
  • Thinking in a functional style is an art. I’ve had a few problems that cry out for an accumulator of some sort. My whiteboard and I have had some long, intimate moments.
  • Having regexes available is a godsend, after XSLT 1.0.
  • I still really need a decent XQuery pretty printer, à la perltidy.
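
The accumulator point may be easier to see in code. The sketch below is Java rather than XQuery, and the running-totals problem is a made-up example, not one from the project. The state that a loop would keep in a mutable variable is instead passed along as extra parameters on each recursive call, which is the usual way around the lack of assignment in a functional language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RunningTotals {

    /** Returns the running totals of the input, e.g. the sum so far at each position. */
    static List<Integer> runningTotals(List<Integer> values) {
        return step(values, 0, 0, Collections.<Integer>emptyList());
    }

    // The "accumulators" are sumSoFar and out: nothing is ever modified in place,
    // each call builds new values and hands them to the next call.
    private static List<Integer> step(List<Integer> values, int index,
                                      int sumSoFar, List<Integer> out) {
        if (index == values.size()) {
            return out;
        }
        int sum = sumSoFar + values.get(index);
        List<Integer> next = new ArrayList<Integer>(out);
        next.add(sum);
        return step(values, index + 1, sum, Collections.unmodifiableList(next));
    }

    public static void main(String[] args) {
        System.out.println(runningTotals(Arrays.asList(3, 4, 5)));
    }
}
```

Copying the list on every step is wasteful in Java; it is done here only to keep the sketch free of mutation, the way an XQuery function would have to be.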

Overall, I have to ask myself: would I do it the same again? And I probably would. For this particular project, I would try to place more emphasis on the XQuery than the XSLT (this was down to our inexperience—you should always try to work as close to the data store as possible). Despite the initial strong learning curve, the XQuery itself was rarely the main problem. But that’s leading into a whole new post…

In short: if you have a bunch of XML data lying around, XQuery is an excellent way to get the most use out of it[5].

[1] count(/doc)

[2] There’s also a second site, almost identical, but on a different topic which has 46,876 documents.

[3] HTML 4.01. Sadly, XHTML and browsers still interact badly.

[4] This is a rather baroque DTD unfortunately.

[5] If you’re not up to paying for a MarkLogic licence (it’s pricey), then eXist might be worth checking out.


Google Collections

I’ve been rather stuck in the Java world recently. One of the things that makes this bearable is developing in Java 1.5. For all the flak that generics take, they are far superior to littering your code with casts, every single one an admission that the type system is utterly broken. The most common usage of generics tends to be part of the collections API.

But the collections only go so far. In particular, I noticed my code filling up with a pattern:

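  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  class Thingummybob {
    Map<String, List<String>> params = new HashMap<String, List<String>>();

    void addParam(String name, String value) {
      List<String> values = params.get(name);
      // First value for this name: create the list and register it.
      if (values == null) {
        values = new ArrayList<String>();
        params.put(name, values);
      }
      values.add(value);
    }
  }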

In short, ridiculously complicated[1] for something that should be simple. Look at that code, then think about what fetching or removing a value would involve.
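
To make that concrete, here is a sketch of what the fetch and remove sides might look like with nothing but the standard collections. The ParamStore class and its method names are invented for this example. Every method has to repeat the same null check, and remove has the extra job of tidying up empty lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParamStore {
    private final Map<String, List<String>> params = new HashMap<String, List<String>>();

    void addParam(String name, String value) {
        List<String> values = params.get(name);
        if (values == null) {
            values = new ArrayList<String>();
            params.put(name, values);
        }
        values.add(value);
    }

    /** Returns a read-only view of the values for a name; never null. */
    List<String> getParams(String name) {
        List<String> values = params.get(name);
        if (values == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    /** Removes one occurrence of the value; returns true if something was removed. */
    boolean removeParam(String name, String value) {
        List<String> values = params.get(name);
        if (values == null) {
            return false;
        }
        boolean removed = values.remove(value);
        // Don't leave an empty list behind, or the name still looks "present".
        if (values.isEmpty()) {
            params.remove(name);
        }
        return removed;
    }

    public static void main(String[] args) {
        ParamStore store = new ParamStore();
        store.addParam("colour", "red");
        store.addParam("colour", "blue");
        store.removeParam("colour", "red");
        System.out.println(store.getParams("colour"));
    }
}
```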

Then I found the Google Collections framework. It’s an extension of the collections framework and includes the marvellous Multimap. It’s like an ordinary Map, but it takes multiple values automatically. The above code now looks like this:

  class Thingummybob {
    Multimap<String, String> params = Multimaps.newHashMultimap();

    void addParam(String name, String value) {
      params.put(name, value);
    }
  }

Much, much nicer. You’ll also notice that factory function newHashMultimap(). It cunningly avoids having to specify the types twice thanks to the compiler inferring what they should be.

Once I started using google collections, I noticed a few other niceties contained within.

  • There’s a Join class for joining Strings together, just like Perl’s join. This is a serious lack in the standard Java library, and something I keep reinventing (usually badly).
  • The Preconditions class provides a nice standard way of doing assert-like behaviour.
  • See Coding in the small with Google Collections for more examples of little things that can make your life simpler.
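
For a sense of what the first two of those save, here is the sort of hand-rolled version that tends to get reinvented. It uses only the standard library. The class and method names are made up, and it does not reproduce the actual Join or Preconditions API, only the general idea behind them.

```java
import java.util.Arrays;
import java.util.Iterator;

public class HandRolledHelpers {

    /** Hypothetical precondition check: fail fast with a readable message. */
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Hypothetical join: glue the parts together with the separator in between. */
    static String join(String separator, Iterable<?> parts) {
        checkArgument(separator != null, "separator must not be null");
        checkArgument(parts != null, "parts must not be null");
        StringBuilder out = new StringBuilder();
        Iterator<?> it = parts.iterator();
        while (it.hasNext()) {
            out.append(it.next());
            // Only add the separator between items, never after the last one.
            if (it.hasNext()) {
                out.append(separator);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(", ", Arrays.asList("a", "b", "c")));
    }
}
```

It is only twenty-odd lines, but it is twenty-odd lines that end up copied into every project.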

Overall, I’d rate the google collections API as “well worth a look.” It has the potential to make your code simpler, and that can only be a good thing.

[1] Oh, what I’d give for autovivification!