Aristotle Pagaltzis has mentioned XML::Genx. Yay, it needs all the publicity that it can get. He highlighted some issues in XML::Genx::SAXWriter, which I need to address, although it would be good to get some consensus from the list first.

He also mentions a problem with default namespaces a little further on. I’ll have to look into that one. I’m not sure if it’s a bug or Genx is supposed to work that way.

But it has also made me realise that I need to improve the documentation that comes with XML::Genx. The API usage is not always as clear as it could be, particularly when it comes to namespaces and optional function arguments. So I think a really good EXAMPLES section would help.

Not only that, it’s also made me realise that there’s some bad behaviour in there.

  1. Passing a namespace object to StartElementLiteral() doesn’t work properly. It assumes that the stringification of the namespace object is the namespace URI.
  2. The way you use StartElementLiteral() to declare a default namespace sucks, badly. You have to declare the namespace object with the default prefix (i.e. the empty string, ""). Then you have to call StartElementLiteral(), passing in the URI yourself. And then you have to call AddNamespace() on the namespace object to switch from the Genx-generated prefix back to the default. Very weird, but fixing the bug above would make this much simpler.

It’s really important to get this sort of thing fixed up so that the API is easy to use and works as expected.

Finally, I’ve also noticed that the last release had an unexpected failure (the Win32 failures are, at least, expected). What’s irritating is that it has nothing to do with my code. For some reason, version.xs is being picked up and mingled into my module. I blame the person who submitted the CPAN testers report (for now, anyway). I’m sure I saw something about this in a journal recently…


Dave Fulton

We went to see Dave Fulton as part of the Brighton Comedy Festival. His show is “We’re all Americans” and it rings true. Whilst he starts off slating our cousins across the pond, he quickly turns it around and starts biting into British people with equal ferocity. And it’s all deliciously non-P.C.

I particularly liked his line on the British capacity for booze:

The English are the only people I know who drink on the train on the way to go drinking!

Definitely worth catching if he’s playing near you.


Google Reader

Like many other people in the blogosphere, I’ve been playing with Google Reader, and I’ve found it to be a pretty good aggregator. But it’s lacking one big, important feature: the “catchup” button. I want to be able to mark all my feeds as read with a single keystroke.

What would be really nice is the ability to mark all articles from a selected feed as read. But I think that’s difficult in the current interface, because it seems to melt all articles into the same pot.

Before I sound too critical, I have to point out that what Google Reader does, it does very well. I found it quick, responsive and easy on the eyes. Good work, chaps!

Oh, and there’s one really, really important innovation: It brings vi-style keystrokes to the people. Yay Google!


Conference in Town

Clear:Left are putting on a Web 2.0 conference in town, d.Construct. I think it would be remiss of me not to go. 🙂

Plus it’s got Simon Willison and Cory Doctorow speaking. It should be good.


Subversion diff

Normally, Subversion uses a built-in diff to show you your changes. This works pretty well most of the time, except that you can’t use it to ignore whitespace-only changes. How to do this isn’t spelled out 100% clearly in the FAQ, so here’s what I found out:

  1. Make a shell script ~/bin/svn_diff. It should look like this:
    #!/bin/sh
    exec diff -bu "$@"
  2. chmod +x ~/bin/svn_diff
  3. Put these lines in ~/.subversion/config (diff-cmd goes in the [helpers] section):
    [helpers]
    diff-cmd = /home/dom/bin/svn_diff

Naturally, you can adjust the diff flags in the svn_diff script to taste.


Java Unicode Characters

Working on jenx, I’ve started looking at characters. In particular, astral characters. My first question was “how do I create one in a string literal?” Well, I still don’t know. But my research has shown that doing anything outside the Basic Multilingual Plane (BMP) requires JDK 5. Drat. That rather limits the usefulness of this library. But I really need the stuff in JSR 204.
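For what it’s worth, one way to get an astral character into a Java string literal is to write its UTF-16 surrogate pair as two escapes. With JDK 5 you can instead build the string from the code point using Character.toChars(). This is a small sketch that uses U+1D11E (MUSICAL SYMBOL G CLEF) as an arbitrary example; the class and constant names are made up for illustration.

```java
public class AstralLiteral {
    // U+1D11E MUSICAL SYMBOL G CLEF, written directly as its UTF-16
    // surrogate pair (high surrogate D834, low surrogate DD1E).
    static final String G_CLEF_ESCAPED = "\uD834\uDD1E";

    // The same character built from its code point (Character.toChars is JDK 5+).
    static final String G_CLEF_BUILT = new String(Character.toChars(0x1D11E));

    public static void main(String[] args) {
        System.out.println(G_CLEF_ESCAPED.equals(G_CLEF_BUILT)); // true
        // length() counts chars (UTF-16 code units), not characters.
        System.out.println(G_CLEF_ESCAPED.length()); // 2
        System.out.println(G_CLEF_ESCAPED.codePointCount(0, G_CLEF_ESCAPED.length())); // 1
    }
}
```

The escaped form compiles on older JDKs too, but anything that treats the pair as a single character needs the JDK 5 code point methods.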

Which is a story in itself. It’s a good thing that Java can handle the full Unicode range. But the support is (to be quite frank) a bit crap, mostly because a char is a UTF-16 code unit, not a “Unicode character.” I personally don’t find it helpful that they’ve propagated the C problem of confusing char and int, and generally allowing the two to roam freely amongst each other. Plus, JSR 204 looks like it was extremely careful to avoid breaking backwards compatibility, which is always a noble goal, but in this case makes the end result incredibly difficult to use. I shouldn’t have to test each char to see whether or not it’s a surrogate. Really. This is an OO language; I should be able to get the next Character object from the String. Shocking, I know.

It strikes me that Python is pretty much the only language I know that got Unicode right by making an entirely separate object, the “Unicode String.”

Update: Oh alright. In my whinging, I managed to miss String.codePointAt, which does what I need.
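Here is a minimal sketch of walking a String by code point rather than by char, using String.codePointAt() and Character.charCount() (both JDK 5). The toCodePoints() helper and the class name are hypothetical, not part of jenx or the JDK.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {
    // Hypothetical helper: collect the code points of s, one per Unicode
    // character, instead of one per char (UTF-16 code unit).
    static List<Integer> toCodePoints(String s) {
        List<Integer> result = new ArrayList<Integer>();
        int i = 0;
        while (i < s.length()) {
            // codePointAt() combines a surrogate pair into a single int.
            int cp = s.codePointAt(i);
            result.add(cp);
            // Advance by 2 chars for an astral character, 1 otherwise.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", then U+1D11E (stored as a surrogate pair), then "b".
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length()); // 4
        for (int cp : toCodePoints(s)) {
            System.out.printf("U+%04X%n", cp); // U+0061, U+1D11E, U+0062
        }
    }
}
```

It still isn’t a Character object per step, but the surrogate test disappears into codePointAt().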



Recently, in yet another fit of distraction from my existing projects, I’ve started working on jenx. It’s an XML writer for Java along similar lines to Genx. At the moment, I’m just at the stage of banning invalid characters that go through it. So it’s extremely fortuitous that I’ve just seen a link to HOWTO Avoid Being Called a Bozo When Producing XML.
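Banning invalid characters largely comes down to the XML 1.0 Char production. This is a sketch, assuming a hypothetical checkText() hook that a writer calls on every piece of text (it is not jenx’s actual code). It iterates by code point, so a valid surrogate pair is checked as one astral character while an unpaired surrogate is rejected.

```java
public class XmlChars {
    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical writer-side check: throw if text contains anything that
    // cannot appear in an XML 1.0 document. For a lone surrogate,
    // codePointAt() returns the surrogate itself, which isXmlChar() rejects.
    static void checkText(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException(
                    String.format("invalid XML character U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp);
        }
    }

    public static void main(String[] args) {
        checkText("ok\tfine");                                // tab is allowed
        checkText(new String(Character.toChars(0x1D11E)));    // astral is allowed
        try {
            checkText("bell" + (char) 7);                     // control char is not
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid XML character U+0007 at index 4
        }
    }
}
```

Throwing, rather than quietly dropping the character, keeps the caller aware that its data can’t be serialised as XML 1.0.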

I’ve mostly been basing it on the Genx source code, and it’s certainly made me realise quite how complicated a job it is to produce well-formed XML reliably. In particular, namespaces add a very large amount of complexity.

This just underlines how important it is to have good libraries to produce this sort of thing.

Hmmm, looking through that HOWTO makes me realise that I need to check that XML::Genx correctly supports astral characters. I’m sure it does, but I’d better double-check in a test… For that matter, does Perl support them?