Recently, in yet another fit of distraction from my existing projects, I’ve started working on jenx. It’s an XML writer for Java along similar lines to GenX. At the moment, I’m just at the stage of banning invalid characters that go through it. So it’s extremely fortuitous that I’ve just seen a link to HOWTO Avoid Being Called a Bozo When Producing XML.
I’ve mostly been basing it on the GenX source code and it’s certainly made me realise quite how complicated a job it is to produce well-formed XML reliably. In particular, namespaces add a very large amount of complexity.
This just underlines how important it is to have good libraries to produce this sort of thing.
Hmmm, looking through that HOWTO makes me realise that I need to check that XML::Genx correctly supports astral characters. I’m sure it does, but I’d better double check in a test… For that matter, does Perl support them?
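To make the "banning invalid characters" step concrete, here is a minimal sketch of the check, written in Python rather than Java or Perl purely for brevity. It is not code from jenx or GenX; the function names are hypothetical. The ranges are those of the Char production in the XML 1.0 specification, and the last range is the one that covers astral characters.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # most of the BMP, below the surrogates
        or 0xE000 <= cp <= 0xFFFD       # rest of the BMP, minus U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF    # astral (supplementary) planes
    )


def check_text(text: str) -> None:
    """Raise ValueError on the first character that XML 1.0 forbids.

    Hypothetical helper: a writer would call something like this on every
    piece of text or attribute value before emitting it.
    """
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            raise ValueError(
                f"invalid XML character U+{ord(ch):04X} at index {index}"
            )


if __name__ == "__main__":
    check_text("caf\u00e9 \U0001F600")   # accented letter and an astral character
    try:
        check_text("bad\x00byte")
    except ValueError as err:
        print(err)
```

Python strings are sequences of code points, so an astral character is a single item here. In Java, where strings are UTF-16, the same check would have to walk code points rather than chars so that surrogate pairs are handled as one character.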
<foo> <bar:bar xmlns:bar="http://example.com/" /> </foo>
Then, I call removeChild() on the <bar:bar> element, immediately followed by $node->toString(). And this is what I get back:
Which is not well-formed XML because the namespace declaration is missing. Somewhere along the line, the declaration is being lost. I suspect that it should be copied into the $node when it’s removed from its parent document, but that’s not happening. As to whether this is a bug in XML::LibXML or libxml2, I don’t know. I’ll file a report and we’ll see what happens.
Update: how odd. This minimal test case works as expected. I’ll have to examine the original situation more closely to work out what I’m not reproducing…
Update#2: I got the example wrong. Actually, the source xml looked like this:
<foo xmlns:bar="http://example.com/"> <bar:bar /> </foo>
This fails as expected.
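For anyone who wants to poke at the same class of problem without Perl to hand, here is a rough analogue using Python's standard xml.dom.minidom. This is not XML::LibXML or libxml2, so it says nothing about where the bug above lives. It only shows the same failure mode in another DOM: the prefix is declared on the parent, and serialising the detached child on its own does not carry the declaration along. The is_well_formed helper is a hypothetical name of mine.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Same shape as the failing case: the prefix is declared on the parent.
SOURCE = '<foo xmlns:bar="http://example.com/"> <bar:bar /> </foo>'


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if minidom's namespace-aware parser accepts the text."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


doc = minidom.parseString(SOURCE)
root = doc.documentElement
node = root.getElementsByTagName("bar:bar")[0]

# Detach the child, then serialise it on its own.
root.removeChild(node)
fragment = node.toxml()

if __name__ == "__main__":
    print(fragment)                  # the detached element as serialised
    print(node.namespaceURI)         # the node itself still knows its namespace
    print(is_well_formed(fragment))  # can the fragment be parsed back?
```

In minidom the node object keeps its namespace URI, but the serialiser writes only the element's own attributes, so the fragment cannot be parsed back by a namespace-aware parser.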
I’ve just finished reading Practical RDF for work. My conclusions:
- RDF/XML makes my brain hurt.
- RDF still seems a way off being “practical.” I’m with Tim Bray on this one.
- Some of the frameworks for RDF still look pretty cool. I had a quick play with the Python rdflib and was quite surprised at how easy it was. Jena and Redland also look good.
- Ontologies seem like a good idea, in the same way that DTDs (well, RelaxNG+schematron for preference) are a good idea for plain XML. But they’re still way over my head. I’ll have to go back and reread that bit of the book when I’ve got a less fuzzy head.
- Mozilla’s use of RDF to build treeviews is pretty cool. I must play with that.
On the whole, I’ve learned a bit about how it works, and I can definitely see how it would be useful for some of the things we do at work with a relational database. But it’s a large rearchitecting to do so. Also, the book has let me know how much more I have to explore once I’ve put a bit of groundwork into understanding the basics.
So, I started hacking on it. Initially, it was to add support for XML::SAX, as a modern XML parsing system, instead of the old PerlSAX (which didn’t really understand namespaces). But then I added support for using XML::LibXML as well. And then I realised that the examples I was using had abstract tests, which weren’t supported, so I added that. And the name element didn’t work, so that got fixed. Just now, I’ve added support for namespace declarations in the ns element. It’s a lot more useful now than it was.
Now I just need to get these changes back to the maintainer, Kip Hampton. Alas, he seems to be away from his email recently. But all the changes are in my repository, so it should be easy to pass them on later as they’re unlikely to get lost. I hope these changes actually make it back on to CPAN; they’re useful and should make schematron more readily available in Perl, which has got to be a good thing.
I’m looking heavily at schematron at the moment for work. It’s a fascinating tool, in contrast to the usual style of XML validation: DTD, RelaxNG and XML Schema.
Generally, from what I’ve seen of XML Schema, it is to be avoided. But then I saw these messages from Rick Jelliffe indicating how even the vendors are getting upset with XML Schema. I’m glad my instincts to avoid it after first looking are paying off…