I’ve just spent most of the afternoon on a character-building exercise. I have some XML like this:
<symbol unicode="2013"/>
And I need to turn that into the numeric character reference &#x2013;. It’s perfectly possible to do so with a bit of fudging around with <xsl:text disable-output-escaping="yes"/>. But there’s a slight caveat: you’re not creating a numeric character reference. You’re just creating something that looks like one. Really, it’s the characters “&”, “#”, “x”, “2”, “0”, “1”, “3” and “;”.
Now most of the time, this doesn’t matter. You just output XML that looks correct and the next parser along (probably a browser) will interpret it correctly. But it’s sleight of hand.
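For reference, here is a minimal sketch of the kind of fudging I mean. The stylesheet layout (an identity template plus one template for symbol) is just an assumption for illustration, not the real project’s stylesheet, and disable-output-escaping is an optional feature that a processor is allowed to ignore.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write the characters "&#x", then the hex digits, then ";".
       No real character is created here: the serializer is simply
       asked not to escape the ampersand on the way out. -->
  <xsl:template match="symbol">
    <xsl:text disable-output-escaping="yes">&amp;#x</xsl:text>
    <xsl:value-of select="@unicode"/>
    <xsl:text>;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Inside the transform, the result is still eight separate characters. It only becomes a single en dash when some later parser reads the serialized output.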
Today, I needed to copy the text contents of a node into an attribute. Unfortunately, that text content contained one of these symbol tags. But because the fake reference is only a string of characters, XSLT feels (correctly) that it needs to escape the leading ampersand when it writes the attribute. So, with this input:
<name>Fred <symbol unicode="2013"/> Bloggs</name>
I get this output:
<name attrib="Fred &amp;#x2013; Bloggs">Fred &#x2013; Bloggs</name>
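A stylesheet along these lines should reproduce the problem. It is a sketch under assumptions: the template for name is my guess at the simplest thing that copies the content into an attribute called attrib. The XSLT 1.0 spec treats disabling output escaping inside an attribute as an error, and a processor that doesn’t report it has to recover by ignoring the request, which is why the ampersand gets escaped there.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="name">
    <name>
      <!-- Same content, used as an attribute value. Output escaping
           cannot be disabled here, so "&" is serialized as "&amp;". -->
      <xsl:attribute name="attrib">
        <xsl:apply-templates/>
      </xsl:attribute>
      <!-- Same content again as element text, where the
           disable-output-escaping trick is honoured. -->
      <xsl:apply-templates/>
    </name>
  </xsl:template>

  <!-- The fake character reference: just a string of characters. -->
  <xsl:template match="symbol">
    <xsl:text disable-output-escaping="yes">&amp;#x</xsl:text>
    <xsl:value-of select="@unicode"/>
    <xsl:text>;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```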
Yes, I know that the input data is completely stupid. I can’t help that. Unfortunately I also have the restriction that I can’t do this in multiple passes.
I’ve looked at the standard XSLT functions and the standard XPath functions. I’ve looked at the EXSLT functions. All I want is something that works like Perl’s chr.
I noticed that Saxon has the saxon:entity-ref function, but annoyingly, libxslt doesn’t support it.
All I really need is some way of re-invoking the XML parser over a string of my choosing. That way I could just wrap the characters in an element, parse it and call text() to get the character I need.
Right now, the only way that I can see of doing this is to turn UnicodeData.txt into one big XML lookup table, and look up the numbers in that. Bleeeaaargh.
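For what it’s worth, a cut-down version of that lookup table might look something like the sketch below. Everything in the tbl namespace is hypothetical (the namespace URI, the element names, and the two sample entries), and a real table would be generated from UnicodeData.txt rather than typed by hand. It relies on document('') returning the stylesheet itself, which assumes the processor loaded the stylesheet from a file or URI.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tbl="urn:example:char-table"
    exclude-result-prefixes="tbl">

  <!-- Hypothetical lookup table. These references are resolved by the
       XML parser when the stylesheet is loaded, so each entry holds a
       real character rather than a string that looks like one. -->
  <tbl:chars>
    <tbl:char code="2013">&#x2013;</tbl:char>
    <tbl:char code="2022">&#x2022;</tbl:char>
  </tbl:chars>

  <xsl:template match="name">
    <name>
      <xsl:attribute name="attrib">
        <xsl:apply-templates/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </name>
  </xsl:template>

  <xsl:template match="symbol">
    <!-- Capture the code first: inside the predicate below, @code is
         evaluated relative to each tbl:char, not to the symbol. -->
    <xsl:variable name="code" select="@unicode"/>
    <xsl:value-of
        select="document('')/*/tbl:chars/tbl:char[@code = $code]"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the looked-up value is a genuine character, it needs no escaping tricks and behaves the same in the attribute as in the element content. The match is a plain string comparison, so the case of the hex digits in the table has to agree with the input, and any code that isn’t in the table silently produces nothing.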
Thankfully, it’s not my project and the person doing it has just hacked around this in the output layer. But it bugs me that there’s no good way to achieve this.