Search & Replace in XSLT 2
For a project at $WORK, we want to implement Solr’s spelling suggestions. When you ask Solr for suggestions, it comes back with something like this (the original search was “spinish englosh”):
<response>
  …
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="spinish">
        <int name="numFound">1</int>
        <int name="startOffset">19</int>
        <int name="endOffset">26</int>
        <arr name="suggestion">
          <str>spanish</str>
        </arr>
      </lst>
      <lst name="englosh">
        <int name="numFound">1</int>
        <int name="startOffset">27</int>
        <int name="endOffset">34</int>
        <arr name="suggestion">
          <str>english</str>
        </arr>
      </lst>
      <lst name="spinish">
        <int name="numFound">1</int>
        <int name="startOffset">60</int>
        <int name="endOffset">67</int>
        <arr name="suggestion">
          <str>spanish</str>
        </arr>
      </lst>
      …
    </lst>
  </lst>
</response>
What we want to do is transform this into:
<p>Did you mean <a href="?q=spanish%20english">spanish english</a>?</p>
As it turns out, this is a non-trivial task in XSLT. It’s doable in XSLT 1.0, but significantly easier in XSLT 2.0, which drops the restrictions on result tree fragments: a variable built from content is a temporary tree that you can navigate with XPath like any other.
The first problem to solve is getting the data into a sensible data structure for further processing. In a real language, I’d want a list of (from, to) pairs. In XSLT, sequences are always flat (a sequence can’t contain another sequence), so the way to simulate this is to construct an element for each pair.
<xsl:variable name="suggRoot"
    select="/response/lst[@name='spellcheck']/lst[@name='suggestions']" />
<xsl:variable name="suggestions" as="element(sugg)*">
  <xsl:for-each select="distinct-values($suggRoot/lst/@name)">
    <!-- Pick the first suggestion for this name. -->
    <sugg from="{.}"
          to="{($suggRoot/lst[@name=current()])[1]/arr[@name='suggestion']/str[1]}" />
  </xsl:for-each>
</xsl:variable>
Note the commented caveat: we always pick the first suggestion for any given name. In my (limited) experience this isn’t an issue, as the suggestions for a given word are always identical.
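One related detail: `distinct-values()` returns its results in an implementation-dependent order, so the order in which the pairs will later be applied isn’t guaranteed by the spec. If that ever matters (say, one misspelling could occur inside another), `xsl:for-each-group` processes groups in order of first appearance. Below is a minimal sketch of a standalone stylesheet that builds the same `sugg` elements that way and dumps them for inspection; the `<suggestions>` wrapper element is only there for display.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:variable name="suggRoot"
      select="/response/lst[@name='spellcheck']/lst[@name='suggestions']" />

  <!-- for-each-group visits groups in order of first appearance,
       unlike distinct-values(), whose result order is implementation-dependent. -->
  <xsl:variable name="suggestions" as="element(sugg)*">
    <xsl:for-each-group select="$suggRoot/lst" group-by="@name">
      <!-- As before: first suggestion of the first lst with this name. -->
      <sugg from="{current-grouping-key()}"
            to="{current-group()[1]/arr[@name='suggestion']/str[1]}" />
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Dump the pairs so the result can be inspected. -->
  <xsl:template match="/">
    <suggestions>
      <xsl:copy-of select="$suggestions" />
    </suggestions>
  </xsl:template>
</xsl:stylesheet>
```

Run against the response above (with the … elisions removed), this should yield the spinish→spanish pair followed by the englosh→english pair, with the duplicate spinish entry folded into the first group.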
This results in $suggestions containing a sequence of elements like these:
<sugg from="spinish" to="spanish" />
<sugg from="englosh" to="english" />
Now one of the nice things about XSLT 2 is that you can define functions (with xsl:function) that are callable from XPath. So we can write a fairly simple recursive function to do the search and replace.
<!-- Take some input and a list of suggestions, and do a recursive
     search and replace over the input until all have been applied. -->
<xsl:function name="my:replaceSuggestions" as="xs:string">
  <xsl:param name="input" as="xs:string" />
  <xsl:param name="suggestions" as="element(sugg)*" />
  <xsl:variable name="sugg" select="$suggestions[1]" />
  <xsl:sequence select="
      if (count($suggestions) > 0)
      then my:replaceSuggestions(
             replace($input, $sugg/@from, $sugg/@to),
             $suggestions[position() > 1])
      else $input" />
</xsl:function>
There are a few things to note:
- You have to put your function in a namespace, i.e. give its name a prefix (here my:) bound to a namespace URI.
- The xsl:param elements are positional: they are matched by order, not by name, and their number determines the arity of the function.
- The as attributes aren’t necessary, but the idea of types in XSLT is growing on me. I’d rather know about type problems as soon as possible.
- The notion of cdr (tail) in XSLT is rather odd: $suggestions[position() > 1], i.e. every item in the sequence whose position is greater than one.
- Even though I’m using replace(), I’m not taking any precautions against regex metacharacters: the from string is used as a pattern, and in the to string \ and $ are special. I’m certain that these won’t occur given my data.
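If the data were less predictable, the escaping could be handled by two small helper functions. A minimal sketch, assuming a placeholder namespace URI (`urn:example:my`) and hypothetical function names `my:escapeRegex` and `my:escapeReplacement`. The pattern escaping follows the same approach as FunctX’s `functx:escape-for-regex`. The demo template uses made-up strings, because an unescaped `c++` is not a valid XPath regular expression.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my">
  <xsl:output method="text" />

  <!-- Backslash-escape every regex metacharacter so $s matches literally. -->
  <xsl:function name="my:escapeRegex" as="xs:string">
    <xsl:param name="s" as="xs:string" />
    <xsl:sequence select="replace($s, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')" />
  </xsl:function>

  <!-- In a replacement string only \ and $ are special; escape both. -->
  <xsl:function name="my:escapeReplacement" as="xs:string">
    <xsl:param name="s" as="xs:string" />
    <xsl:sequence select="replace($s, '(\\|\$)', '\\$1')" />
  </xsl:function>

  <xsl:template match="/">
    <!-- Hypothetical data: a "from" containing regex metacharacters... -->
    <xsl:value-of select="replace('I like c++', my:escapeRegex('c++'), my:escapeReplacement('C++'))" />
    <xsl:text>&#10;</xsl:text>
    <!-- ...and a "to" containing a dollar sign. -->
    <xsl:value-of select="replace('costs 5 USD', my:escapeRegex('5 USD'), my:escapeReplacement('$5'))" />
  </xsl:template>
</xsl:stylesheet>
```

This should print `I like C++` and `costs $5` on separate lines. Inside my:replaceSuggestions the call would become `replace($input, my:escapeRegex($sugg/@from), my:escapeReplacement($sugg/@to))`. In XSLT 3.0 the `q` flag, as in `replace($input, $sugg/@from, $sugg/@to, 'q')`, treats both the pattern and the replacement literally, which makes the helpers unnecessary.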
So finally, we end up with:
<!-- $input holds the user's original query string (not shown here). -->
<xsl:variable name="newQuery" as="xs:string"
    select="my:replaceSuggestions($input, $suggestions)" />
<p class="spelling">
  <xsl:text>Did you mean </xsl:text>
  <em>
    <a href="?q={encode-for-uri($newQuery)}">
      <xsl:value-of select="$newQuery" />
    </a>
  </em>
  <xsl:text>?</xsl:text>
</p>
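For reference, here is a sketch of the pieces assembled into one XSLT 2.0 stylesheet that can be run directly against a Solr response. Two assumptions are not in the post: the original query arrives as a stylesheet parameter named `input` (hypothetical, with a default value for testing), and the `my` prefix is bound to a placeholder URI. The function reads the head of the list inline instead of through a local variable, which is behaviourally the same.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">
  <xsl:output method="html" />

  <!-- Assumption: the original query is passed in as a stylesheet parameter. -->
  <xsl:param name="input" as="xs:string" select="'spinish englosh'" />

  <xsl:variable name="suggRoot"
      select="/response/lst[@name='spellcheck']/lst[@name='suggestions']" />

  <!-- One (from, to) pair per distinct misspelling; first suggestion wins. -->
  <xsl:variable name="suggestions" as="element(sugg)*">
    <xsl:for-each select="distinct-values($suggRoot/lst/@name)">
      <sugg from="{.}"
            to="{($suggRoot/lst[@name=current()])[1]/arr[@name='suggestion']/str[1]}" />
    </xsl:for-each>
  </xsl:variable>

  <!-- Apply the first pair, then recurse on the tail of the list. -->
  <xsl:function name="my:replaceSuggestions" as="xs:string">
    <xsl:param name="input" as="xs:string" />
    <xsl:param name="suggestions" as="element(sugg)*" />
    <xsl:sequence select="
        if (exists($suggestions))
        then my:replaceSuggestions(
               replace($input, $suggestions[1]/@from, $suggestions[1]/@to),
               $suggestions[position() > 1])
        else $input" />
  </xsl:function>

  <xsl:template match="/">
    <xsl:variable name="newQuery" as="xs:string"
        select="my:replaceSuggestions($input, $suggestions)" />
    <p class="spelling">
      <xsl:text>Did you mean </xsl:text>
      <em><a href="?q={encode-for-uri($newQuery)}"><xsl:value-of select="$newQuery" /></a></em>
      <xsl:text>?</xsl:text>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

With the sample response and the default parameter, this should produce a paragraph linking to `?q=spanish%20english` with the text “spanish english”. The function’s `input` parameter shadows the global one, which XSLT 2.0 permits.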
I don’t think all this will win any awards for elegance, but it does work.