Search & Replace in XSLT 2

For a project at $WORK, we want to implement Solr’s spelling suggestions. When you ask Solr to provide suggestions, it comes back with something like this (the original search was spinish englosh):

  <response><lst name="spellcheck">
      <lst name="suggestions">
        <lst name="spinish">
          <int name="numFound">1</int>
          <int name="startOffset">19</int>
          <int name="endOffset">26</int>
          <arr name="suggestion"><str>spanish</str></arr>
        </lst>
        <lst name="englosh">
          <int name="numFound">1</int>
          <int name="startOffset">27</int>
          <int name="endOffset">34</int>
          <arr name="suggestion"><str>english</str></arr>
        </lst>
        <lst name="spinish">
          <int name="numFound">1</int>
          <int name="startOffset">60</int>
          <int name="endOffset">67</int>
          <arr name="suggestion"><str>spanish</str></arr>
        </lst>
      </lst>
  </lst></response>

What we want to do is transform this into:

<p>Did you mean <a href="?q=spanish%20english">spanish english</a>?</p>

As it turns out, this is a non-trivial task in XSLT. It’s doable, but significantly easier in XSLT 2, since you are less restricted by the rules on result-tree-fragments.
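
To make the result-tree-fragment point concrete, here is a minimal sketch; the variable name $pairs and the sample data are made up for illustration. In XSLT 2.0 a variable with constructed content is a temporary tree that ordinary path expressions can navigate. In XSLT 1.0 the same select would be an error unless the fragment was first converted with an extension function such as exsl:node-set().

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- Hypothetical sample data, built inside the stylesheet. -->
    <xsl:variable name="pairs">
      <sugg from="spinish" to="spanish" />
      <sugg from="englosh" to="english" />
    </xsl:variable>
    <!-- Fine in XSLT 2.0: $pairs is a temporary tree.
         In XSLT 1.0 this needs something like exsl:node-set($pairs)/sugg. -->
    <xsl:value-of select="$pairs/sugg/@to" separator=" " />
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document under an XSLT 2.0 processor, this should print the two to values separated by a space.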

The first problem to solve is getting the data into a sensible data structure for further processing. In a real language, I’d want a list of (from, to) pairs. In XSLT, sequences are always flat. The way to simulate this is to construct an element for the pair.

  <xsl:variable name="suggRoot" select="/response/lst[@name='spellcheck']/lst[@name='suggestions']" />
  <xsl:variable name="suggestions" as="element(sugg)*">
    <xsl:for-each select="distinct-values($suggRoot/lst/@name)">
      <!-- Pick the first suggestion for this name. -->
      <sugg from="{.}" to="{($suggRoot/lst[@name=current()])[1]/arr[@name='suggestion']/str[1]}" />
    </xsl:for-each>
  </xsl:variable>

Note the commented caveat: we always pick the first suggestion for any given name. From my (small) experience, this isn’t an issue as the suggestions for a given word are always identical.
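
If you’d rather make the grouping explicit, the same list can be built with xsl:for-each-group, a standard XSLT 2.0 instruction. This is only a sketch of an alternative, not what the stylesheet above does. It still takes the first suggestion from the first entry in each group, and the suggestions wrapper element is a hypothetical addition so that the sketch emits a single root element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <xsl:variable name="suggRoot"
        select="/response/lst[@name='spellcheck']/lst[@name='suggestions']" />
    <!-- Hypothetical wrapper, only here to give the output one root element. -->
    <suggestions>
      <!-- One group per distinct misspelling, in order of first appearance. -->
      <xsl:for-each-group select="$suggRoot/lst" group-by="@name">
        <sugg from="{current-grouping-key()}"
              to="{current-group()[1]/arr[@name='suggestion']/str[1]}" />
      </xsl:for-each-group>
    </suggestions>
  </xsl:template>
</xsl:stylesheet>
```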

This results in $suggestions containing a sequence of elements looking like this:

  <sugg from="spinish" to="spanish" />
  <sugg from="englosh" to="english" />

Now one of the nice things about XSLT 2 is that you can define functions which are visible to XPath. So we can write a fairly simple recursive function to do the search and replace.

  <!-- Take some input and a list of suggestions, and do a recursive search and
       replace over the input until all have been applied. -->
  <xsl:function name="my:replaceSuggestions" as="xs:string">
    <xsl:param name="input" as="xs:string" />
    <xsl:param name="suggestions" as="element(sugg)*" />
    <xsl:variable name="sugg" select="$suggestions[1]" />
    <xsl:sequence select="
      if (count($suggestions) > 0) then
        my:replaceSuggestions(replace($input, $sugg/@from, $sugg/@to), $suggestions[position() > 1])
      else
        $input" />
  </xsl:function>

There are a few things to note:

  • You have to give your function a namespace prefix.
  • The xsl:param elements are matched by position (not by name), and their number determines the arity of the function.
  • The as attributes aren’t necessary, but the idea of types in XSLT is growing on me. I’d rather know about type problems as soon as possible.
  • The notion of cdr (tail) in XSLT is rather odd: the sequence of all nodes in the sequence whose position is greater than one.
  • Even though I’m using replace(), which treats its second argument as a regular expression, I’m not escaping regex metacharacters. I’m certain that these won’t occur given my data.
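
If the data were less predictable, both arguments could be escaped before calling replace(). The sketch below is an assumption about how one might do that, not part of the original stylesheet: my:escapeRegex and my:escapeReplacement are hypothetical helpers, and the URI bound to the my prefix is a placeholder. It also shows the namespace declarations the function needs, and uses subsequence($suggestions, 2) as another way to spell the tail.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: backslash-escape regex metacharacters in a pattern. -->
  <xsl:function name="my:escapeRegex" as="xs:string">
    <xsl:param name="text" as="xs:string" />
    <xsl:sequence select="replace($text,
      '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')" />
  </xsl:function>

  <!-- Hypothetical helper: escape \ and $ in a replacement string. -->
  <xsl:function name="my:escapeReplacement" as="xs:string">
    <xsl:param name="text" as="xs:string" />
    <xsl:sequence select="replace($text, '(\\|\$)', '\\$1')" />
  </xsl:function>

  <!-- Same recursion as above, but with both replace() arguments escaped. -->
  <xsl:function name="my:replaceSuggestions" as="xs:string">
    <xsl:param name="input" as="xs:string" />
    <xsl:param name="suggestions" as="element(sugg)*" />
    <xsl:variable name="sugg" select="$suggestions[1]" />
    <xsl:sequence select="
      if (empty($suggestions)) then
        $input
      else
        my:replaceSuggestions(
          replace($input, my:escapeRegex($sugg/@from), my:escapeReplacement($sugg/@to)),
          subsequence($suggestions, 2))" />
  </xsl:function>
</xsl:stylesheet>
```

With this in place, a misspelling such as c++ would be matched literally rather than being rejected or misread as a regular expression.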

So finally, we end up with:

  <xsl:variable name="newQuery">
    <xsl:value-of select="my:replaceSuggestions($input, $suggestions)"/>
  </xsl:variable>
  <p class="spelling">
    <xsl:text>Did you mean </xsl:text>
    <a href="?q={encode-for-uri($newQuery)}">
      <xsl:value-of select="$newQuery" />
    </a>
    <xsl:text>?</xsl:text>
  </p>

I don’t think all this will win any awards for elegance, but it does work. 🙂

Comments 1

  1. dom wrote:

    It’s probably worth pointing out that this is all happening inside a Cocoon pipeline, hence the XSLT. To be honest, I would have preferred to write this all using SolrJ in combination with some Java and FlowScript…

    Posted 23 Jul 2009 at 22:11