Search & Replace in XSLT 2

For a project at $WORK, we want to implement Solr’s spelling suggestions. When you ask Solr for suggestions, it comes back with something like this (the original search was “spinish englosh”):

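  <response>
    …
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="spinish">
          <int name="numFound">1</int>
          <int name="startOffset">19</int>
          <int name="endOffset">26</int>
          <arr name="suggestion">
            <str>spanish</str>
          </arr>
        </lst>
        <lst name="englosh">
          <int name="numFound">1</int>
          <int name="startOffset">27</int>
          <int name="endOffset">34</int>
          <arr name="suggestion">
            <str>english</str>
          </arr>
        </lst>
        <lst name="spinish">
          <int name="numFound">1</int>
          <int name="startOffset">60</int>
          <int name="endOffset">67</int>
          <arr name="suggestion">
            <str>spanish</str>
          </arr>
        </lst>
        …
      </lst>
    </lst>
  </response>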

What we want to do is transform this into:

Did you mean spanish english?

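As it turns out, this is a non-trivial task in XSLT. It’s doable in XSLT 1, but significantly easier in XSLT 2, which drops the restrictions on result tree fragments: elements you construct inside a variable form an ordinary temporary tree that XPath can navigate directly.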

The first problem to solve is getting the data into a sensible data structure for further processing. In a real language, I’d want a list of (from, to) pairs. In XSLT, sequences are always flat (a sequence can’t contain another sequence), so the way to simulate a pair is to construct an element for it.

Note the caveat: we always pick the first suggestion for any given name. In my (limited) experience this isn’t an issue, as the suggestions for a given word are always identical.

This results in $suggestions containing a sequence of elements, one per (from, to) pair.
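For comparison, here is a rough Python analogue of this step. It is a sketch only, not the original XSLT: it walks a trimmed, hypothetical copy of the Solr response with the standard library’s xml.etree.ElementTree and collects (from, to) tuples. Keeping only the first lst per misspelled word, and the first str inside it, is my reading of the caveat above. The names SOLR_XML and suggestion_pairs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample shaped like the Solr spellcheck response.
SOLR_XML = """
<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="spinish">
        <int name="numFound">1</int>
        <arr name="suggestion"><str>spanish</str></arr>
      </lst>
      <lst name="englosh">
        <int name="numFound">1</int>
        <arr name="suggestion"><str>english</str></arr>
      </lst>
      <lst name="spinish">
        <int name="numFound">1</int>
        <arr name="suggestion"><str>spanish</str></arr>
      </lst>
    </lst>
  </lst>
</response>
"""


def suggestion_pairs(xml_text):
    """Return (from, to) tuples, keeping the first suggestion per misspelled word."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    seen = set()
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for entry in root.findall(path):
        word = entry.get("name")
        if word in seen:
            continue  # assumption: a repeated word keeps its first suggestion
        first = entry.find("./arr[@name='suggestion']/str")
        if first is not None and first.text:
            pairs.append((word, first.text))
            seen.add(word)
    return pairs


print(suggestion_pairs(SOLR_XML))
# → [('spinish', 'spanish'), ('englosh', 'english')]
```

A list of tuples is exactly the “list of pairs” the XSLT has to fake with constructed elements.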

One of the nice things about XSLT 2 is that you can define functions with xsl:function that are callable from XPath expressions. So we can write a fairly simple recursive function to do the search and replace.

There are a few things to note:

  • You have to put your function in a namespace, i.e. give its name a prefix.
  • The xsl:param elements are matched by position (not by name), and their number sets the function’s arity.
  • The as attributes aren’t necessary, but the idea of types in XSLT is growing on me. I’d rather know about type problems as soon as possible.
  • The notion of cdr (tail) in XSLT is rather odd: it’s the sequence of all items whose position is greater than one, e.g. $seq[position() > 1].
  • Even though I’m using replace(), which interprets its pattern as a regular expression, I’m not taking any precautions to escape regex metacharacters. I’m certain that these won’t occur in my data.
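The same head/tail recursion can be sketched in Python. This is a hypothetical analogue, not the original XSLT function: pairs[1:] stands in for the tail, and replace_all is an illustrative name. Unlike the original, the sketch escapes the search string with re.escape, because Python’s re.sub, like XPath’s replace(), treats its pattern as a regular expression.

```python
import re


def replace_all(text, pairs):
    """Recursively apply (from, to) pairs: handle the head, recurse on the tail."""
    if not pairs:  # empty sequence: the recursion stops here
        return text
    (frm, to), tail = pairs[0], pairs[1:]  # pairs[1:] plays the role of cdr
    # re.escape makes `frm` match literally; using a function as the
    # replacement keeps backslashes in `to` from being read as group references.
    replaced = re.sub(re.escape(frm), lambda _m: to, text)
    return replace_all(replaced, tail)


print(replace_all("spinish englosh", [("spinish", "spanish"), ("englosh", "english")]))
# → spanish english

# Why escaping matters: unescaped, "." matches any character.
print(re.sub("c.t", "dog", "cat c.t"))           # → dog dog
print(replace_all("cat c.t", [("c.t", "dog")]))  # → cat dog
```

Like replace() in XSLT, this also rewrites matches inside longer words; anchoring on word boundaries would be a further refinement.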

So finally, we end up with:

Did you mean ?
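Once it is written out, the recursive replace is really a left fold over the pairs. A minimal Python sketch of the final message using functools.reduce follows. did_you_mean is a hypothetical name, and returning None when no pair changes the query is an assumption, not something the original stylesheet is known to do.

```python
import re
from functools import reduce


def did_you_mean(query, pairs):
    """Build the suggestion message, or return None if no pair changes the query."""
    # reduce threads the text through each (from, to) pair in order,
    # the same order the head/tail recursion visits them.
    corrected = reduce(
        lambda text, pair: re.sub(re.escape(pair[0]), lambda _m: pair[1], text),
        pairs,
        query,
    )
    if corrected == query:  # hypothetical policy: nothing to suggest
        return None
    return f"Did you mean {corrected}?"


print(did_you_mean("spinish englosh", [("spinish", "spanish"), ("englosh", "english")]))
# → Did you mean spanish english?
print(did_you_mean("spanish english", []))
# → None
```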

I don’t think all this will win any awards for elegance, but it does work. 🙂