For a project at $WORK, we want to implement Solr’s spelling suggestions. When you ask Solr to provide suggestions, it comes back with something like this (the original search was spinish englosh):
… 1 19 26 spanish 1 27 34 english 1 60 67 spanish …
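As a rough guide, the spellcheck part of a Solr XML response usually has the shape sketched below. The element and attribute names follow Solr's usual XML response format; pairing each entry with the misspelled word as its `name` is my assumption, based on the search terms and offsets quoted above.

```xml
<!-- Assumed shape only; the surrounding response is elided. -->
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="spinish">
      <int name="numFound">1</int>
      <int name="startOffset">19</int>
      <int name="endOffset">26</int>
      <arr name="suggestion"><str>spanish</str></arr>
    </lst>
    <lst name="englosh">
      <int name="numFound">1</int>
      <int name="startOffset">27</int>
      <int name="endOffset">34</int>
      <arr name="suggestion"><str>english</str></arr>
    </lst>
    <!-- The same word can show up again at a different offset. -->
    <lst name="spinish">
      <int name="numFound">1</int>
      <int name="startOffset">60</int>
      <int name="endOffset">67</int>
      <arr name="suggestion"><str>spanish</str></arr>
    </lst>
  </lst>
</lst>
```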
What we want to do is transform this into:
Did you mean spanish english?
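As it turns out, this is a non-trivial task in XSLT. It’s doable, but significantly easier in XSLT 2, since you are less restricted by the rules on result tree fragments (XSLT 1 won’t let you run XPath over a tree you’ve built in a variable without an extension function).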
The first problem to solve is getting the data into a sensible data structure for further processing. In a real language, I’d want a list of (from, to) pairs. In XSLT, sequences are always flat, so the way to simulate this is to construct an element for each pair.
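Something along these lines would build the pairs. This is a minimal sketch, not a definitive implementation: the `suggestion` element and its `from`/`to` attributes are names made up for illustration, and the select path assumes Solr's usual `lst`/`arr`/`str` response layout.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- One (hypothetical) <suggestion from="…" to="…"/> element per misspelled word. -->
  <xsl:variable name="suggestions" as="element(suggestion)*">
    <xsl:for-each-group
        select="//lst[@name = 'spellcheck']/lst[@name = 'suggestions']/lst"
        group-by="@name">
      <!-- Caveat: a word may be listed more than once. Only the first
           entry for a given name, and its first suggestion, is used. -->
      <suggestion from="{current-grouping-key()}"
                  to="{(current-group()[1]/arr[@name = 'suggestion']/str)[1]}"/>
    </xsl:for-each-group>
  </xsl:variable>

</xsl:stylesheet>
```

Grouping by `@name` is one way to collapse repeated words; filtering out entries that have a preceding sibling with the same name would work too.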
Note the commented caveat: we always pick the first suggestion for any given name. From my (small) experience, this isn’t an issue as the suggestions for a given word are always identical.
This results in $suggestions containing a sequence of elements looking like this.
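For the sample search, and assuming each pair is held in a hypothetical `suggestion` element with `from` and `to` attributes, the sequence would be roughly:

```xml
<!-- Illustrative only: element and attribute names are assumptions. -->
<suggestion from="spinish" to="spanish"/>
<suggestion from="englosh" to="english"/>
```

The repeated word appears only once, because only the first suggestion for each name is kept.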
Now one of the nice things about XSLT 2 is that you can define functions which are visible to XPath. So we can write a fairly simple recursive function to do the search and replace.
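A minimal sketch of such a function follows. The `my` prefix, its namespace URI, the function name `my:replace-all` and the `suggestion` element with `from`/`to` attributes are all placeholders chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/my"
    exclude-result-prefixes="xs my">

  <!-- Apply each (from, to) pair to $text in turn, recursing on the tail. -->
  <xsl:function name="my:replace-all" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="pairs" as="element(suggestion)*"/>
    <xsl:sequence select="
      if (empty($pairs)) then $text
      else my:replace-all(
             (: @from is used as a regex, unescaped :)
             replace($text, $pairs[1]/@from, $pairs[1]/@to),
             (: the 'cdr': everything after the first item :)
             $pairs[position() gt 1])"/>
  </xsl:function>

</xsl:stylesheet>
```

From XPath it would then be called as `my:replace-all('spinish englosh', $suggestions)`.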
There are a few things to note:
- You have to give your function a namespace prefix.
- The `xsl:param` elements are positional: arguments are matched by order (not by name), and the number of params sets the arity of the function.
- The `as` attributes aren’t necessary, but the idea of types in XSLT is growing on me. I’d rather know about type problems as soon as possible.
- The notion of cdr (tail) in XSLT is rather odd: the sequence of all nodes in the sequence whose position is greater than one.
- Even though I’m using `replace()`, I’m not taking any precautions against escaping regex characters. I’m certain that these won’t occur given my data.
So finally, we end up with:
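Putting the pieces together, a complete stylesheet might look like the sketch below. It rests on several assumptions: the original query is read from the echoed `q` parameter in the response header, the response uses Solr's usual `lst`/`arr`/`str` layout, and the `my:` names and the `suggestion` element are placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/ns/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Assumption: Solr echoes the request parameters in the response header. -->
  <xsl:variable name="query" as="xs:string"
      select="string((//lst[@name = 'responseHeader']/lst[@name = 'params']/str[@name = 'q'])[1])"/>

  <!-- (from, to) pairs; the first suggestion wins for a repeated word. -->
  <xsl:variable name="suggestions" as="element(suggestion)*">
    <xsl:for-each-group
        select="//lst[@name = 'spellcheck']/lst[@name = 'suggestions']/lst"
        group-by="@name">
      <suggestion from="{current-grouping-key()}"
                  to="{(current-group()[1]/arr[@name = 'suggestion']/str)[1]}"/>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Recursive search and replace over the pairs (no regex escaping). -->
  <xsl:function name="my:replace-all" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="pairs" as="element(suggestion)*"/>
    <xsl:sequence select="
      if (empty($pairs)) then $text
      else my:replace-all(
             replace($text, $pairs[1]/@from, $pairs[1]/@to),
             $pairs[position() gt 1])"/>
  </xsl:function>

  <!-- Emit the prompt only when Solr actually suggested something. -->
  <xsl:template match="/">
    <xsl:if test="exists($suggestions)">
      <xsl:value-of select="concat('Did you mean ',
                                   my:replace-all($query, $suggestions), '?')"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

It needs an XSLT 2.0 processor such as Saxon; an XSLT 1.0 processor will not accept `xsl:function` or `xsl:for-each-group`.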
I don’t think all this will win any awards for elegance, but it does work. 🙂