<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Ronny's Giddy Geeky Blog</title>
	<atom:link href="http://rvdb.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://rvdb.wordpress.com</link>
	<description>what happens when i press that button?</description>
	<lastBuildDate>Wed, 07 Dec 2011 13:20:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='rvdb.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Ronny's Giddy Geeky Blog</title>
		<link>http://rvdb.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://rvdb.wordpress.com/osd.xml" title="Ronny&#039;s Giddy Geeky Blog" />
	<atom:link rel='hub' href='http://rvdb.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Internal URL Rewriting with eXist&#8217;s MVC Framework</title>
		<link>http://rvdb.wordpress.com/2011/12/07/internal-url-rewriting-with-exists-mvc-framework/</link>
		<comments>http://rvdb.wordpress.com/2011/12/07/internal-url-rewriting-with-exists-mvc-framework/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 12:29:53 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[XQuery]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/?p=177</guid>
		<description><![CDATA[Since version 1.4, the eXist native XML database has been equipped with a Model View Controller (MVC) framework designed to express the logic for request routing of eXist-based web applications in XQuery. In this post I’ll illuminate a (in my opinion) somewhat under-exposed feature of eXist’s MVC framework: internal URL rewriting. With this term, I [...]]]></description>
			<content:encoded><![CDATA[<p>Since version 1.4, the eXist native XML database has been equipped with a Model View Controller (MVC) framework designed to express the request-routing logic of eXist-based web applications in XQuery. In this post I’ll shed some light on a somewhat under-exposed (in my opinion) feature of eXist’s MVC framework: internal URL rewriting. By this term, I mean that a URL, say <a href="http://localhost:8080/exist/urltest/test.xql" target="_blank">http://localhost:8080/exist/urltest/test.xql</a>, is resolved internally to another URL like <a href="http://localhost:8080/exist/urltest/xquery/test.xql" target="_blank">http://localhost:8080/exist/urltest/xquery/test.xql</a>. Internally means that the original request is <em>not</em> redirected to another one, and the user still sees the original URL in the browser’s address bar. As section 1 of this post will illustrate, this works like a charm for ‘simple’ rewrites like the previous one, but requires some thought if you want to ‘chain’ multiple internal rewrite rules. In this post, I’ll try to provide a flexible coding pattern to achieve such internal rewriting with eXist’s MVC framework.</p>
<p><span id="more-177"></span>
<p>I’ll illustrate how this can be done with a toy web application, ‘urltest’, which consists of just two basic scripts in two folders:</p>
<ul>
<li>/urltest/xquery/test.xql: a simple script that will generate a &lt;test&gt; element, containing a &#8216;hello $name&#8217; string, where $name can be passed as a request parameter:
<pre><font>
xquery version &quot;1.0&quot;;
&lt;test&gt;hello {request:get-parameter(&quot;name&quot;, &quot;anonymous&quot;)}&lt;/test&gt;
</font></pre>
</li>
<li>/urltest/stylesheets/test.xsl: an XSLT stylesheet that will just transform the &lt;test&gt; element to an HTML &lt;h1&gt; element:
<pre><span style="color:#0000ff;">&lt;?</span>xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;<span style="color:#0000ff;">?&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">xsl</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span>
    <span style="color:#ff0000;">exclude</span>-<span style="color:#ff0000;">result</span>-<span style="color:#ff0000;">prefixes</span>=<span style="color:#0000ff;">&quot;#all&quot;</span>
    <span style="color:#ff0000;">version</span>=<span style="color:#0000ff;">&quot;2.0&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;test&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>query output: <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span><span style="color:#0000ff;">/&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span>processed by test.xsl<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span><span style="color:#0000ff;">&gt;</span></pre>
</li>
</ul>
<p>You can <a href="http://ctb.kantl.be/download/urltest.zip">download the source files</a> for the mini-application right away. The next sections will gradually explain the steps from a very basic controller.xql version to the final version included in the application’s source code.</p>
<h2>1. Forwarding and redirecting</h2>
<p>The <a href="http://localhost:8080/exist/urlrewrite.xml#d2e269" target="_blank">eXist MVC documentation</a> nicely illustrates how the above behaviour can be specified in a controller.xql file, by forwarding the request to a path that can be interpreted by the XQueryServlet. The discussion below takes the concepts documented there for granted; please refer to the eXist MVC documentation for the full details. The following rules in a controller.xql file match any request for an XQuery file ending in ‘.xql’ directly under the webapp’s root folder, forward it to the corresponding XQuery source file in the ‘xquery’ folder, and apply the XSLT stylesheet with the same base name as the requested XQuery script to the result:</p>
<pre><font>
(: [rule 1]: map all *.xql requests to their source files in xquery/, and corresponding xslt stylesheets in stylesheets/ :)
if (matches($exist:path, '^/[^/]*\.xql$')) then
  let $query := substring-before(tokenize($exist:path, '/')[last()], '.')
  return
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
(: default rule: just pass through everything else :)
else
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;cache-control cache=&quot;yes&quot;/&gt;
    &lt;/dispatch&gt;
</font></pre>
<p>Though the XQuery source file lives in the ‘xquery’ subfolder of the webapp’s root folder (‘/urltest’), rule 1 makes it possible to execute the ‘xquery/test.xql’ script via a URL like <a href="http://localhost:8080/exist/urltest/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/test.xql?name=bobby</a>. As the $exist:path portion of this request contains no further subpaths, it matches the test condition of rule 1. The request is therefore forwarded to the source file in the ‘xquery’ folder, which is executed by the XQueryServlet; its output is subsequently transformed by the ‘test.xsl’ XSLT stylesheet in the ‘stylesheets’ folder. This produces the following output:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>query output: hello bobby<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span>processed by test.xsl<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span></pre>
<p>Compare this to what happens with the URL <a href="http://localhost:8080/exist/urltest/xquery/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/xquery/test.xql?name=bobby</a>, which addresses the XQuery source file more directly by including its containing folder ‘/xquery’ in the path. The $exist:path portion of this request doesn’t match rule 1, so it is caught by the default rule, which simply passes the request through to the XQueryServlet (the default servlet for files ending in ‘.xql’). As no further processing is applied, the result looks like this:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">test</span><span style="color:#0000ff;">&gt;</span>hello bobby<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">test</span><span style="color:#0000ff;">&gt;</span></pre>
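<p>The matching condition of rule 1 can be tried out in isolation, since it only relies on standard XQuery functions. Note that in a regular expression an unescaped ‘.’ matches any character, so the dot before ‘xql’ has to be escaped as ‘\.’ if the pattern should only accept genuine ‘.xql’ file names. The following sketch (plain XQuery 1.0, runnable outside eXist; the sample paths are hypothetical values of $exist:path) compares both variants:</p>
<pre><font>
xquery version &quot;1.0&quot;;

(: compare an unescaped and an escaped dot on a few sample paths :)
for $path in ('/test.xql', '/testxql', '/xquery/test.xql')
return concat(
  $path,
  ': unescaped=', string(matches($path, '^/[^/]*.xql$')),
  ', escaped=', string(matches($path, '^/[^/]*\.xql$'))
)
</font></pre>
<p>Only ‘/test.xql’ is accepted by both patterns; ‘/testxql’ slips through the unescaped one, and ‘/xquery/test.xql’ is rejected by both, which is why it ends up in the default rule.</p>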
<p>Now, suppose we want to add a rule that intercepts URLs pointing to the root of the webapp, like <a href="http://localhost:8080/exist/urltest/" target="_blank">http://localhost:8080/exist/urltest/</a>, and rewrites them to <a href="http://localhost:8080/exist/urltest/test.xql" target="_blank">http://localhost:8080/exist/urltest/test.xql</a>. This can be done by adding another rule to the controller, for example:</p>
<pre><font>
(: [rule 2]: redirect empty paths to main query :)
else if ($exist:path eq '/') then
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;redirect url=&quot;test.xql&quot;/&gt;
    &lt;/dispatch&gt;
</font></pre>
<p>With this rule in place, the URL <a href="http://localhost:8080/exist/urltest/" target="_blank">http://localhost:8080/exist/urltest/</a> is redirected, which makes the browser issue a new request for <a href="http://localhost:8080/exist/urltest/test.xql" target="_blank">http://localhost:8080/exist/urltest/test.xql</a>. Because of this redirection, the browser’s address bar now shows the latter URL instead of the original one.</p>
<p>Now, suppose you’re not interested in a redirection, but would rather express this as a forwarding operation in the background. The most naive solution would simply replace the &lt;redirect&gt; instruction with &lt;forward&gt;. This won’t work unless the new URL is specified as &lt;forward url=&quot;xquery/test.xql&quot;/&gt;. While that will execute the ‘xquery/test.xql’ XQuery script and preserve the original URL in the browser, no XSLT processing will be applied to the XQuery output, unless the &lt;dispatch&gt; element is extended to something very similar to the content of rule 1:</p>
<pre><font>
(: [rule 2]: forward empty paths to main query :)
else if ($exist:path eq '/') then
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;xquery/test.xql&quot;&gt;
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/test.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
</font></pre>
<p>On the other hand, both rules could be combined into a single rule by making some small modifications to rule 1:</p>
<pre><font>
(: [rule 1]: map all *.xql requests to their source files in xquery/, and corresponding xslt stylesheets in stylesheets/ :)
if (matches($exist:path, '^/[^/]*\.xql$') <span style="background-color:yellow;">or $exist:path eq '/'</span>) then
  <span style="background-color:yellow;">let $query := (substring-before(tokenize($exist:path, '/')[last()], '.')[normalize-space()], 'test')[1]</span>
  return
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
</font></pre>
<p>This rule combines both conditions, and derives $query (the variable containing the base name of the corresponding XQuery and XSLT scripts) from the $exist:path part of the request, falling back to ‘test’ as the default value when that part yields an empty string. By the way, this way of expressing default values for variables is a very useful XQuery programming pattern that is described among the <a href="http://en.wikibooks.org/wiki/XQuery/Ahhas#Default_values" target="_blank">XQuery/Ah-has</a> of the invaluable <a href="http://en.wikibooks.org/wiki/XQuery" target="_blank">XQuery Wikibook</a>.</p>
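<p>The default-value idiom relies on two things: the predicate [normalize-space()] discards a string that is empty or contains only whitespace, and [1] then picks the first item of the remaining sequence, which is the fallback value if nothing else is left. A minimal standalone sketch (plain XQuery 1.0; local:with-default is a hypothetical helper name):</p>
<pre><font>
xquery version &quot;1.0&quot;;

(: hypothetical helper: returns $value if it contains non-whitespace characters, else $default :)
declare function local:with-default($value as xs:string?, $default as xs:string) as xs:string {
  ($value[normalize-space()], $default)[1]
};

(: '/hello.xql' yields the base name 'hello'; '/' yields an empty string, so 'test' is used :)
local:with-default(substring-before(tokenize('/hello.xql', '/')[last()], '.'), 'test'),
local:with-default(substring-before(tokenize('/', '/')[last()], '.'), 'test')
</font></pre>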
<p>To summarize this section: we can now specify eXist controller rules that forward URLs to specific internal paths, and two approaches have been illustrated for providing multiple ‘entry points’ to those rules:</p>
<ul>
<li>redirection: force a URL to be processed by another controller rule by issuing a redirection that will rewrite the original URL to one that will be matched by the rule of interest </li>
<li>forwarding: expand rules with multiple matching conditions and further modifications where necessary to cope with specifics of those entry points </li>
</ul>
<p>Both approaches have their drawbacks. External redirection of requests can disturb clean URLs and possibly threaten link stability, by exposing the potentially less stable internal URLs to the users of the web application. The forwarding approach illustrated above suffers from either code redundancy or code bloat, and combining multiple matching conditions in one rule may introduce a lot of complexity in the further processing code as well. In the next section, a more elegant solution to internal URL rewriting will be explored.</p>
<h2>2. Internal Rewrites</h2>
<h3>2.1 Simple internal rewrites</h3>
<p>This brings me to the actual motivation for this blog post. I am converting Cocoon+eXist-based web applications into purely eXist-driven ones, which involves porting the Cocoon <a href="http://cocoon.apache.org/2.1/userdocs/concepts/sitemap.html" target="_blank">sitemap</a> logic to eXist’s <a href="http://exist.sourceforge.net/urlrewrite.html" target="_blank">controller</a> logic. Cocoon sitemaps explicitly allow for what the documentation calls <a href="http://cocoon.apache.org/2.1/userdocs/concepts/redirection.html#N10027" target="_blank">internal redirection</a>, where a sitemap pipeline rewrites a URL for further processing by another pipeline in the background. For example, rules 1 and 2 of the above controller could be expressed in a Cocoon sitemap as follows:</p>
<pre><span style="color:#008000;">&lt;!-- [rule 1]: map all *.xql requests to their source files in xquery/, and corresponding xslt stylesheets in stylesheets/ --&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">match</span> <span style="color:#ff0000;">pattern</span>=<span style="color:#0000ff;">&quot;*.xql&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">generate</span> <span style="color:#ff0000;">src</span>=<span style="color:#0000ff;">&quot;xquery/{1}.xql&quot;</span> <span style="color:#ff0000;">type</span>=<span style="color:#0000ff;">&quot;xquery&quot;</span><span style="color:#0000ff;">/&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">transform</span> <span style="color:#ff0000;">src</span>=<span style="color:#0000ff;">&quot;stylesheets/{1}.xsl&quot;</span><span style="color:#0000ff;">/&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">serialize</span> <span style="color:#ff0000;">type</span>=<span style="color:#0000ff;">&quot;xhtml&quot;</span><span style="color:#0000ff;">/&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">match</span><span style="color:#0000ff;">&gt;</span>

<span style="color:#008000;">&lt;!-- [rule 2]: redirect empty paths to main query --&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">match</span> <span style="color:#ff0000;">pattern</span>=<span style="color:#0000ff;">&quot;/&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">redirect</span>-<span style="color:#ff0000;">to</span> <span style="color:#ff0000;">uri</span>=<span style="color:#0000ff;">&quot;cocoon:/test.xql&quot;</span><span style="color:#0000ff;">/&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">map</span>:<span style="color:#800000;">match</span><span style="color:#0000ff;">&gt;</span></pre>
<p>The “cocoon:/” prefix before the redirect URL makes Cocoon redirect the original request to “/test.xql” silently, without issuing a real HTTP redirect. As a result, rule 2 instructs the pipeline processor to look for other matching pipelines, in this case rule 1. The URL <a href="http://localhost:8080/urltest/" target="_blank">http://localhost:8080/urltest/</a> will execute the XQuery script at <a href="http://localhost:8080/urltest/test.xql" target="_blank">http://localhost:8080/urltest/test.xql</a>, while the original URL remains in the browser’s address bar.</p>
<p>As hinted at in the last part of the previous section, eXist does not offer such a shortcut for internal URL rewriting. Forwarding the original request to just “test.xql” resulted in an error, because eXist’s XQueryServlet can’t find a source file at “/urltest/test.xql”; forwarding it to “xquery/test.xql” instead didn’t apply any XSLT stylesheet to the XQuery result. In short: rule 2 doesn’t fire rule 1. In fact, this seems to be an explicit limitation of the eXist controller mechanism, as noted in the <a href="http://localhost:8080/exist/urlrewrite.xml#d2e269" target="_blank">documentation</a>:</p>
<blockquote>
<p>It is important to understand that only one (!) controller will ever be applied to a given request. It is not possible to forward from one controller to another (or the same). Once you either ignored or forwarded a request in the controller, it will be directly passed to the servlet which handles it or &#8211; if it references a resource &#8211; it will be processed by the servlet engine itself. The controller will not be called again for the same request.</p>
</blockquote>
<p>This set me back for a while and made me resort to external redirection. For the example given so far, it doesn’t matter much whether a URL pointing to the webapp root is explicitly redirected to the URL of the default start page. However, for <a href="https://www.ibm.com/developerworks/webservices/library/ws-restful/#N1013C" target="_blank">RESTful URLs</a>, it would be less convenient to have e.g. <a href="http://localhost:8080/exist/urltest/hello/bobby" target="_blank">http://localhost:8080/exist/urltest/hello/bobby</a> explicitly redirected to <a href="http://localhost:8080/exist/urltest/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/test.xql?name=bobby</a>. Fortunately, it took only one question on the eXist-open mailing list to find a way around this limitation. <a href="http://markmail.org/message/ll3g5ph5ykzqfqkr" target="_blank">Loren Cahlander</a> kindly pointed me toward an elegant solution that delegates the behaviour shared by different controller rules to separate XQuery functions. This approach appeals to me for two reasons:</p>
<ul>
<li>brevity: common code is isolated, and redundancy can be reduced </li>
<li>flexibility: multiple entry points can still be provided in different controller rules, thus avoiding monolithic rules with unwieldy matching conditions that could further complicate their internal logic </li>
</ul>
<p>Let’s see how both controller rules we specified so far can be refactored by using functions. As a first step, let’s isolate the &lt;dispatch&gt; instruction of rule 1 in a separate function local:basicXQuery():</p>
<pre><font>
declare function local:basicXQuery($query) {
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
};
</font></pre>
<p>Since this function takes a $query argument, all we have to do is call it from both rules with the appropriate value for $query:</p>
<pre><font>
(: [rule 1]: map all *.xql requests to their source files in xquery/, and corresponding xslt stylesheets in stylesheets/ :)
if (matches($exist:path, '^/[^/]*\.xql$')) then
  let $query := substring-before(tokenize($exist:path, '/')[last()], '.')
  return local:basicXQuery($query)
(: [rule 2]: forward empty paths to main query :)
else if ($exist:path eq '/') then
  local:basicXQuery('test')
</font></pre>
<p>This has the same effect as both approaches at the end of the previous section, but with the advantages mentioned above: brevity and flexibility, in short <em>elegance</em>.</p>
<h3>2.2 Passing Parameters</h3>
<p>So much for the simple case. Suppose we want to take this a step further and develop an embryonic system of RESTful URLs for our baby application, which maps requests for <a href="http://localhost:8080/exist/urltest/hello/bobby" target="_blank">http://localhost:8080/exist/urltest/hello/bobby</a> in the background to <a href="http://localhost:8080/exist/urltest/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/test.xql?name=bobby</a>. This can easily be done by adding a new rule to the controller:</p>
<pre><font>
(: [rule 3]: forward RESTful urls expressed as /hello/name to xquery/test.xql with appropriate $name parameter :)
else if (starts-with($exist:path, '/hello/')) then
  local:basicXQuery('test')
</font></pre>
<p>At first sight, this seems to work fairly well: a request for <a href="http://localhost:8080/exist/urltest/hello/bobby" target="_blank">http://localhost:8080/exist/urltest/hello/bobby</a> produces the following output:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>query output: hello anonymous<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h1</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span>processed by test.xsl<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">h2</span><span style="color:#0000ff;">&gt;</span></pre>
<p>There’s clearly one detail we have to fix: although we did supply a name in the URL (the part after ‘/hello/’, i.e. ‘bobby’), it was not passed to the local:basicXQuery() function. Since this name wasn’t supplied as a request parameter, it isn’t directly available to the ‘test.xql’ XQuery file. Fortunately, eXist’s MVC framework provides a means to inject request parameters from a controller, by adding the following element to a &lt;forward&gt; or &lt;redirect&gt; action:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">add</span>-<span style="color:#ff0000;">parameter</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;[name]&quot;</span> <span style="color:#ff0000;">value</span>=<span style="color:#0000ff;">&quot;[value]&quot;</span><span style="color:#0000ff;">/&gt;</span></pre>
<p>This should be done in the &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt; action of the local:basicXQuery() function. Of course, this could be done by modifying it to:</p>
<pre><font>
&lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
  { if (starts-with($exist:path, '/hello/')) then
      &lt;add-parameter name=&quot;name&quot; value=&quot;{tokenize($exist:path, '/')[3]}&quot;/&gt;
    else ()
  }
  &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
  &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
&lt;/forward&gt;
</font></pre>
<p>Here, the value of the “name” parameter is derived from the part of the $exist:path variable following “/hello/”. This would work, but would not be far removed from the code bloat that the unified controller rule at the end of section 1 suffered from. Instead, it would be more interesting to remove this case-specific code from the general local:basicXQuery() function, and move it to the rules calling it.</p>
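<p>Why position [3]? Because $exist:path starts with a slash, tokenizing it on ‘/’ yields an empty string as its first item. The following sketch (plain XQuery 1.0; the sample path with a trailing slash is a hypothetical value of $exist:path) compares this positional approach with substring-after():</p>
<pre><font>
xquery version &quot;1.0&quot;;

(: a sample value standing in for $exist:path :)
let $path := '/hello/bobby/'
let $steps := tokenize($path, '/')
return (
  (: ('', 'hello', 'bobby', ''): the leading and trailing slashes yield empty strings :)
  count($steps),
  (: hence the name is the third item :)
  $steps[3],
  (: substring-after() keeps everything after the prefix, including the trailing slash :)
  substring-after($path, '/hello/')
)
</font></pre>
<p>This returns 4, ‘bobby’ and ‘bobby/’: the positional approach tolerates a trailing slash, whereas substring-after() would pass it on as part of the name.</p>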
<p>Though rule 3 of our controller can’t add this parameter directly to the request, it can pass it in a useful form to the local:basicXQuery() function. In order to make this mechanism as generic as possible, local:basicXQuery() should ideally remain agnostic about the actual parameters (and their number) that could be passed from other controller rules. Therefore, it makes most sense to group all parameters in one single element, for example &lt;params&gt;, whose child elements specify the individual parameters. Nothing prevents us from constructing those parameter definitions directly as &lt;add-parameter&gt; child elements of that &lt;params&gt; element:</p>
<pre><font>
(: [rule 3]: forward RESTful urls expressed as /hello/name to xquery/test.xql with appropriate $name parameter :)
else if (starts-with($exist:path, '/hello/')) then
  let $name := tokenize($exist:path, '/')[3]
  let $params :=
    &lt;params xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;add-parameter name=&quot;name&quot; value=&quot;{$name}&quot;/&gt;
    &lt;/params&gt;
  return local:basicXQuery('test', $params)
</font></pre>
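<p>As local:basicXQuery() only copies the &lt;add-parameter&gt; elements it finds in $params, rules that need several parameters could build that element generically. A possible helper, assuming the parameter names and values are passed as two parallel sequences (local:params and the second ‘lang’ parameter are hypothetical, not part of eXist’s MVC framework):</p>
<pre><font>
xquery version &quot;1.0&quot;;

(: hypothetical helper: one add-parameter element per name/value pair, wrapped in params;
   both inherit the exist namespace from the xmlns attribute on params :)
declare function local:params($names as xs:string*, $values as xs:string*) as element() {
  &lt;params xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
    {
      for $name at $i in $names
      return &lt;add-parameter name=&quot;{$name}&quot; value=&quot;{$values[$i]}&quot;/&gt;
    }
  &lt;/params&gt;
};

(: what rule 3 could pass for /hello/bobby, plus a hypothetical 'lang' parameter :)
local:params(('name', 'lang'), ('bobby', 'en'))
</font></pre>
<p>In rule 3, the let clause constructing $params could then be replaced by a call like local:params('name', $name).</p>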
<p>In order to process these parameters, the local:basicXQuery() function has to be changed so that it accepts a second argument, $params. After that, only a minimal change is needed to make the function copy the &lt;add-parameter&gt; children of $params into the first &lt;forward&gt; instruction:</p>
<pre><font>
declare function local:basicXQuery($query<span style="background-color:yellow;">, $params</span>) {
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
        <span style="background-color:yellow;">{$params//add-parameter}</span>
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
};
</font></pre>
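<p>One detail of this function deserves a remark: the &lt;add-parameter&gt; elements in $params were constructed in the http://exist.sourceforge.net/NS/exist namespace, yet the unprefixed path $params//add-parameter still finds them. That works because the xmlns attribute on the enclosing &lt;dispatch&gt; constructor also sets the default element namespace for the expressions enclosed in curly braces inside it. The following standalone sketch (plain XQuery 1.0; the &lt;wrapper&gt; element is hypothetical and only serves as a comparison) illustrates the difference:</p>
<pre><font>
xquery version &quot;1.0&quot;;

let $params :=
  &lt;params xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
    &lt;add-parameter name=&quot;name&quot; value=&quot;bobby&quot;/&gt;
  &lt;/params&gt;
return (
  (: outside any such constructor, add-parameter denotes a no-namespace name: 0 matches :)
  count($params//add-parameter),
  (: inside a constructor declaring the exist namespace as default, the same path finds 1 match :)
  &lt;wrapper xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;{ count($params//add-parameter) }&lt;/wrapper&gt;
)
</font></pre>
<p>Moving the copy expression out of the &lt;dispatch&gt; constructor would therefore require a namespace prefix bound to that namespace in the path expression.</p>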
<p>Of course, since XQuery doesn’t allow arguments to be omitted from function calls, the other calls to local:basicXQuery() must be changed as well, so that they pass an empty sequence () for the $params parameter. This adds up to the following final version of our controller:</p>
<pre><font>
declare function local:basicXQuery($query, $params) {
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;forward url=&quot;{$exist:controller}/xquery/{$query}.xql&quot;&gt;
        {$params//add-parameter}
        &lt;set-attribute name=&quot;xslt.input&quot; value=&quot;model&quot;/&gt;
        &lt;set-attribute name=&quot;xslt.stylesheet&quot; value=&quot;stylesheets/{$query}.xsl&quot;/&gt;
      &lt;/forward&gt;
      &lt;view&gt;
        &lt;forward servlet=&quot;XSLTServlet&quot;&gt;
          &lt;clear-attribute name=&quot;xslt.input&quot;/&gt;
        &lt;/forward&gt;
      &lt;/view&gt;
    &lt;/dispatch&gt;
};

(: [rule 1]: map all *.xql requests to their source files in xquery/, and corresponding xslt stylesheets in stylesheets/ :)
if (matches($exist:path, '^/[^/]*\.xql$')) then
  let $query := substring-before(tokenize($exist:path, '/')[last()], '.')
  return local:basicXQuery($query, ())
(: [rule 2]: forward empty paths to main query :)
else if ($exist:path eq '/') then
  local:basicXQuery('test', ())
(: [rule 3]: forward RESTful urls expressed as /hello/name to xquery/test.xql with appropriate $name parameter :)
else if (starts-with($exist:path, '/hello/')) then
  let $name := tokenize($exist:path, '/')[3]
  let $params :=
    &lt;params xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;add-parameter name=&quot;name&quot; value=&quot;{$name}&quot;/&gt;
    &lt;/params&gt;
  return local:basicXQuery('test', $params)
(: default rule: just pass through everything else :)
else
    &lt;dispatch xmlns=&quot;http://exist.sourceforge.net/NS/exist&quot;&gt;
      &lt;cache-control cache=&quot;yes&quot;/&gt;
    &lt;/dispatch&gt;
</font></pre>
<p>Et voilà: the test.xql XQuery file of our application can now be called by means of the following URLs:</p>
<ul>
<li><a href="http://localhost:8080/exist/urltest/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/test.xql?name=bobby</a> </li>
<li><a href="http://localhost:8080/exist/urltest/" target="_blank">http://localhost:8080/exist/urltest/</a> (will be rewritten internally to <a href="http://localhost:8080/exist/urltest/test.xql" target="_blank">http://localhost:8080/exist/urltest/test.xql</a>) </li>
<li><a href="http://localhost:8080/exist/urltest/hello/bobby" target="_blank">http://localhost:8080/exist/urltest/hello/bobby</a> (will be rewritten internally to <a href="http://localhost:8080/exist/urltest/test.xql?name=bobby" target="_blank">http://localhost:8080/exist/urltest/test.xql?name=bobby</a>) </li>
</ul>
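<p>Because the three rules are ordinary XQuery conditionals, the mapping can also be tried out without the controller. The following is only a minimal sketch, not part of the downloadable application: the function local:route() is hypothetical, and instead of building a &lt;dispatch&gt; element it just returns a string naming the query (and name parameter) a request path would be forwarded to. Note that the dot in the rule 1 pattern is escaped (\.), so that a path like /testaxql is not mistaken for an .xql request.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Hypothetical test helper mirroring the controller rules above.
   Instead of a dispatch element, it returns a string naming the
   query (and name parameter) a request path would be forwarded to. :)
declare function local:route($path as xs:string) as xs:string
{
  (: rule 1: /{query}.xql maps to the query of the same name :)
  if (matches($path, '^/[^/]*\.xql$')) then
    substring-before(tokenize($path, '/')[last()], '.')
  (: rule 2: the empty path maps to the main query :)
  else if ($path eq '/') then
    'test'
  (: rule 3: /hello/{name} maps to test with a name parameter :)
  else if (starts-with($path, '/hello/')) then
    concat('test?name=', tokenize($path, '/')[3])
  (: default rule: the path is passed through unchanged :)
  else
    $path
};

for $path in ('/test.xql', '/', '/hello/bobby', '/testaxql')
return concat($path, ' maps to ', local:route($path))
(: returns "/test.xql maps to test", "/ maps to test",
   "/hello/bobby maps to test?name=bobby", "/testaxql maps to /testaxql" :)</font></pre>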
<p>If you want to see it in action yourself, just <a href="http://ctb.kantl.be/download/urltest.zip">download the application source files</a>, extract the contents of the zip file to the %EXIST_HOME%/webapps folder, and navigate to any of the URLs mentioned above. I swear one day I’ll be able to produce a nice installable <a href="http://demo.exist-db.org/repo/repo.xml" target="_blank">XQuery package</a>…</p>
<p>Of course, I realize I’m only scratching the surface of the possibilities for developing advanced MVC patterns with eXist. For a much more advanced example, see how <a href="http://markmail.org/message/ej35xhnrzk4tkh6p" target="_blank">Joe Wicentowski</a> abstracted the MVC instructions (forward, redirect, ignore, and add-parameter) even further into dedicated functions. Still, this first step really was an eye-opener to me. Once again, thanks, Loren, for pointing me in the right direction!</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2011/12/07/internal-url-rewriting-with-exists-mvc-framework/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>From KWIC display to KWIC(er) processing with eXist</title>
		<link>http://rvdb.wordpress.com/2011/07/20/from-kwic-display-to-kwicer-processing-with-exist/</link>
		<comments>http://rvdb.wordpress.com/2011/07/20/from-kwic-display-to-kwicer-processing-with-exist/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 20:12:42 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[KWIC]]></category>
		<category><![CDATA[XQuery]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/?p=136</guid>
		<description><![CDATA[The eXist XML database has a dedicated XQuery module for displaying search results in a fixed context window, a visualization that is commonly known as a KeyWord In Context view. Search results are presented with a preceding and following text context (called further in this text left and right text context): &#60;p&#62; &#60;span class="previous"&#62;... s [...]]]></description>
			<content:encoded><![CDATA[<p>The eXist XML database has a <a href="http://demo.exist-db.org/exist/kwic.xml" target="_blank">dedicated XQuery module</a> for displaying search results in a fixed context window, a visualization that is commonly known as a <u>K</u>ey<u>W</u>ord <u>I</u>n <u>C</u>ontext view. Search results are presented with a preceding and a following text context (referred to in the rest of this post as the <em>left</em> and <em>right</em> text context):</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">span</span> <span style="color:#ff0000;">class</span>=<span style="color:#0000ff;">"previous"</span><span style="color:#0000ff;">&gt;</span>... s effect, sir; after what flourish your <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">span</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">span</span> <span style="color:#ff0000;">class</span>=<span style="color:#0000ff;">"hi"</span><span style="color:#0000ff;">&gt;</span>nature<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">span</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">span</span> <span style="color:#ff0000;">class</span>=<span style="color:#0000ff;">"following"</span><span style="color:#0000ff;">&gt;</span> will.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">span</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This formatting of search results invites exploiting its particular features, such as sorting the search results by their left or right contexts, or even by the n<sup>th</sup> word preceding or following the search term. This is greatly facilitated by the XML representation of the KWIC search results, where each of the three parts is isolated in its own XML element. However, while eXist’s current KWIC <em>display</em> module (as it is consistently called) does its job of presenting a KWIC display, in my opinion it is too display-oriented: </p>
<ul>
<li>it performs poorly on large result sets and/or wide context widths, while good performance is crucial for further processing, since sorting requires pre-computing the KWIC view of the entire result set
</li>
<li>(though this is nitpicking:) the output is presentational HTML; while this is irrelevant from a processing point of view, I would prefer a semantically more ‘neutral’ format and defer presentational formatting to a later display phase</li>
</ul>
<p>This post will address both objections and present alternatives. Additionally, ways for processing these KWIC results are discussed in the last section.</p>
<p><span id="more-136"></span>
<p>Disclaimer: this will be the most technical post so far, a non-programmer’s attempt to explain some algorithms. The last parts, however, return to practical ground. Furthermore, a basic knowledge of the KWIC display function is assumed; see the <a href="http://demo.exist-db.org/exist/kwic.xml" target="_blank">eXist documentation</a> and <a href="http://demo.exist-db.org/exist/xquery/functions.xql?module=http://exist-db.org/xquery/kwic&amp;action=Browse" target="_blank">function reference</a> for full documentation.</p>
<h3>1. Strategies for improving the KWIC display module</h3>
<p>In order to anchor this discussion of the KWIC display module, I will refer to the <a href="http://exist.svn.sourceforge.net/viewvc/exist/trunk/eXist/src/org/exist/xquery/lib/kwic.xql?revision=9892" target="_blank">last changed revision (9892)</a> in the SVN trunk of the eXist project, rather than duplicating it here. It currently has 2 main entry points:</p>
<ul>
<li>kwic:summarize($node, $config, $callback?): a simplified function that takes a node containing search hits, a configuration element specifying display options like the context width or display format, and (optionally) a callback function for specific processing of the left and right text contexts
</li>
<li>kwic:get-summary($root, $node, $config, $callback?): a more complex function, adding one more parameter: a root node that serves as a cutoff point for the left or right text contexts </li>
</ul>
<p>Actually, kwic:get-summary() is the core function: all kwic:summarize() does is determine a $root element for the text contexts and pass further processing on to kwic:get-summary().</p>
<p>The meat of the kwic:get-summary() function is a routine that determines the right amount of left and right text context to be included in the KWIC display. This is done by two helper functions, kwic:truncate-previous() and kwic:truncate-following(), for the left and right context respectively. Both accept the following arguments:</p>
<ul>
<li>$root: the root node that serves as a cutoff point for the text contexts
</li>
<li>$node: the node from which to look for the next text node (the search hit on the first call, the last text node added on recursive calls)
</li>
<li>$truncated: the text context constructed so far
</li>
<li>$max: the maximum length of the text context
</li>
<li>$chars: the number of characters of text context constructed so far
</li>
<li>$callback: a reference to a specific function for further processing of the left or right text contexts </li>
</ul>
<p>Starting from the current search term, these functions will test if there is a preceding or following text node. If so, they will test whether this text node and the currently constructed text context exceed the maximum context length. If so, the text context is expanded with a substring of this text node, and this result is returned. If not, the truncation function calls itself recursively, to test for the text node preceding or following the currently selected text node, until the maximum context length is reached.</p>
<p>Since these recursive functions are called for each search hit, their cost is cumulative, and bottlenecks add up. There is some room for improvement here, by reducing the number of decisions to be made and functions to be called. Let’s examine a tuned version of kwic:truncate-following():</p>
<pre><font color="#0000ff" face="Courier New"> 1 declare function kwic:truncate-following($root as node()?, $node as node()?,
       $truncated as item()*, $width as xs:int, $callback as function?)
 2 {
 3   let $nextProbe := $node/following::text()[1]
 4   let $next :=
 5     if ($root[not(. intersect $nextProbe/ancestor::*)]) then ()
 6     else $nextProbe
 7   let $probe :=
 8     if (exists($callback)) then
 9       concat($truncated, for $a in $next return kwic:callback($callback, $nextProbe, "after"))
10     else concat($truncated, ' ', $next)
11   return
12     if (string-length($probe) gt $width) then
13       let $norm := concat(' ', normalize-space($probe))
14       return
15         if (string-length($norm) le $width and $next) then
16           kwic:truncate-following($root, $next, $norm, $width, $callback)
17         else if ($next) then
18           concat(substring($norm, 1, $width), '...')
19         else
20           substring($norm, 1, $width)
21     else if ($next) then
22       kwic:truncate-following($root, $next, $probe, $width, $callback)
23     else for $str in normalize-space($probe)[.] return concat(' ', $str)
24 };
</font></pre>
<p>This version has one argument fewer: the $chars argument is dropped, and the $max argument is renamed to $width (line 1). Like the original version, this function starts by determining the next candidate text node ($nextProbe), i.e. the first following text() node. Next (line 4), the $next variable checks whether a $root argument was supplied, and whether it contains the candidate text node. If it does, or if no $root argument was supplied, $next is bound to $nextProbe; otherwise it is bound to the empty sequence. Finally (line 7), the currently truncated text is updated by concatenating the previously truncated context ($truncated) with the current candidate node, in the $probe variable. When a $callback argument was supplied, the candidate node is first processed by this function.</p>
<p>Next, the string length of this concatenated string is tested (line 12). If it does not exceed the maximum context width ($width) and a following text() node exists (line 21), kwic:truncate-following() is called anew with the updated truncated context. If there’s no following text() node, the normalized $probe string is returned (line 23). If, on the other hand, the $probe string length exceeds the maximum context width, a substring of at most $width characters is returned (line 20), followed by ‘...’ when there is more following text that has been truncated away (line 18). To make sure that whitespace (which is mostly irrelevant in XML) doesn’t eat up the context width, a further test first compares the length of the whitespace-normalized string to $width: if the normalized string still fits within $width and a following text() node exists, the next iteration of kwic:truncate-following() is called (line 16).</p>
<p>The kwic:truncate-previous() function is analogous, but obviously constructs the $truncated string in the opposite direction. Note how the correct substring can be determined by subtracting $width from the length of the normalized candidate context ($norm):</p>
<pre><font color="#0000ff" face="Courier New">concat('...', substring($norm, string-length($norm) - $width + 1))</font></pre>
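<p>To make the mirrored logic concrete, here is a callback-free sketch of the left-context walk. It is my own reconstruction for illustration, not the code of the updated module: the name local:truncate-previous() is hypothetical, the $callback handling is left out, and each candidate text is simply prepended, with a trailing space separating the context from the hit.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Hypothetical, callback-free counterpart of kwic:truncate-previous():
   walks backwards through preceding text nodes until $width is reached. :)
declare function local:truncate-previous($root as node()?, $node as node()?,
    $truncated as xs:string?, $width as xs:integer) as xs:string?
{
  let $prevProbe := $node/preceding::text()[1]
  (: drop the candidate when a $root is given that does not contain it :)
  let $prev :=
    if ($root[not(. intersect $prevProbe/ancestor::*)]) then ()
    else $prevProbe
  (: prepend the candidate to the context collected so far :)
  let $probe := concat($prev, ' ', $truncated)
  return
    if (string-length($probe) gt $width) then
      let $norm := concat(normalize-space($probe), ' ')
      return
        if (string-length($norm) le $width and $prev) then
          local:truncate-previous($root, $prev, $norm, $width)
        else if ($prev) then
          (: keep the last $width characters and flag the cut with '...' :)
          concat('...', substring($norm, string-length($norm) - $width + 1))
        else
          substring($norm, string-length($norm) - $width + 1)
    else if ($prev) then
      local:truncate-previous($root, $prev, $probe, $width)
    else for $str in normalize-space($probe)[.] return concat($str, ' ')
};

let $p := &lt;p&gt;one two &lt;b&gt;three&lt;/b&gt; four &lt;m&gt;hit&lt;/m&gt; five&lt;/p&gt;
return local:truncate-previous((), $p/m, (), 10)
(: returns "...hree four " :)</font></pre>
<p>Passing a cutoff node instead of () as the first argument keeps the walk inside that node; with (), it simply continues until $width is used up or no preceding text is left.</p>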
<p>This design of the truncation functions tries to minimize the processing, by:</p>
<ul>
<li>minimizing the number of tests and reducing redundancy (only 2 branching levels instead of 3 in the original KWIC module)
</li>
<li>isolating the functions to be performed on the truncated text in the relevant branches of the decision tree</li>
</ul>
<p>One costly operation is determining whether the candidate text node still belongs to the $root cutoff node (line 5). This involves scanning all ancestors of the text node and checking whether the $root node is among them:</p>
<pre><font color="#0000ff" face="Courier New">if ($root[not(. intersect $nextProbe/ancestor::*)]) then () </font></pre>
<p>(Note how the intersect operator is used, instead of the lazy evaluation $root//$hits in the original version of this function, which can be considered a <a href="https://sourceforge.net/tracker/?func=detail&amp;atid=117691&amp;aid=3192401&amp;group_id=17691" target="_blank">bug</a>.) This check has to be performed for each new text() node the truncation functions iterate over, which quickly adds up when the number of search hits and/or the context $width are large. This is another area for improvement in the original KWIC functions, which expect a mandatory $root argument. This means that even for simple kwic:summarize() calls, a $root node <em>is</em> determined, namely the current node containing the search hits. Actually, this is redundant: when those nodes are expanded with <a href="http://demo.exist-db.org/exist/functions/util/expand" target="_blank">util:expand()</a>, they are returned as root nodes anyway (without the wider document context), containing a copy of their internal structure, with &lt;exist:match&gt; elements injected around the matching text fragments. Having a $root argument in the truncation functions is more expensive than not having it (since its presence triggers the ancestor lookup for the current text node), so it is unwise, performance-wise, to pass it by default, especially since the default use case for a KWIC display probably won’t need it anyway.</p>
<p>That’s why the truncation functions can be tuned further by making the $root argument optional (see the “$root as node()?” declaration in line 1 above). In order to propagate this optional $root parameter to the higher-level functions, those have to be adapted as well. In the kwic:get-summary() function, this simply requires declaring the $root argument as “node()?” (the question mark indicating it can be empty). The kwic:summarize() function then only needs to pass an empty sequence, (), as the $root argument in its call to kwic:get-summary(). The following overview highlights the changes from the original functions in yellow:</p>
<pre><font color="#0000ff" face="Courier New">declare function kwic:get-summary($root as node()<span style="background-color:#ffff00;">?</span>, $node as element(exist:match),
  $config as element(config)) as element()
{
  kwic:get-summary($root, $node, $config, ())
};

declare function kwic:get-summary($root as node()<span style="background-color:#ffff00;">?</span>, $node as element(exist:match),
  $config as element(config), $callback as function?) as element()
{
  let $<span style="background-color:#ffff00;">width</span> := xs:int($config/@width)
  <span style="background-color:#ffff00;">let $format := $config/@format</span>
  <span style="background-color:#ffff00;">let $ps := $config/@preserve-space = ('yes', 'true')</span>

  let $prevTrunc := <span style="background-color:#ffff00;">if ($ps) then kwic:truncate-previous-ps($root, $node, (), $width, $callback)</span>
    <span style="background-color:#ffff00;">else</span> kwic:truncate-previous($root, $node, (), <span style="background-color:#ffff00;">$width</span>, $callback)
  let $followingTrunc := <span style="background-color:#ffff00;">if ($ps) then kwic:truncate-following-ps($root, $node, (), $width, $callback)</span>
    <span style="background-color:#ffff00;">else</span> kwic:truncate-following($root, $node, (), <span style="background-color:#ffff00;">$width</span>, $callback)
  return
    if (<span style="background-color:#ffff00;">$format eq 'p'</span>) then
      &lt;p&gt;
        &lt;span class="previous"&gt;{$prevTrunc}&lt;/span&gt;
        {
          if ($config/@link) then
            &lt;a class="hi" href="{$config/@link}"&gt;{ $node/text() }&lt;/a&gt;
          else
            &lt;span class="hi"&gt;{ $node/text() }&lt;/span&gt;
        }
        &lt;span class="following"&gt;{$followingTrunc}&lt;/span&gt;
      &lt;/p&gt;
    else <span style="background-color:#ffff00;">if ($format eq 'table') then</span>
      &lt;tr&gt;
        &lt;td class="previous"&gt;{$prevTrunc}&lt;/td&gt;
        &lt;td class="hi"&gt;
        {
          if ($config/@link) then
            &lt;a href="{$config/@link}"&gt;{$node/text()}&lt;/a&gt;
          else
            $node/text()
        }
        &lt;/td&gt;
        &lt;td class="following"&gt;{$followingTrunc}&lt;/td&gt;
      &lt;/tr&gt;
    <span style="background-color:#ffff00;">else</span>
      <span style="background-color:#ffff00;">&lt;KWIC xmlns="http://exist-db.org/xquery/kwic"&gt;</span>
        <span style="background-color:#ffff00;">&lt;prev&gt;{$prevTrunc}&lt;/prev&gt;</span>
        <span style="background-color:#ffff00;">&lt;hit&gt;{$node/text()}&lt;/hit&gt;</span>
        <span style="background-color:#ffff00;">&lt;next&gt;{$followingTrunc}&lt;/next&gt;</span>
      <span style="background-color:#ffff00;">&lt;/KWIC&gt;</span>
};

declare function kwic:summarize($hit as element(),
  $config as element(config), $callback as function?) as element()*
{
  let $expanded := util:expand($hit, "expand-xincludes=no")
  for $match in $expanded//exist:match
  return
    kwic:get-summary(<span style="background-color:#ffff00;">()</span>, $match, $config, $callback)
};</font></pre>
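<p>The effect of an optional $root is easiest to see when the cutoff test from line 5 of kwic:truncate-following() is isolated in a small function. The sketch below is hypothetical (local:in-scope() is part of neither KWIC module): with an empty $root the candidate is accepted straight away, and the ancestor lookup is only needed when a cutoff node is actually supplied.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Hypothetical helper isolating the $root cutoff test: returns $candidate
   when no $root is given or when $root is one of its ancestors, else (). :)
declare function local:in-scope($root as node()?, $candidate as node()?) as node()?
{
  if (empty($root)) then
    (: no cutoff node: accept the candidate without any ancestor lookup :)
    $candidate
  else if ($root intersect $candidate/ancestor::*) then
    $candidate
  else
    ()
};

let $play := &lt;PLAY&gt;&lt;SPEECH&gt;&lt;LINE&gt;To be&lt;/LINE&gt;&lt;/SPEECH&gt;&lt;SPEECH&gt;&lt;LINE&gt;or not&lt;/LINE&gt;&lt;/SPEECH&gt;&lt;/PLAY&gt;
let $outside := $play/SPEECH[2]/LINE/text()
return (exists(local:in-scope((), $outside)),
        exists(local:in-scope($play/SPEECH[1], $outside)))
(: returns true, false :)</font></pre>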
<h3>2. An improved KWIC display module</h3>
<p>The strategies discussed above have been implemented in an <a href="http://www.kantl.be/ctb/download/kwic.xql" target="_blank">updated KWIC module</a> that you can download and import into any XQuery. This updated KWIC display module provides exactly the same functionality as the original KWIC module, with improvements on the following points:</p>
<ul>
<li>performance: for large node sets, this version of the function performs 30 to 35% faster (tested with very broad searches on two different, non-public test collections, for a number of search scenarios: varying context widths, varying $root nodes)
</li>
<li>accuracy: this version of the function normalizes the whitespace in the left and right text context</li>
</ul>
<p>Of course, the truncation functions could be simplified even further by skipping whitespace normalization (which reduces both the number of operations and the number of decisions when truncating the text contexts). While this can definitely save a couple of seconds on large node sets with large context widths, I feel strongly about whitespace normalization, since it does a slightly more accurate job. But then, there’s no reason why the simplified truncation functions couldn’t be included in the KWIC module as well, letting kwic:get-summary() decide which version to call based on an extra configuration parameter. In the updated KWIC module, the left and right text contexts are normalized by default. This can be overridden by passing the value ‘yes’ (or ‘true’) to an extra attribute @preserve-space on the &lt;config&gt; element:</p>
<blockquote style="margin-right:0;">
<p>kwic:summarize(., &lt;config width="40" preserve-space="yes"/&gt;)</p>
<p>kwic:get-summary((), ., &lt;config width="40" preserve-space="true"/&gt;, ())</p>
</blockquote>
<p>Another (cosmetic) change concerns the output formatting. The original KWIC module directly outputs HTML fragments, either as paragraphs or as table rows. Probably because of my text encoding background, this is too display-oriented for my taste. In the updated KWIC module, a third (default) output format is provided, structured as follows:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>{$prevTrunc}<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>{$node/text()}<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>{$followingTrunc}<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
</pre>
<p>This format is taken as the default one, while both other formats can be output by providing appropriate values for the @format attribute on &lt;config&gt;: ‘p’ for paragraphs, ‘table’ for tables.</p>
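<p>With the neutral format as default, presentation becomes a separate step. As a minimal sketch (the function local:kwic-to-row() is hypothetical, not part of the module, and leaves out the @link handling), a later display phase could turn each &lt;kwic:KWIC&gt; result into the kind of table row that format="table" produces directly:</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

declare namespace kwic = "http://exist-db.org/xquery/kwic";

(: Hypothetical display-phase function: renders a neutral kwic:KWIC element
   as an HTML table row (leaving out the @link handling of format="table"). :)
declare function local:kwic-to-row($k as element(kwic:KWIC)) as element(tr)
{
  &lt;tr&gt;
    &lt;td class="previous"&gt;{ string($k/kwic:prev) }&lt;/td&gt;
    &lt;td class="hi"&gt;{ string($k/kwic:hit) }&lt;/td&gt;
    &lt;td class="following"&gt;{ string($k/kwic:next) }&lt;/td&gt;
  &lt;/tr&gt;
};

local:kwic-to-row(
  &lt;KWIC xmlns="http://exist-db.org/xquery/kwic"&gt;
    &lt;prev&gt;after what flourish your &lt;/prev&gt;
    &lt;hit&gt;nature&lt;/hit&gt;
    &lt;next&gt; will.&lt;/next&gt;
  &lt;/KWIC&gt;)</font></pre>
<p>Keeping this step separate means the same KWIC results can first be sorted or filtered and only then rendered, in whatever format the display requires.</p>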
<p>To wrap up, here is an overview of the configuration options in the updated KWIC module:</p>
<ul>
<li>width: a number indicating the context width
</li>
<li>link: a URL to which the hit is linked
</li>
<li>format: output format
<ul>
<li><em>p</em>: an HTML paragraph, containing &lt;span class="previous"&gt;, &lt;span class="hi"&gt;, and &lt;span class="following"&gt;
</li>
<li><em>table</em>: HTML table rows, containing &lt;td class="previous"&gt;, &lt;td class="hi"&gt;, and &lt;td class="following"&gt;
</li>
<li><em>KWIC</em> (default): a &lt;KWIC&gt; element, containing &lt;prev&gt;, &lt;hit&gt;, and &lt;next&gt; elements</li>
</ul>
</li>
<li>preserve-space: whitespace normalization
<ul>
<li><em>yes</em>|<em>true</em>: preserve original whitespace inside left and right text contexts
</li>
<li><em>no</em> (default): normalize whitespace inside left and right text contexts</li>
</ul>
</li>
</ul>
<p>These improvements open the way for more powerful exploitation of the KWIC results, as will be illustrated in the next section.</p>
<h3>3. Processing KWIC results</h3>
<h4>3.1 Contextual Sorting</h4>
<p>A more performant KWIC module eases further processing of the KWIC search results, beyond mere display functionality. Traditionally, applications offering a KWIC display provide the option to sort the search results by their left and right text context. To do so in XQuery, the entire search result set must first be collected and formatted as KWIC results. Next, the left and right text contexts can be used as sort keys. Let’s take the following example, based on the ‘Keyword In Context with Callback’ example in the eXist <a href="http://demo.exist-db.org/exist/sandbox/sandbox.xql" target="_blank">Sandbox</a> application:</p>
<pre><font color="#0000ff" face="Courier New">import module namespace kwic="http://exist-db.org/xquery/kwic"
  at "xmldb:exist:///db/modules/kwic.xql";

declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
  if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
      ()
  else if ($mode eq 'before') then
      concat($node, ' ')
  else
      concat(' ', $node)
};

let $config := &lt;config width="80" /&gt;
for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nature king sir")]
order by ft:score($hit) descending
return
  kwic:summarize($hit, $config,
            util:function(xs:QName("local:filter"), 2))</font></pre>
<p>For all &lt;SPEECH&gt; elements in <em>Hamlet</em> that contain “nature”, “king”, or “sir”, this query returns each of these search hits, together with its left and right text context. Since no configuration options other than @width were passed, the whitespace in these results is normalized, and they are presented as &lt;kwic:KWIC&gt; chunks:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>The <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> , sir ,--<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>The king , <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>sir<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> ,--<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>To this effect, sir ; after what flourish your <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>nature<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> will.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#008000;">&lt;!-- ... --&gt;</span></pre>
<p>Note how the “order by” expression specifies that these results are ordered per &lt;SPEECH&gt; element, in decreasing relevance order (determined by ft:score()). </p>
<p>Let’s refactor this query so that it sorts the results by keyword:</p>
<pre><font color="#0000ff" face="Courier New">import module namespace kwic="http://exist-db.org/xquery/kwic"
  at "xmldb:exist:///db/modules/kwic.xql";

declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
  if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
      ()
  else if ($mode eq 'before') then
      concat($node, ' ')
  else
      concat(' ', $node)
};

let $config := &lt;config width="80" /&gt;
let $hits := doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nature king sir")]
for $KWIC in $hits/kwic:summarize(., $config,
            util:function(xs:QName("local:filter"), 2))
order by $KWIC/kwic:hit ascending
return $KWIC</font></pre>
<p>This time, each KWIC result is bound in turn to the $KWIC variable, and the results are sorted on their &lt;kwic:hit&gt; elements. This puts the hits for ‘king’ first, followed by those for ‘nature’ and ‘sir’.</p>
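<p>Sort keys can also be finer-grained than a whole element, for instance the n<sup>th</sup> word before or after the hit. The sketch below is hypothetical (local:nth-word() is not part of the KWIC module) and assumes that a word is simply a run of word characters: it splits the left or right context on non-word characters, counting backwards from the hit for the left context, and sorts two sample ‘king’ hits by the word immediately preceding them.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

declare namespace kwic = "http://exist-db.org/xquery/kwic";

(: Hypothetical helper: the $n-th word after the hit or, if $before is true,
   the $n-th word before it (counting backwards from the hit), lower-cased
   for sorting; returns "" when the context has fewer than $n words. :)
declare function local:nth-word($k as element(kwic:KWIC), $n as xs:integer,
    $before as xs:boolean) as xs:string
{
  (: split the context on runs of non-word characters, dropping empty tokens :)
  let $words :=
    if ($before) then reverse(tokenize($k/kwic:prev, '\W+')[.])
    else tokenize($k/kwic:next, '\W+')[.]
  return lower-case(string($words[$n]))
};

let $hits := (
  &lt;KWIC xmlns="http://exist-db.org/xquery/kwic"&gt;
    &lt;prev&gt;Both to my God and to my gracious &lt;/prev&gt;&lt;hit&gt;king&lt;/hit&gt;
    &lt;next&gt; : And I do think&lt;/next&gt;
  &lt;/KWIC&gt;,
  &lt;KWIC xmlns="http://exist-db.org/xquery/kwic"&gt;
    &lt;prev&gt;that hath eat of a &lt;/prev&gt;&lt;hit&gt;king&lt;/hit&gt;
    &lt;next&gt; , and eat of the fish&lt;/next&gt;
  &lt;/KWIC&gt;
)
(: sort by keyword first, then by the first word to the left of the hit :)
for $k in $hits
order by lower-case($k/kwic:hit), local:nth-word($k, 1, true())
return local:nth-word($k, 1, true())
(: returns "a", "gracious" :)</font></pre>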
<p>Suppose we’re interested in the words following those hits. Regrouping them is just a matter of adding a second sort key, by replacing the “order by” line with the following one:</p>
<pre><font color="#0000ff" face="Courier New">order by lower-case($KWIC/kwic:hit) ascending,
         lower-case(replace($KWIC/kwic:next, '\W', ''))

</font></pre>
<p>Since sorting is case-sensitive, the order by expression above converts all sort keys to lower case, and ignores all non-word characters (including spaces) in the right text context. The ‘king’ hits are then ordered as follows:</p>
<pre><span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>A man may fish with the worm that hath eat of a <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> , and cat of the fish that hath fed of that worm.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>...ood liege, I hold my duty, as I hold my soul, Both to my God and to my gracious <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> : And I do think, or else this brain of mine Hunts not the trail of policy so s...<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>A bloody deed! almost as bad, good mother, As kill a <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> , and marry with his brother.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>The <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> and queen and all are coming down.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#008000;">&lt;!-- ... --&gt;</span></pre>
<p>Sorting on the left text context requires some additional processing. We’re interested in the words immediately preceding the search term, not in the first word of the left text context, so the left text context has to be reversed before sorting. This can be done by tokenizing the string, reversing the resulting sequence with the XQuery reverse() function, and reassembling it into a string with the XQuery string-join() function:</p>
<pre><font color="#0000ff" face="Courier New">
order by lower-case($KWIC/kwic:hit) ascending,
         lower-case(string-join(reverse(tokenize($KWIC/kwic:prev, '\W+')), ' ')) 

</font></pre>
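<p>The effect of the reversed sort key is easiest to see on a few sample contexts. Below is a minimal Python sketch of the same key. Here re.split() stands in for tokenize(). Like tokenize(), it yields an empty string when the context ends with a non-word character, which is why every reversed key starts with a space.</p>

```python
import re

def prev_key(entry):
    """Mimic: lower-case(hit), lower-case(string-join(reverse(tokenize(prev, '\\W+')), ' '))."""
    prev, hit, _next = entry
    tokens = re.split(r"\W+", prev)  # trailing space -> trailing empty token
    return (hit.lower(), " ".join(reversed(tokens)).lower())

hits = [
    ("Long live the ", "king", " !"),
    ("Looks it not like the ", "king", " ? mark it, Horatio."),
    ("In the same figure, like the ", "king", " that's dead."),
]

ordered = sorted(hits, key=prev_key)
for entry in ordered:
    print(repr(prev_key(entry)[1]), "|", entry[1])
```

The sort now groups the hits by “the”, then by “like” or “live”, and so on outward from the search term.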
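<p>Likewise, individual words at a given position can be used for sorting instead of the entire left or right text. For example, suppose we want to sort the results on the third word preceding the search term (in descending order), and then on the second word following it (in ascending order). This can again be achieved with the XQuery tokenize() function, combined with positional predicates. The [.] predicate discards the empty strings that tokenize() returns when a context starts or ends with non-word characters (an empty string has an effective boolean value of false), so they don’t throw off the word positions:</p>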
<pre><font color="#0000ff" face="Courier New">
order by lower-case($KWIC/kwic:hit) ascending,
         lower-case(reverse(tokenize($KWIC/kwic:prev, '\W+')[.])[3]) descending,
         lower-case(tokenize($KWIC/kwic:next, '\W+')[.][2])
</font></pre>
<pre><span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>He that plays the <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> shall be welcome; his majesty shall have tribute of me; the adventurous knight ...<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>Of all the days i' the year, I came to't that day that our last <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> Hamlet overcame Fortinbras.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>...es, to be demanded of a sponge! what replication should be made by the son of a <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> ?<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>That can I; At least, the whisper goes so. Our last <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> , Whose image even but now appear'd to us, Was, as you know, by Fortinbras of N...<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#008000;">&lt;!-- ... --&gt;</span></pre>
<p>Of course, these sorting examples are still fairly basic, but they could already be useful for the linguistic exploration of documents indexed with eXist. A full linguistic toolkit would require more advanced features for deriving collocation information (which words occur together with other words?) and statistics. That’s probably still a long way off, but it does set one dreaming… </p>
<h4>3.2 An (Experimental) Collocation Table</h4>
<p>As an experiment, let’s see how far we get with a collocation table (caution: highly experimental!). Starting from KWIC search results, it is possible to compose a limited window of n words preceding and following the search term. If all distinct words occurring at each slot in this window are collected, a table can be composed that lists these words per position. Let’s examine the steps in constructing such a collocation table, starting from KWIC search results (the full XQuery code can be downloaded <a href="http://www.kantl.be/ctb/download/collocation_table.xq" target="_blank">here</a>):</p>
<pre><font color="#0000ff" face="Courier New">import module namespace kwic="http://exist-db.org/xquery/kwic" at "xmldb:exist:///db/modules/kwic.xql";

declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
  if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
      ()
  else if ($mode eq 'before') then
      concat($node, ' ')
  else
      concat(' ', $node)
};

(: context scope: number of preceding / following words :)
let $scope := 5
(: determine context width for KWIC results: 10 characters per context word, minimally 40 :)
let $cutoff := max(($scope * 10, xs:int(40)))
let $config := &lt;config width="{$cutoff}" /&gt;
let $hits := doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nature king sir")]
let $KWIC := $hits/kwic:summarize(., $config,
            util:function(xs:QName("local:filter"), 2))
</font></pre>
<p>This will produce a $KWIC variable with the search results formatted as KWIC entries (unordered this time, since the results will be ordered later on):</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>Long live the <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> !<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>In the same figure, like the <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> that's dead.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">KWIC</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">"http://exist-db.org/xquery/kwic"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>Looks it not like the <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">prev</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>king<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span> ? mark it, Horatio.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">next</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">KWIC</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#008000;">&lt;!-- ... --&gt;</span></pre>
<p>Further processing is done per distinct search term, in order to keep their collocations nicely separated. Next, both left and right contexts are prepared for further processing:</p>
<pre><font color="#0000ff" face="Courier New">
(: split up collocations per search term :)
for $term in distinct-values($KWIC//kwic:hit/lower-case(normalize-space(.)))
order by $term
return
  (: prepare entire left / right contexts for tokenization:
        -lower case
        -normalize whitespace
        -reverse $prev
  :)
  let $prev :=
    for $a in $KWIC[kwic:hit/lower-case(normalize-space(.)) eq $term]/kwic:prev/lower-case(normalize-space(.))
    return string-join(reverse(tokenize($a, '\W+')), ' ')
  let $next := $KWIC[kwic:hit/lower-case(normalize-space(.)) eq $term]//kwic:next/lower-case(normalize-space(.))
</font></pre>
<p>Apart from being lower-cased and whitespace-normalized, the left context is also reversed. After this stage, it looks as follows:</p>
<pre>the live long
laertes king be shall laertes clouds the to t
be shall laertes clouds the to it applaud gues
...</pre>
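<p>In Python terms, this preparation step amounts to grouping the entries by their normalized hit and reversing each left context word by word. The sketch below is a rough equivalent. It assumes the KWIC entries are available as (prev, hit, next) string triples, and str.split() without arguments approximates normalize-space().</p>

```python
import re
from collections import defaultdict

def norm(text):
    """Approximates lower-case(normalize-space(.))."""
    return " ".join(text.split()).lower()

def prepare_contexts(kwic):
    """Group contexts per normalized search term; reverse the left contexts."""
    groups = defaultdict(lambda: {"prev": [], "next": []})
    for prev, hit, nxt in kwic:
        term = norm(hit)
        tokens = re.split(r"\W+", norm(prev))
        groups[term]["prev"].append(" ".join(reversed(tokens)))
        groups[term]["next"].append(norm(nxt))
    # mirrors "order by $term"
    return dict(sorted(groups.items(), key=lambda item: item[0]))

groups = prepare_contexts([
    ("Long live the ", "king", " !"),
    ("Looks it not like the ", "King ", " ? mark it, Horatio."),
])
print(groups["king"]["prev"])  # ['the live long', 'the like not it looks']
```

Because the hits are normalized before grouping, “King ” and “king” end up in the same group.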
<p>Next, these $prev and $next contexts are cut off at the number of words preceding or following the search term, as defined by the $scope variable at the start. Tokens that contain no letters (such as numbers) are discarded:</p>
<pre><font color="#0000ff" face="Courier New">
  (: per context position, retrieve all distinct words :)
  (: note: discard numbers :)
  let $prev :=
    for $context in ((0 - $scope) to -1)
    let $words :=
      let $tok :=
        for $a in $prev
        return tokenize($a, '\W+')[matches(., '[a-zA-Z]')][position() = abs($context)]
      for $b in distinct-values($tok) order by $b return &lt;w&gt;{$b}&lt;/w&gt;
    return &lt;context pos="{$context}"&gt;{$words}&lt;/context&gt;
  let $next :=
    for $context in (1 to $scope)
    let $words :=
      let $tok :=
        for $a in $next
        return tokenize($a, '\W+')[matches(., '[a-zA-Z]')][$context]
      for $b in distinct-values($tok) order by $b return &lt;w&gt;{$b}&lt;/w&gt;
    return &lt;context pos="{$context}"&gt;{$words}&lt;/context&gt;
</font></pre>
<p>For example, the left context is broken into 5 positions. Each position holds all distinct words that occur at that distance before the search term, from the 5th word before the search term up to the word immediately preceding it. This produces an updated $prev variable that looks as follows:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">context</span> <span style="color:#ff0000;">pos</span>=<span style="color:#0000ff;">"-5"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>an<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>and<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>battlements<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">context</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">context</span> <span style="color:#ff0000;">pos</span>=<span style="color:#0000ff;">"-4"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>and<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>aside<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>body<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">context</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">context</span> <span style="color:#ff0000;">pos</span>=<span style="color:#0000ff;">"-3"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>alone<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>as<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>be<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">context</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">context</span> <span style="color:#ff0000;">pos</span>=<span style="color:#0000ff;">"-2"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>a<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>and<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>before<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">context</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">context</span> <span style="color:#ff0000;">pos</span>=<span style="color:#0000ff;">"-1"</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>a<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>be<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>bloat<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">w</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">context</span><span style="color:#0000ff;">&gt;</span>
</pre>
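<p>The slot-by-slot collection can also be sketched in Python. The function below takes the prepared contexts (already reversed for the left side) and returns the sorted distinct words found at each position. Like the XQuery, it skips tokens without letters. The function and argument names are invented for this sketch.</p>

```python
import re

def words(text):
    """tokenize(text, '\\W+')[matches(., '[a-zA-Z]')]: tokens with at least one letter."""
    return [t for t in re.split(r"\W+", text) if re.search(r"[a-zA-Z]", t)]

def slot_words(contexts, scope, side):
    """Map each context position to the sorted distinct words occurring there.

    side="prev" expects reversed left contexts and yields positions -scope .. -1;
    side="next" yields positions 1 .. scope.
    """
    positions = range(-scope, 0) if side == "prev" else range(1, scope + 1)
    slots = {}
    for pos in positions:
        n = abs(pos)  # n-th word away from the search term
        found = set()
        for context in contexts:
            toks = words(context)
            if len(toks) >= n:
                found.add(toks[n - 1])
        slots[pos] = sorted(found)
    return slots

prev = ["the live long", "the like not it looks",
        "laertes king be shall laertes clouds the to t"]
print(slot_words(prev, 3, "prev"))
# {-3: ['be', 'long', 'not'], -2: ['king', 'like', 'live'], -1: ['laertes', 'the']}
```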
<p>Finally, once these $prev and $next results have been collected, they are presented in an HTML table, where each position in the left and right contexts is represented by a column. All words occurring at that position are then listed, each in its own row. </p>
<pre><font color="#0000ff" face="Courier New">
  let $max := max(($next|$prev)/count(w))

  (: spread out words-per-context over table rows :)
  return
    &lt;table border="1"&gt;{
      &lt;tr&gt;
        &lt;th/&gt;
        {
          for $a in $prev
          return &lt;th&gt;{$a/@pos/string()}&lt;/th&gt;,
          &lt;th&gt;term&lt;/th&gt;,
          for $a in $next
          return &lt;th&gt;{$a/@pos/string()}&lt;/th&gt;
        }
      &lt;/tr&gt;,
      for $i in (1 to $max)
      return
        &lt;tr&gt;{
          &lt;td&gt;{$i}&lt;/td&gt;,
          for $a in $prev
          return
            &lt;td&gt;{$a/w[$i]/text()}&lt;/td&gt;,
          &lt;th&gt;{$term}&lt;/th&gt;,
          for $a in $next
          return
            &lt;td&gt;{$a/w[$i]/text()}&lt;/td&gt;
        }&lt;/tr&gt;
    }&lt;/table&gt;
</font></pre>
<p>This produces the following collocation tables (one per search term) for our query (note: this is only an excerpt; a full version can be found <a href="http://www.kantl.be/ctb/download/collocationSample.htm" target="_blank">here</a>):</p>
<table border="1">
<tbody>
<tr>
<th>
</th>
<th>-5</th>
<th>-4</th>
<th>-3</th>
<th>-2</th>
<th>-1</th>
<th>term</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
</tr>
<tr>
<td>1</td>
<td>an</td>
<td>and</td>
<td>alone</td>
<td>a</td>
<td>a</td>
<th>king</th>
<td>and</td>
<td>a</td>
<td>a</td>
<td>a</td>
<td>and</td>
</tr>
<tr>
<td>2</td>
<td>and</td>
<td>aside</td>
<td>as</td>
<td>and</td>
<td>be</td>
<th>king</th>
<td>as</td>
<td>be</td>
<td>again</td>
<td>all</td>
<td>applaud</td>
</tr>
<tr>
<td>3</td>
<td>battlements</td>
<td>body</td>
<td>be</td>
<td>before</td>
<td>bloat</td>
<th>king</th>
<td>best</td>
<td>but</td>
<td>and</td>
<td>and</td>
<td>are</td>
</tr>
<tr>
<td>4</td>
<td>but</td>
<td>can</td>
<td>business</td>
<td>body</td>
<td>danish</td>
<th>king</th>
<td>but</td>
<td>cat</td>
<td>be</td>
<td>beggar</td>
<td>bed</td>
</tr>
<tr>
<td>5</td>
<td>by</td>
<td>clouds</td>
<td>conjuration</td>
<td>but</td>
<td>fat</td>
<th>king</th>
<td>caps</td>
<td>denmark</td>
<td>can</td>
<td>but</td>
<td>begin</td>
</tr>
<tr>
<td>6</td>
<td style="text-align:center;" colspan="10">&#8230;</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<th>
</th>
<th>-5</th>
<th>-4</th>
<th>-3</th>
<th>-2</th>
<th>-1</th>
<th>term</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
</tr>
<tr>
<td>1</td>
<td>action</td>
<td>after</td>
<td>a</td>
<td>capital</td>
<td>baser</td>
<th>nature</th>
<td>and</td>
<td>and</td>
<td>and</td>
<td>absurd</td>
<td>alone</td>
</tr>
<tr>
<td>2</td>
<td>audience</td>
<td>and</td>
<td>am</td>
<td>days</td>
<td>for</td>
<th>nature</th>
<td>are</td>
<td>any</td>
<td>as</td>
<td>compell</td>
<td>and</td>
</tr>
<tr>
<td>3</td>
<td>can</td>
<td>are</td>
<td>and</td>
<td>fault</td>
<td>hast</td>
<th>nature</th>
<td>as</td>
<td>between</td>
<td>bear</td>
<td>devil</td>
<td>awake</td>
</tr>
<tr>
<td>4</td>
<td>crimeful</td>
<td>as</td>
<td>canker</td>
<td>flourish</td>
<td>in</td>
<th>nature</th>
<td>cannot</td>
<td>burnt</td>
<td>ever</td>
<td>evil</td>
<td>away</td>
</tr>
<tr>
<td>5</td>
<td>done</td>
<td>change</td>
<td>commendable</td>
<td>fools</td>
<td>of</td>
<th>nature</th>
<td>come</td>
<td>by</td>
<td>exception</td>
<td>grow</td>
<td>by</td>
</tr>
<tr>
<td>6</td>
<td style="text-align:center;" colspan="10">&#8230;</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<th>
</th>
<th>-5</th>
<th>-4</th>
<th>-3</th>
<th>-2</th>
<th>-1</th>
<th>term</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
</tr>
<tr>
<td>1</td>
<td>a</td>
<td>a</td>
<td>a</td>
<td>a</td>
<td>ay</td>
<th>sir</th>
<td>after</td>
<td>a</td>
<td>are</td>
<td>an</td>
<td>a</td>
</tr>
<tr>
<td>2</td>
<td>against</td>
<td>away</td>
<td>approve</td>
<td>all</td>
<td>but</td>
<th>sir</th>
<td>an</td>
<td>aery</td>
<td>as</td>
<td>and</td>
<td>ambassador</td>
</tr>
<tr>
<td>3</td>
<td>dies</td>
<td>fell</td>
<td>are</td>
<td>am</td>
<td>cannot</td>
<th>sir</th>
<td>and</td>
<td>all</td>
<td>did</td>
<td>any</td>
<td>and</td>
</tr>
<tr>
<td>4</td>
<td>great</td>
<td>foolish</td>
<td>be</td>
<td>and</td>
<td>carriages</td>
<th>sir</th>
<td>are</td>
<td>answer</td>
<td>diligence</td>
<td>as</td>
<td>can</td>
</tr>
<tr>
<td>5</td>
<td>head</td>
<td>forget</td>
<td>but</td>
<td>ay</td>
<td>come</td>
<th>sir</th>
<td>but</td>
<td>away</td>
<td>done</td>
<td>but</td>
<td>clay</td>
</tr>
<tr>
<td>6</td>
<td style="text-align:center;" colspan="10">&#8230;</td>
</tr>
</tbody>
</table>
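<p>Stripped of its HTML, the row-spreading logic of the table step comes down to the Python sketch below (a hypothetical helper, not part of the downloadable script). Row i simply takes the i-th word of every position, which shows that words sharing a row have nothing to do with each other.</p>

```python
def cell(words, i):
    """$a/w[$i]/text(): the i-th word of a position, or an empty cell."""
    return words[i - 1] if len(words) >= i else ""

def table_rows(term, prev, nxt):
    """Header plus rows: one column per position, the search term in the middle."""
    header = [""] + [str(p) for p in prev] + ["term"] + [str(p) for p in nxt]
    all_slots = list(prev.values()) + list(nxt.values())
    max_rows = max((len(w) for w in all_slots), default=0)  # $max
    rows = [header]
    for i in range(1, max_rows + 1):
        rows.append([str(i)]
                    + [cell(w, i) for w in prev.values()]
                    + [term]
                    + [cell(w, i) for w in nxt.values()])
    return rows

rows = table_rows("king", {-2: ["a", "and"], -1: ["be"]}, {1: ["and", "as", "best"]})
for row in rows:
    print(" | ".join(row))
```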
<p>Note, though, that the value of these collocation tables is fairly limited. Since the context words have been ordered alphabetically, all these tables can provide is an alphabetical list of the words occurring at each position before or after the search term (when read per column). It would of course be more interesting to have access to statistical information indicating the significance of the individual words per position. That would allow them to be ordered by statistical salience, so that the tables could provide an overview of the most frequent words at their respective positions. Unfortunately, such data is not (yet?) available in eXist, so the meaningfulness of the collocation data presented above is quite questionable; hence the experimental character of this illustration. Anyway, if you’d like to experiment with it, feel free to download the updated KWIC module (<a href="http://www.kantl.be/ctb/download/kwic.xql" target="_blank">kwic.xql</a>), the full XQuery script (<a href="http://www.kantl.be/ctb/download/collocation_table.xq" target="_blank">collocation_table.xq</a>) and sample output (<a href="http://www.kantl.be/ctb/download/collocationSample.htm" target="_blank">collocationSample.htm</a>).</p>
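<p>As a very rough stand-in for such statistics, one could at least count how often each word occurs at a given position among the retrieved contexts, and rank the words accordingly. The Python sketch below does just that on a few toy contexts. Raw frequency within the hits is not a significance measure, because it ignores how common a word is in the corpus as a whole. Treat this as a hypothetical refinement, not something the script above or eXist provides.</p>

```python
import re
from collections import Counter

def ranked_slots(contexts, scope, side):
    """Rank the words per position by frequency, most frequent first (ties alphabetical)."""
    positions = range(-scope, 0) if side == "prev" else range(1, scope + 1)
    ranked = {}
    for pos in positions:
        n = abs(pos)
        counts = Counter()
        for context in contexts:
            toks = [t for t in re.split(r"\W+", context) if re.search(r"[a-zA-Z]", t)]
            if len(toks) >= n:
                counts[toks[n - 1]] += 1
        ranked[pos] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked

# toy reversed left contexts
prev = ["the live long", "the like not it looks", "a of son the by"]
print(ranked_slots(prev, 2, "prev"))
```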
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2011/07/20/from-kwic-display-to-kwicer-processing-with-exist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>Venturing into versions: strategies for querying a TEI apparatus with eXist</title>
		<link>http://rvdb.wordpress.com/2011/04/20/venturing-into-versions-strategies-for-querying-a-tei-apparatus/</link>
		<comments>http://rvdb.wordpress.com/2011/04/20/venturing-into-versions-strategies-for-querying-a-tei-apparatus/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 15:38:28 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XQuery]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/2011/04/20/venturing-into-versions-strategies-for-searching-a-tei-apparatus/</guid>
		<description><![CDATA[When encoding a critical edition in XML, one of the challenges facing the text encoder is finding a way to represent multiple versions of a work in a sensible way. As usual when it comes to the electronic representation of texts in the field of the humanities, such a sensible way is provided by the [...]]]></description>
			<content:encoded><![CDATA[<p>When encoding a critical edition in XML, one of the challenges facing the text encoder is finding a sensible way to represent multiple versions of a work. As usual when it comes to the electronic representation of texts in the humanities, such a sensible way is provided by the <a href="http://www.tei-c.org/" target="_blank">Text Encoding Initiative</a> (TEI). In fact, the TEI offers three methods. This post focuses on the so-called <em>parallel-segmentation</em> method (for extensive reference, the reader is directed to <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TC.html" target="_blank">chapter 12: Critical Apparatus</a> of the TEI Guidelines). In short, this method allows an encoder to represent <em>all</em> text versions of a work within a single XML source. Places with variant text are encoded as an inline <em>apparatus</em> (&lt;app&gt;), in which the distinct variants are identified as <em>readings</em> (&lt;rdg wit="[siglum]"&gt;), whose @wit attribute links them to one or more identified versions of the work. A lot more could be said about both the editorial and the markup-theoretical aspects, but those are not the focus of this post.</p>
<p>Instead, this post will focus on a problem I was confronted with when developing an application (i.e. a web interface) for such an edition: how do you search within such ‘multiversion’ texts? Users of the edition will most probably want to focus on one text version, or on a selection of them. Of course, when version 1 contains the word ‘hope’, which was changed to ‘despair’ in version 2, only the right readings should be retrieved for the respective text version.</p>
<p><span id="more-132"></span>
<p>Enter XQuery and the <a href="http://exist-db.org/" target="_blank">eXist XML database</a>. As with the scholarly-editing background, a basic familiarity with indexing and searching documents in eXist is assumed. The focus will lie on the theoretical aspects of two key problems in ‘multiversion’ texts: 1) full-text searching and 2) index lookup of terms. Two possible approaches are discussed and evaluated:</p>
<ol>
<li>Querying a single ‘multiversion’ source text </li>
<li>Splitting up the text into separate source texts per text version and querying those </li>
</ol>
<p>Note: all files discussed in this post are available in <a href="http://ctb.kantl.be/download/appSearch_db.zip" target="_blank">appSearch_db.zip</a>, an eXist backup file. You can install them in an eXist database in a single step by <a href="http://exist-db.org/backup.html" target="_blank">restoring</a> this backup file.</p>
<h3>1. Single source text</h3>
<p>I started exploring this approach out of a desire to do something clever with a ‘multiversion’ text containing all variants occurring in all text versions included in the edition. It starts from a single indexed source text containing transcriptions of all distinct text versions. For example, let’s take a 2-paragraph text for which 4 text versions have been collated. It will be stored as <em>test.xml</em> in a dedicated collection in the eXist database, namely <em>/db/test</em>:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">div</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">listWit</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">witness</span> <span style="color:#ff0000;">xml</span>:<span style="color:#ff0000;">id</span>=<span style="color:#0000ff;">&quot;w1&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">witness</span> <span style="color:#ff0000;">xml</span>:<span style="color:#ff0000;">id</span>=<span style="color:#0000ff;">&quot;w2&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">witness</span> <span style="color:#ff0000;">xml</span>:<span style="color:#ff0000;">id</span>=<span style="color:#0000ff;">&quot;w3&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">witness</span> <span style="color:#ff0000;">xml</span>:<span style="color:#ff0000;">id</span>=<span style="color:#0000ff;">&quot;w4&quot;</span><span style="color:#0000ff;">/&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">listWit</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>This is a paragraph with common text.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>This paragraph has
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">app</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">rdg</span> <span style="color:#ff0000;">wit</span>=<span style="color:#0000ff;">&quot;#w1 #w3&quot;</span><span style="color:#0000ff;">&gt;</span>variants<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">rdg</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">rdg</span> <span style="color:#ff0000;">wit</span>=<span style="color:#0000ff;">&quot;#w2&quot;</span><span style="color:#0000ff;">&gt;</span>variant text<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">rdg</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">rdg</span> <span style="color:#ff0000;">wit</span>=<span style="color:#0000ff;">&quot;#w4&quot;</span><span style="color:#0000ff;">&gt;</span>variant test<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">rdg</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">app</span><span style="color:#0000ff;">&gt;</span>.
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">div</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This ‘single source’ approach requires some care with respect to index configuration, and to the search and index lookup scripts. Assume the &lt;p&gt; elements are the text units we’re interested in when querying this document. The &lt;rdg&gt; elements containing text variants are contained by &lt;p&gt;, so a plain full-text index on &lt;p&gt; offers no way to exclude the contents of irrelevant &lt;rdg&gt;s from the search space. Without precautions, full-text searches on paragraphs will therefore always include the text variants of all text versions enclosed in &lt;app&gt; elements. To address the common paragraph text and the contents of the embedded &lt;rdg&gt; elements separately, both should be indexed separately. This can be specified in a dedicated configuration file at <em>/db/system/config/db/test/collection.xconf</em>:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">collection</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://exist-db.org/collection-config/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">validation</span> <span style="color:#ff0000;">mode</span>=<span style="color:#0000ff;">&quot;auto&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">index</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">tei</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">fulltext</span> <span style="color:#ff0000;">default</span>=<span style="color:#0000ff;">&quot;none&quot;</span> <span style="color:#ff0000;">attributes</span>=<span style="color:#0000ff;">&quot;no&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:p&quot;</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">ignore</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:rdg&quot;</span><span style="color:#0000ff;">/&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">text</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:rdg&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">create</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;@wit&quot;</span> <span style="color:#ff0000;">type</span>=<span style="color:#0000ff;">&quot;xs:string&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">index</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">collection</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This index definition excludes the content of &lt;rdg&gt; elements from searches on &lt;p&gt;, like //tei:p[ft:query(., 'test')]. The &lt;rdg&gt; elements can still be queried directly with queries like //tei:rdg[ft:query(., 'test')].</p>
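<p>The effect of this index definition can be modelled outside eXist. The following Python sketch is only an illustration, under two assumptions. First, test.xml is represented as a hypothetical in-memory structure: plain strings hold common text, and a list of (wit, text) pairs stands for each &lt;app&gt;. Second, a simple regex tokenizer with a few English stop words stands in for Lucene's analyzer. The sketch lists which tokens end up in the &lt;p&gt; index and which end up in the separate &lt;rdg&gt; index entries.</p>

```python
import re

# Hypothetical in-memory model of test.xml (an assumption, not an eXist API):
# a str segment is common text; a list of (wit, text) pairs stands for an app.
WITNESSES = ["#w1", "#w2", "#w3", "#w4"]
PARAGRAPHS = [
    ["This is a paragraph with common text."],
    ["This paragraph has ",
     [("#w1 #w3", "variants"), ("#w2", "variant text"), ("#w4", "variant test")],
     "."],
]
# Assumed analyzer: lower-cased word tokens minus the few Lucene default
# English stop words that occur in this example.
STOP_WORDS = {"a", "is", "this", "with"}


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def common_text(segments):
    """Text of a paragraph with all app/rdg content left out."""
    return "".join(s for s in segments if isinstance(s, str))


def index_entries(paragraphs):
    """Mimic the xconf: tei:p is indexed without its tei:rdg content,
    while each tei:rdg is indexed as a separate context with its @wit."""
    entries = []
    for p_id, segments in enumerate(paragraphs):
        entries.append(("p", p_id, None, tokenize(common_text(segments))))
        for seg in segments:
            if isinstance(seg, list):
                for wit, text in seg:
                    entries.append(("rdg", p_id, wit, tokenize(text)))
    return entries


for entry in index_entries(PARAGRAPHS):
    print(entry)
```

<p>In this model, the second paragraph is indexed with only the tokens "paragraph" and "has". The word "text" from the #w2 reading appears only in that reading's own entry.</p>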
<h4>1.1 Search script</h4>
<p>The main concern for the search script is to separate the search contexts with common text (shared by all text versions) from the version-specific search contexts (represented as &lt;rdg&gt; elements within &lt;app&gt;). The following XQuery script illustrates a possible approach:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
let $docs := doc('/db/test/test.xml')
let $rdgs :=
  for $rdg in $docs//tei:listWit//tei:witness/@xml:id
  return concat('#', $rdg)
let $pool := $docs//tei:p
let $query := 'text'
return $pool[ft:query((.|.//tei:rdg[tokenize(string(@wit), '\s+') = $rdgs]), $query)]</font></pre>
<p>Note how in this script the $rdgs variable is populated with the sigla of <em>all</em> text versions, as defined in the @xml:id attributes of the &lt;witness&gt; elements in test.xml. This might seem to defeat the very point of addressing the text variants separately. It is used here, however, to illustrate the maximal query scenario (searching in all text versions). Narrower selections are of course possible, and can be simulated by replacing the $rdgs assignment with e.g. $rdgs := ('#w1', '#w4').</p>
<p>In this query, the nodes to be queried are first collected in a $pool variable (in this case, all &lt;p&gt; nodes inside test.xml). For real-life documents, this set of nodes could be extended with any other text elements one would want to query. The actual differentiation lies in the search expression. It performs an ft:query() full-text search on both this set of nodes <em>and</em> any embedded &lt;rdg&gt; elements whose @wit attribute refers to one of the text versions selected for the search. It does so by checking whether any token of the tokenized @wit value equals any of the sigla in the $rdgs variable.</p>
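<p>The logic of this search expression can be made explicit with a minimal Python sketch. It uses a hypothetical in-memory model of test.xml: strings are common text, and lists of (wit, text) pairs stand for &lt;app&gt; elements. A simple regex tokenizer stands in for Lucene. A paragraph matches when the query term occurs in its common text, or in a reading whose @wit tokens include one of the selected sigla. The sketch handles single-term queries only and is not a substitute for ft:query().</p>

```python
import re

# Hypothetical in-memory model of test.xml (an assumption, not an eXist API).
WITNESSES = ["#w1", "#w2", "#w3", "#w4"]
PARAGRAPHS = [
    ["This is a paragraph with common text."],
    ["This paragraph has ",
     [("#w1 #w3", "variants"), ("#w2", "variant text"), ("#w4", "variant test")],
     "."],
]
STOP_WORDS = {"a", "is", "this", "with"}  # assumed stand-in for Lucene's list


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def common_text(segments):
    return "".join(s for s in segments if isinstance(s, str))


def search(paragraphs, rdgs, query):
    """Indices of paragraphs matching query in their common text or in a
    reading attested by one of the selected sigla (single-term queries only)."""
    hits = []
    for p_id, segments in enumerate(paragraphs):
        contexts = [common_text(segments)]
        for seg in segments:
            if isinstance(seg, list):
                # same test as tokenize(string(@wit), '\s+') = $rdgs
                contexts += [text for wit, text in seg
                             if set(wit.split()).intersection(rdgs)]
        if any(query in tokenize(c) for c in contexts):
            hits.append(p_id)
    return hits


print(search(PARAGRAPHS, WITNESSES, "text"))
print(search(PARAGRAPHS, ["#w1", "#w4"], "text"))
```

<p>As with the XQuery version, each matching paragraph is returned only once. With all sigla selected, the search for "text" returns both paragraphs, without revealing that the second one only matches in version #w2.</p>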
<p>The query above, however, probably won’t yield very useful results. All matching paragraphs are retrieved, but they are presented as a bag of undifferentiated nodes without any duplicates. In other words, these results don’t tell in which text version a hit occurs. The next version improves on this:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
let $docs := doc('/db/test/test.xml')
let $rdgs := for $rdg in $docs//tei:listWit//tei:witness/@xml:id return concat('#', $rdg)
let $pool := $docs//tei:p
let $query := 'text'

for $rdg in $rdgs
let $results := $pool[ft:query((.|.//tei:rdg[tokenize(string(@wit), '\s+') = $rdg]), $query)]
for $result in $results
return &lt;result wit=&quot;{$rdg}&quot;&gt;{$result}&lt;/result&gt;</font></pre>
<p>By looping over the selected text versions defined in $rdgs, this script repeats the full-text search for each text version and presents all results <em>per version</em>. Each &lt;result&gt; element identifies the associated text version by its siglum in a @wit attribute.</p>
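<p>The per-version loop can be sketched in Python on a hypothetical in-memory model of test.xml rather than on eXist itself. Strings are common text, lists of (wit, text) pairs stand for &lt;app&gt; elements, and a regex tokenizer is an assumed stand-in for Lucene. Each hit is returned as a (siglum, paragraph index) pair, mirroring the @wit attribute on the &lt;result&gt; element.</p>

```python
import re

# Hypothetical in-memory model of test.xml (an assumption, not an eXist API).
WITNESSES = ["#w1", "#w2", "#w3", "#w4"]
PARAGRAPHS = [
    ["This is a paragraph with common text."],
    ["This paragraph has ",
     [("#w1 #w3", "variants"), ("#w2", "variant text"), ("#w4", "variant test")],
     "."],
]
STOP_WORDS = {"a", "is", "this", "with"}  # assumed stand-in for Lucene's list


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def common_text(segments):
    return "".join(s for s in segments if isinstance(s, str))


def search_per_version(paragraphs, rdgs, query):
    """Repeat the search once per selected siglum and tag each hit with it,
    like the 'for $rdg in $rdgs' loop in the XQuery script."""
    results = []
    for rdg in rdgs:
        for p_id, segments in enumerate(paragraphs):
            contexts = [common_text(segments)]
            for seg in segments:
                if isinstance(seg, list):
                    # only readings whose @wit tokens contain this siglum
                    contexts += [text for wit, text in seg if rdg in wit.split()]
            if any(query in tokenize(c) for c in contexts):
                results.append((rdg, p_id))
    return results


for wit, p_id in search_per_version(PARAGRAPHS, WITNESSES, "text"):
    print(wit, p_id)
```

<p>For "text", this yields a hit in the first paragraph for every version, plus a hit in the second paragraph for #w2 only.</p>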
<h4>1.2 Index lookup script</h4>
<p>The same differentiation technique can be applied in a script that looks up the indexed terms in one or more selected text versions:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
declare function local:term-callback($term, $data) {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
};

let $callback := util:function(xs:QName('local:term-callback'), 2)
let $docs := doc('/db/test/test.xml')
let $rdgs := for $rdg in $docs//tei:listWit//tei:witness/@xml:id return concat('#', $rdg)
let $pool := $docs//tei:p
let $query := 'text'
let $nodes :=
  for $a in $pool|$pool//tei:rdg[tokenize(string(@wit), '\s+') = $rdgs]
  (: here nodes can be refined first by querying, if required :)
  return $a(:[ft:query(., $query)]:)
for $term in util:index-keys($nodes, '', $callback, 15000, 'lucene-index')
(:order by $term/@freq/number() descending:)
return $term</font></pre>
<p>This script first selects the paragraphs and (only) the relevant &lt;rdg&gt; elements in the $nodes variable, using the technique sketched above. (Note how the comments are provided as hooks for further refinement of the search contexts and for ordering the index terms.) Next, those selected nodes are passed to the eXist-specific util:index-keys() function. It calls local:term-callback() for each distinct index term, which returns a &lt;term&gt; element with the term's statistics. When inspecting the results of the script above, some things catch the eye:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>common<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>has<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>paragraph<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>test<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;5&quot;</span><span style="color:#0000ff;">&gt;</span>text<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;6&quot;</span><span style="color:#0000ff;">&gt;</span>variant<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;7&quot;</span><span style="color:#0000ff;">&gt;</span>variants<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This script does a decent job of separating the common from the version-specific search contexts (try removing some sigla from $rdgs and compare the results). However, there is a problem with the statistics:</p>
<ul>
<li>
<p>term frequency: all index terms in common text (i.e. text shared by all text versions) are counted only once, rather than once per text version</p>
</li>
<li>
<p>document count: the number of documents in which the terms occur is always 1 (as only one indexed document is queried)</p>
</li>
</ul>
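<p>The skewed statistics can be reproduced with a small Python model. It assumes a hypothetical in-memory representation of test.xml (strings for common text, lists of (wit, text) pairs for &lt;app&gt;) and a regex tokenizer standing in for Lucene. The model counts terms over one undifferentiated bag of &lt;p&gt; and selected &lt;rdg&gt; nodes, treating the whole as a single document. This yields the same numbers as the util:index-keys() output above.</p>

```python
import re
from collections import Counter

# Hypothetical in-memory model of test.xml (an assumption, not an eXist API).
WITNESSES = ["#w1", "#w2", "#w3", "#w4"]
PARAGRAPHS = [
    ["This is a paragraph with common text."],
    ["This paragraph has ",
     [("#w1 #w3", "variants"), ("#w2", "variant text"), ("#w4", "variant test")],
     "."],
]
STOP_WORDS = {"a", "is", "this", "with"}  # assumed stand-in for Lucene's list


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def common_text(segments):
    return "".join(s for s in segments if isinstance(s, str))


def naive_index_keys(paragraphs, rdgs):
    """Term stats over one bag of nodes (all p plus selected rdg) in a single
    document: shared text is counted once and docs is always 1."""
    freq = Counter()
    for segments in paragraphs:
        freq.update(tokenize(common_text(segments)))
        for seg in segments:
            if isinstance(seg, list):
                for wit, text in seg:
                    if set(wit.split()).intersection(rdgs):
                        freq.update(tokenize(text))
    return {term: (n, 1) for term, n in sorted(freq.items())}


for term, (freq, docs) in naive_index_keys(PARAGRAPHS, WITNESSES).items():
    print(term, freq, docs)
```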
<p>Instead, the frequency and document count of terms in common text should be multiplied by the number of text versions being queried. For example, the word ‘common’ occurs once in each of the 4 text versions and hence should amount to 4 occurrences in 4 documents. It is less straightforward for index terms occurring in both common and version-specific text. The term ‘text’ occurs once in the common text, and hence once in each of the 4 versions. It also occurs once more in the reading for version ‘w2’, totaling 5 (4 + 1) occurrences in 4 documents. When statistics matter, the index lookup script should account for this distribution of occurrences over the selected text versions. This <em>can</em> be worked around (theoretically) by first generating separate index lists per version, and then adding up all separate statistics:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
declare function local:term-callback($term, $data) {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot;&gt;{$term}&lt;/term&gt;
};

let $callback := util:function(xs:QName('local:term-callback'), 2)
let $docs := doc('/db/test/test.xml')
let $rdgs := for $rdg in $docs//tei:listWit//tei:witness/@xml:id return concat('#', $rdg)
let $pool := $docs//tei:p
let $query := 'text'

let $terms :=
  for $rdg in $rdgs
  let $nodes :=
    for $a in $pool|$pool//tei:rdg[tokenize(string(@wit), '\s+') = $rdg]
    (: here nodes can be refined first by querying, if required :)
    return $a(:[ft:query(., $query)]:)
  return
    for $term in util:index-keys($nodes, '', $callback, 15000, 'lucene-index')
    return &lt;term wit=&quot;{$rdg}&quot;&gt;{
      $term/(@*, node())
    }&lt;/term&gt;

let $conflateTerms :=
  for $term in distinct-values($terms)
  let $groupTerms := $terms[. eq $term]
  return &lt;term&gt;
  {
    for $att in $groupTerms[1]/@*[name() != 'wit']
    return attribute {$att/name()} {sum($groupTerms/@*[name() eq $att/name()])}
  }
  {$term}
  &lt;/term&gt;

for $a in $conflateTerms
(:order by $a/@freq/number() descending:)
return $a</font></pre>
<p>In this script, the $terms variable first collects all index terms per selected text version. It does so by looping over the $rdgs variable and performing an index lookup on the nodes occurring in that text version. A second step conflates the distinct terms in the $conflateTerms variable, by grouping identical terms in $terms and summing the statistics of their respective occurrences. This produces the correct statistics:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;4&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>common<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;4&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>has<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;8&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>paragraph<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;5&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>text<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>variants<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>variant<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>test<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span></pre>
<p>XQuery 1.0 lacks a dedicated grouping mechanism; this should be addressed by the ‘<a href="http://www.w3.org/TR/xquery-30/#id-group-by" target="_blank">group by</a>’ clause in the upcoming XQuery 3.0 specification. Until then, the distinct-values() route used in the $conflateTerms variable is the only way to achieve this grouping without resorting to the undocumented ‘group by’ extension in eXist. This doesn’t scale well for large documents with many versions, since every distinct term requires another pass over the complete $terms sequence.</p>
<p>One way to speed this up somewhat is to delegate the conflation of the separate index terms to XSLT, using eXist’s transform:transform() function. XSLT 2.0 added native grouping capabilities (xsl:for-each-group), which definitely outperform the distinct-values() approach in XQuery. Hence, the $conflateTerms variable could instead be computed via XSLT:</p>
<pre><font color="#0000ff" face="Courier New">let $conflateXSLT :=
&lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
    exclude-result-prefixes=&quot;xs&quot;
    version=&quot;2.0&quot;&gt;
    &lt;xsl:template match=&quot;terms&quot;&gt;
        &lt;xsl:for-each-group select=&quot;term&quot; group-by=&quot;.&quot;&gt;
            &lt;term&gt;
                &lt;xsl:for-each select=&quot;@*[name() != 'wit']&quot;&gt;
                    &lt;xsl:sort select=&quot;.&quot;/&gt;
                    &lt;xsl:variable name=&quot;attName&quot; select=&quot;name()&quot;/&gt;
                    &lt;xsl:attribute name=&quot;{{$attName}}&quot;&gt;
                        &lt;xsl:value-of select=&quot;sum(current-group()//@*[name() = $attName])&quot;/&gt;
                    &lt;/xsl:attribute&gt;
                &lt;/xsl:for-each&gt;
                &lt;xsl:value-of select=&quot;current-grouping-key()&quot;/&gt;
            &lt;/term&gt;
        &lt;/xsl:for-each-group&gt;
    &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;

let $conflateTerms := transform:transform(&lt;terms&gt;{$terms}&lt;/terms&gt;, $conflateXSLT, ())</font></pre>
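<p>Why the grouping strategy matters can be illustrated in Python with two hypothetical conflation functions. They run over the per-version term lists of the example, which were derived by hand from test.xml. The first mimics the XQuery 1.0 distinct-values() route, scanning the complete term sequence once per distinct term. The second groups in a single pass, which is what xsl:for-each-group or an XQuery 3.0 group by clause make possible. Both produce the same totals; only the amount of work differs. The first takes roughly (number of distinct terms) × (number of term entries) steps, while the second takes a single pass over the term entries.</p>

```python
# Hypothetical per-version lookup output, derived by hand from test.xml:
# {siglum: {term: freq}}; each version counts as one document.
PER_VERSION = {
    "#w1": {"paragraph": 2, "common": 1, "text": 1, "has": 1, "variants": 1},
    "#w2": {"paragraph": 2, "common": 1, "text": 2, "has": 1, "variant": 1},
    "#w3": {"paragraph": 2, "common": 1, "text": 1, "has": 1, "variants": 1},
    "#w4": {"paragraph": 2, "common": 1, "text": 1, "has": 1, "variant": 1,
            "test": 1},
}
# Flattened like $terms: (wit, term, freq, docs) records.
TERMS = [(wit, term, freq, 1)
         for wit, counts in PER_VERSION.items()
         for term, freq in counts.items()]


def conflate_distinct_values(terms):
    """XQuery 1.0 route: one full scan of terms per distinct term."""
    result = {}
    for term in dict.fromkeys(t[1] for t in terms):   # distinct-values($terms)
        group = [t for t in terms if t[1] == term]     # $terms[. eq $term]
        result[term] = (sum(t[2] for t in group), sum(t[3] for t in group))
    return result


def conflate_grouped(terms):
    """Single pass, in the spirit of xsl:for-each-group or XQuery 3.0 group by."""
    result = {}
    for _, term, freq, docs in terms:
        f, d = result.get(term, (0, 0))
        result[term] = (f + freq, d + docs)
    return result


print(conflate_distinct_values(TERMS) == conflate_grouped(TERMS))
print(conflate_grouped(TERMS)["text"])
```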
<p>Still, this approach remains very fragile when it comes to performance. The main bottleneck is that <em>full</em> index lookups must be repeated for <em>all</em> selected text versions. Next, those (possibly huge) term collections must be grouped and their statistics added up. The performance of this approach hence depends entirely on:</p>
<ul>
<li>the number of text versions selected for the search </li>
<li>the size of the node set to be searched </li>
</ul>
<p>Further optimisation is possible by keeping the node sets passed to the index lookups as small as possible. This can be achieved by splitting the previous $terms variable into two variables, containing only the common index terms and only the version-specific index terms, respectively:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
declare function local:term-callback($term, $data) {
 &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot;&gt;{$term}&lt;/term&gt;
};

let $callback := util:function(xs:QName('local:term-callback'), 2)
let $docs := doc('/db/test/test.xml')
let $rdgs := for $rdg in $docs//tei:listWit//tei:witness/@xml:id return concat('#', $rdg)
let $pool := $docs//tei:p
let $query := 'text'

let $commonTerms :=
  let $nodes :=
    for $a in $pool
    (: here nodes can be refined first by querying, if required :)
    return $a(:[ft:query(., $query)]:)
  return
    for $term in util:index-keys($nodes, '', $callback, 15000, 'lucene-index')
    return &lt;term&gt;{
      (
      for $att in $term/@*
      return attribute {$att/name()} {$att * count($rdgs)}
      ,
      $term/node()
      )
    }&lt;/term&gt;

let $rdgTerms :=
  for $rdg in $rdgs
  let $nodes :=
    for $a in $pool//tei:rdg[tokenize(string(@wit), '\s+') = $rdg]
    (: here nodes can be refined first by querying, if required :)
    return $a(:[ft:query(., $query)]:)
  return
    for $term in util:index-keys($nodes, '', $callback, 15000, 'lucene-index')
    return &lt;term wit=&quot;{$rdg}&quot;&gt;{
      $term/(@*, node())
    }&lt;/term&gt;

let $conflateXSLT :=
&lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
    exclude-result-prefixes=&quot;xs&quot;
    version=&quot;2.0&quot;&gt;
    &lt;xsl:template match=&quot;terms&quot;&gt;
        &lt;xsl:for-each-group select=&quot;term&quot; group-by=&quot;.&quot;&gt;
            &lt;term&gt;
                &lt;xsl:for-each select=&quot;@*[name() != 'wit']&quot;&gt;
                    &lt;xsl:sort select=&quot;.&quot;/&gt;
                    &lt;xsl:variable name=&quot;attName&quot; select=&quot;name()&quot;/&gt;
                    &lt;xsl:attribute name=&quot;{{$attName}}&quot;&gt;
                      &lt;xsl:choose&gt;
                        &lt;xsl:when test=&quot;name() = 'docs'&quot;&gt;
                          &lt;xsl:value-of select=&quot;(current-group()[not(@wit)]//@*[name() = $attName],
                                                 sum(current-group()//@*[name() = $attName]))[1]&quot;/&gt;
                        &lt;/xsl:when&gt;
                        &lt;xsl:otherwise&gt;
                          &lt;xsl:value-of select=&quot;sum(current-group()//@*[name() = $attName])&quot;/&gt;
                        &lt;/xsl:otherwise&gt;
                      &lt;/xsl:choose&gt;
                    &lt;/xsl:attribute&gt;
                &lt;/xsl:for-each&gt;
                &lt;xsl:value-of select=&quot;current-grouping-key()&quot;/&gt;
            &lt;/term&gt;
        &lt;/xsl:for-each-group&gt;
    &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;

let $conflateTerms := transform:transform(&lt;terms&gt;{$commonTerms, $rdgTerms}&lt;/terms&gt;, $conflateXSLT, ())

for $a in $conflateTerms
(:order by $a/@freq/number() descending:)
return $a</font></pre>
<p>The common terms (i.e. terms occurring outside of &lt;rdg&gt; elements, which can be assumed to be the majority of terms in a document) are now retrieved with a single util:index-keys() lookup in the $commonTerms variable. Note how the statistics are adjusted. Since all of these terms occur in all selected text versions, their frequency and document count are multiplied by the number of selected text versions. The version-specific terms (i.e. those occurring within the relevant &lt;rdg&gt; elements) are collected in the $rdgTerms variable. Again, the number of selected text versions determines the number of index lookups. This time, however, the node set on which the lookups are performed is cut down substantially, to only the relevant &lt;rdg&gt; nodes. Consequently, the subsequent conflation of the index terms has to deal with far fewer nodes, and gains in efficiency. Note how the XSLT script had to be adapted to compute the total @docs value. For terms occurring in both common and version-specific contexts, the document count of the common term (which already equals the number of selected text versions) should be used as the total. This total is retrieved by selecting the @docs value of the &lt;term&gt; without a @wit attribute, when available. Put simply, this prevents such terms from being counted in more documents than the number of selected text versions.</p>
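<p>The split strategy, including the @docs adjustment, can be modelled with the following Python sketch. It runs on a hypothetical in-memory model of test.xml (strings for common text, lists of (wit, text) pairs for &lt;app&gt;) with a regex tokenizer standing in for Lucene. The common text is looked up once and scaled by the number of selected versions; only the readings are looked up per version.</p>

```python
import re
from collections import Counter

# Hypothetical in-memory model of test.xml (an assumption, not an eXist API).
WITNESSES = ["#w1", "#w2", "#w3", "#w4"]
PARAGRAPHS = [
    ["This is a paragraph with common text."],
    ["This paragraph has ",
     [("#w1 #w3", "variants"), ("#w2", "variant text"), ("#w4", "variant test")],
     "."],
]
STOP_WORDS = {"a", "is", "this", "with"}  # assumed stand-in for Lucene's list


def tokenize(text):
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def common_terms(paragraphs, n_versions):
    """Single lookup on the common text ($commonTerms), scaled per version."""
    freq = Counter()
    for segments in paragraphs:
        freq.update(tokenize("".join(s for s in segments if isinstance(s, str))))
    docs = 1  # one indexed document; multiplied like count($rdgs) in the XQuery
    return {t: (f * n_versions, docs * n_versions) for t, f in freq.items()}


def rdg_terms(paragraphs, rdg):
    """Lookup restricted to the readings attested by one siglum ($rdgTerms)."""
    freq = Counter()
    for segments in paragraphs:
        for seg in segments:
            if isinstance(seg, list):
                for wit, text in seg:
                    if rdg in wit.split():
                        freq.update(tokenize(text))
    return {t: (f, 1) for t, f in freq.items()}


def conflate(common, per_version):
    """Sum freq; for docs, a term that also occurs in common text keeps the
    common docs value (it already occurs in every version), as in the XSLT."""
    totals = dict(common)
    for stats in per_version:
        for t, (f, d) in stats.items():
            cf, cd = totals.get(t, (0, 0))
            totals[t] = (cf + f, cd if t in common else cd + d)
    return totals


stats = conflate(common_terms(PARAGRAPHS, len(WITNESSES)),
                 (rdg_terms(PARAGRAPHS, w) for w in WITNESSES))
for term in sorted(stats):
    print(term, *stats[term])
```

<p>With all four versions selected, this yields the same totals as the per-version statistics listed earlier. For example, "text" gets 5 occurrences in 4 documents, rather than the 5 documents a plain sum would give.</p>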
<p>Still, in a maximal scenario where full index scans are performed for large texts with many versions, this strategy could be prohibitively expensive. Before evaluating it, let’s have a look at another alternative for querying ‘multiversion’ texts, in the next section.</p>
<h3>2. Index all versions separately</h3>
<p>This approach avoids computing the different text versions enclosed in a <em>parallel-segmented</em> TEI text on the fly, by separating out each distinct text version into a complete XML source text in its own right, and indexing and querying those ‘single-version’ texts separately in eXist.</p>
<p>Splitting up a ‘multiversion’ text into distinct text versions can easily be done with a batch XSLT script before indexing, but it adds to the maintenance cost: whenever the ‘multiversion’ text changes, all derived versions have to be regenerated and re-indexed as well. An appealing alternative, however, may be found in eXist’s <a href="http://exist-db.org/triggers.html" target="_blank">trigger facilities</a>. Instead of relying on prior batch processing, eXist can be made to apply the XSLT transformation automatically whenever the ‘multiversion’ text is stored or updated.</p>
<h4>2.1 Trigger setup</h4>
<p>In order to configure triggers for the <em>/db/test</em> collection, a &lt;triggers&gt; section should be added to <em>/db/system/config/db/test/collection.xconf</em>:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">collection</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://exist-db.org/collection-config/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">validation</span> <span style="color:#ff0000;">mode</span>=<span style="color:#0000ff;">&quot;auto&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">index</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">tei</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">fulltext</span> <span style="color:#ff0000;">default</span>=<span style="color:#0000ff;">&quot;none&quot;</span> <span style="color:#ff0000;">attributes</span>=<span style="color:#0000ff;">&quot;no&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:p&quot;</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">ignore</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:rdg&quot;</span><span style="color:#0000ff;">/&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">text</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;tei:rdg&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">index</span><span style="color:#0000ff;">&gt;</span>

<span style="color:#008000;">&lt;!-- to be added to collection.xconf file of target collection --&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">triggers</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">trigger</span> <span style="color:#ff0000;">event</span>=<span style="color:#0000ff;">&quot;store,update&quot;</span> <span style="color:#ff0000;">class</span>=<span style="color:#0000ff;">&quot;org.exist.collections.triggers.XQueryTrigger&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">parameter</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;url&quot;</span> <span style="color:#ff0000;">value</span>=<span style="color:#0000ff;">&quot;xmldb:exist://localhost/db/test/triggers/splitRdgs.xql&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">trigger</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">triggers</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">collection</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This tells eXist to run the <em>/db/test/triggers/splitRdgs.xql</em> XQuery script for all documents that are added to or updated in the collection <em>/db/test</em>. Since this also includes the subcollection <em>/db/test/triggers</em>, which will hold the scripts needed for executing this trigger, it’s safest to prevent those scripts from activating the trigger when they are stored or updated themselves. This can be done by adding a minimal <em>collection.xconf</em> file without a &lt;triggers&gt; section for that subcollection, at <em>/db/system/config/db/test/triggers</em>:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">collection</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://exist-db.org/collection-config/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">index</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">tei</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span><span style="color:#0000ff;">/&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">collection</span><span style="color:#0000ff;">&gt;</span></pre>
<p>Now, let’s have a look at the XQuery script at <em>/db/test/triggers/splitRdgs.xql</em> (note: its syntax is based on the trigger implementation in eXist-1.4.x, and is subject to change after the rework of this implementation in the current development version):</p>
<pre><font color="#0000ff" face="Courier New">declare namespace xmldb=&quot;http://exist-db.org/xquery/xmldb&quot;;
declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;

declare variable $local:triggerEvent external;
declare variable $local:eventType external;
declare variable $local:collectionName external;
declare variable $local:documentName external;
declare variable $local:document external;
declare variable $local:triggersLogFile := &quot;triggersLog.xml&quot;;

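(: create the log file if it does not exist yet; test the same absolute
   path (/db/triggersLog.xml) that the update below writes to :)
let $logfile :=
  if(not(doc-available(concat(&quot;/db/&quot;, $local:triggersLogFile))))then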
	xmldb:store(&quot;/db&quot;, $local:triggersLogFile, &lt;triggers/&gt;)
  else()
let $doc := doc($local:documentName)
return
if ($local:eventType eq 'finish' and $doc//tei:rdg) then
  let $xsl := doc(concat($local:collectionName, '/triggers/splitRdgs.xsl'))
  let $rdgs := $doc//tei:witness/@xml:id
  for $rdg in $rdgs
  let $params :=
    &lt;parameters&gt;
      &lt;param name=&quot;t&quot; value=&quot;#{$rdg}&quot;/&gt;
    &lt;/parameters&gt;
  let $transformDoc := transform:transform($doc, $xsl, $params)
  let $docName := replace(tokenize($local:documentName, '/')[last()],
                          '(.+)(\.[^.]+)$', concat('$1', '_', $rdg, '$2'))
  let $storeDoc :=  xmldb:store($local:collectionName, $docName, $transformDoc)
  return update
    insert
      &lt;trigger event=&quot;{$local:triggerEvent}&quot; eventType=&quot;{$local:eventType}&quot;
               collectionName=&quot;{$local:collectionName}&quot; documentName=&quot;{$local:documentName}&quot;
               timestamp=&quot;{current-dateTime()}&quot;&gt;{$rdg}&lt;/trigger&gt;
    into doc(&quot;/db/triggersLog.xml&quot;)/triggers
else ()</font></pre>
<p>Basically, this script detects whether the document being stored or updated contains any &lt;rdg&gt; elements. If so, it applies a splitting XSLT stylesheet once per text version (as identified by the &lt;witness&gt; elements), and stores the result of each transformation as a document whose name is based on the ‘multiversion’ filename, suffixed with an underscore and the siglum of that version. The XSLT stylesheet referred to is stored at <em>/db/test/triggers/splitRdgs.xsl</em> and could look as follows:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">xsl</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span>
    <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">xs</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/2001/XMLSchema&quot;</span>
    <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">tei</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span>
    <span style="color:#ff0000;">exclude</span>-<span style="color:#ff0000;">result</span>-<span style="color:#ff0000;">prefixes</span>=<span style="color:#0000ff;">&quot;#all&quot;</span>
    <span style="color:#ff0000;">version</span>=<span style="color:#0000ff;">&quot;2.0&quot;</span><span style="color:#0000ff;">&gt;</span>

    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">param</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;t&quot;</span><span style="color:#0000ff;">/&gt;</span>

    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;tei:app&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;*[tokenize(@wit, '\s+') = $t]/node()&quot;</span> <span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>                

    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This effectively replaces all &lt;app&gt; elements with the content of the relevant text version, and copies all other content literally.</p>
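<p>For comparison, the same splitting logic can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustrative sketch only, with hypothetical helper names (split_version, _copy_content, _append_text). Like the stylesheet, it assumes that sigla are passed with a leading ‘#’, and it keeps the content (not the element itself) of each reading whose @wit tokens include that siglum.</p>

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def _append_text(parent, text):
    # Add text at the current end of parent's content: after the last
    # child (as its tail) or, if there are no children yet, as parent.text.
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_content(src, dest, siglum):
    # Copy the text and children of src into dest, resolving tei:app.
    _append_text(dest, src.text)
    for child in src:
        if child.tag == TEI + "app":
            # Keep only the content of readings whose @wit lists siglum;
            # whitespace between the readings is dropped.
            for reading in child:
                if siglum in reading.get("wit", "").split():
                    _copy_content(reading, dest, siglum)
        else:
            dest.append(split_version(child, siglum))
        _append_text(dest, child.tail)


def split_version(elem, siglum):
    # Identity copy of elem (tag and attributes), with tei:app resolved.
    out = ET.Element(elem.tag, elem.attrib)
    _copy_content(elem, out, siglum)
    return out


# Example paragraph: "This is " + app(rdg #A "a", rdg "#B #C" "another") + " text."
p = ET.Element(TEI + "p")
p.text = "This is "
app = ET.SubElement(p, TEI + "app")
app.tail = " text."
ET.SubElement(app, TEI + "rdg", wit="#A").text = "a"
ET.SubElement(app, TEI + "rdg", wit="#B #C").text = "another"

print("".join(split_version(p, "#A").itertext()))  # This is a text.
print("".join(split_version(p, "#B").itertext()))  # This is another text.
```

<p>Splitting @wit on whitespace mirrors tokenize(@wit, '\s+') in the stylesheet, so a reading shared by several witnesses is kept in each of their versions.</p>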
<p>To summarise, this trigger setup will store the original document, as well as a separate text per text version, with the following characteristics:</p>
<ul>
<li>
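<p>document name: original document name with &#8216;_&#8217; + siglum inserted before the file extension (e.g. <em>test.xml</em> becomes <em>test_A.xml</em> for a witness with xml:id &#8216;A&#8217;)</p>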
</li>
<li>
<p>all &lt;app&gt; elements are removed; only the content of the &lt;rdg&gt; elements relevant to that version is preserved</p>
</li>
</ul>
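<p>The naming rule can be checked outside eXist with a small Python sketch that mirrors the replace() call in the trigger script. The function name version_doc_name is hypothetical, and the sketch assumes, as the script does, that the witness xml:id (without ‘#’) serves as the siglum.</p>

```python
import re


def version_doc_name(doc_path, siglum):
    # Keep only the last path step, like tokenize($path, '/')[last()].
    name = doc_path.split("/")[-1]
    # Insert "_" + siglum before the final extension, mirroring
    # replace($name, '(.+)(\.[^.]+)$', concat('$1', '_', $rdg, '$2')).
    # A callable replacement avoids any escaping issues in the siglum.
    return re.sub(
        r"(.+)(\.[^.]+)$",
        lambda m: m.group(1) + "_" + siglum + m.group(2),
        name,
    )


print(version_doc_name("/db/test/test.xml", "A"))  # test_A.xml
print(version_doc_name("README", "A"))             # README (no extension: unchanged)
```

<p>A name without a file extension does not match the pattern and is returned unchanged. In the trigger script this would presumably make the split version overwrite the original document, so it seems wise to store ‘multiversion’ texts with an extension.</p>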
<h4>2.2 Search script</h4>
<p>Separating the text versions into distinct documents greatly reduces the complexity of the search script. The only place where versions have to be taken into account is the determination of the names of the documents to be included in the search:</p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
let $rdgs := for $a in doc('/db/test/test.xml')//tei:listWit//tei:witness/@xml:id return concat('#', $a)
let $docs := for $rdg in $rdgs return doc(concat('/db/test/test', replace($rdg, '#', '_'), '.xml'))
let $pool := $docs//tei:p
let $query := 'text'
for $hit in $pool[ft:query(., $query)]
return &lt;result wit=&quot;{$hit/replace(substring-after(util:document-name(.), '_'), '\.[^.]+$', '')}&quot;&gt;{
  $hit
}&lt;/result&gt;</font></pre>
<p>Note how the version sigla are used to determine the names of the documents to be included in the search. Like its ‘multiversion’ counterpart discussed in section 1.2 above, this script returns all search hits, identifying the text version in which each hit occurs in a @wit attribute on the &lt;result&gt; element.</p>
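<p>The mapping between sigla and document names in both directions can be sketched in Python as follows. The helper names doc_names and wit_from_doc_name are hypothetical; they mirror the concat()/replace() expression that builds $docs and the substring-after()/replace() expression that fills the @wit attribute.</p>

```python
import re


def doc_names(base, rdgs, ext=".xml"):
    # Mirrors concat('/db/test/test', replace($rdg, '#', '_'), '.xml').
    return [base + rdg.replace("#", "_") + ext for rdg in rdgs]


def wit_from_doc_name(doc_name):
    # substring-after($name, '_'): text after the FIRST underscore,
    # or "" when there is none ...
    after = doc_name.partition("_")[2]
    # ... with the final file extension removed.
    return re.sub(r"\.[^.]+$", "", after)


print(doc_names("/db/test/test", ["#A", "#B"]))
# ['/db/test/test_A.xml', '/db/test/test_B.xml']
print(wit_from_doc_name("test_A.xml"))     # A
print(wit_from_doc_name("my_text_A.xml"))  # text_A (not A!)
```

<p>The last line shows a pitfall of taking the text after the first underscore: a base name that itself contains an underscore yields the wrong siglum. Taking the last ‘_’-separated token instead would be more robust, provided the sigla themselves contain no underscores.</p>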
<h4>2.3 Index lookup script</h4>
<p>Likewise, the index lookup script is much simplified, in that only a single call to util:index-keys() needs to be made, irrespective of the number of text versions selected: </p>
<pre><font color="#0000ff" face="Courier New">declare namespace tei=&quot;http://www.tei-c.org/ns/1.0&quot;;
declare function local:term-callback($term, $data) {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
};

let $callback := util:function(xs:QName('local:term-callback'), 2)
let $rdgs := for $a in doc('/db/test/test.xml')//tei:listWit//tei:witness/@xml:id return concat('#', $a)
let $docs := for $rdg in $rdgs return doc(concat('/db/test/test', replace($rdg, '#', '_'), '.xml'))
let $pool := $docs//tei:p
let $query := 'text'
let $nodes :=
  for $a in $pool
  (: here nodes can be refined first by querying, if required :)
  return $a(:[ft:query(., $query)]:)
for $term in util:index-keys($nodes, '', $callback, 15000, 'lucene-index')
(:order by $term/@freq/number() descending:)
return $term</font></pre>
<p>Note that no mention needs to be made of any &lt;rdg&gt; elements, since those have already been resolved by the trigger when the version texts were stored and indexed.</p>
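<p>Since each version now lives in a document of its own, per-term statistics need no special treatment: occurrences and document counts are simply collected per document. The Python sketch below illustrates this with a naive lower-casing word tokenizer. That tokenizer is an assumption and certainly not how eXist’s Lucene analyzer works, and the name index_stats is hypothetical.</p>

```python
import re
from collections import Counter


def index_stats(docs):
    # docs maps a (single-version) document name to its plain text.
    # Returns {term: (occurrences, number of documents containing term)}.
    freq = Counter()
    doc_count = Counter()
    for text in docs.values():
        # Naive tokenizer: lower-cased runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        freq.update(tokens)
        doc_count.update(set(tokens))
    return {term: (freq[term], doc_count[term]) for term in freq}


stats = index_stats({
    "test_A.xml": "This is a text.",
    "test_B.xml": "This is another text.",
})
print(stats["text"])     # (2, 2)
print(stats["another"])  # (1, 1)
```

<p>Compared with the conflation sketch for the ‘multiversion’ approach, no cutoff rule is needed here, because a term can never be counted in more documents than there are version texts.</p>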
<h3>3. Evaluation</h3>
<p>This exercise started from the theoretical desire to be able to search ‘multiversion’ TEI source texts out of the box. eXist’s indexing implementation allows for index definitions flexible enough to construct XQuery scripts that neatly cut their way through the different versions encoded within a single source text. While searching performs quite well, index lookup is an important bottleneck: its cost depends heavily on the number of text versions and on the size of their (selected) node sets to be searched. Although there is room for optimisation, this approach clearly has its limits and could hardly be defended in a maximal scenario, where a complete index scan is requested for all versions of a real-life text (say, to produce a frequency list). Yet, it could be considered an option for more limited scenarios:</p>
<ul>
<li>when the scope of the index lookup is restricted by requiring one or more start letters, performance improves greatly</li>
<li>when statistics are not important (say, in an autocomplete scenario), the initial index lookup script performs quite well</li>
</ul>
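<p>The effect of requiring start letters can be illustrated with a sorted list of index keys: a binary search jumps to the first key that could match, and only the contiguous run of keys sharing the prefix is visited. This Python sketch (with the hypothetical name keys_with_prefix) makes no claim about eXist’s internal index structures; it only illustrates the idea behind passing a start value instead of the empty string '' as the second argument of util:index-keys() in the scripts above.</p>

```python
from bisect import bisect_left


def keys_with_prefix(sorted_keys, prefix):
    # Binary search for the first key that is >= prefix ...
    i = bisect_left(sorted_keys, prefix)
    result = []
    # ... then visit only the contiguous run of keys sharing the prefix.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        result.append(sorted_keys[i])
        i += 1
    return result


keys = sorted(["apple", "apply", "banana", "text", "textual", "variant"])
print(keys_with_prefix(keys, "tex"))  # ['text', 'textual']
```

<p>With an empty prefix every key is visited, which corresponds to the full index scan that makes the ‘multiversion’ lookup expensive.</p>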
<p>On the other hand, splitting up a ‘multiversion’ text into its constituent text versions looked unwieldy at first sight from a maintenance perspective. Yet, eXist’s trigger implementation relieves this burden. This approach clearly reduces the complexity of the search and index lookup scripts, and performs better. After all, the util:index-keys() function is there precisely for efficiently collecting index statistics; generating partial lists of index terms and conflating them via XQuery, while feasible, comes at a price performance-wise.</p>
<p><strong>Download: </strong><a href="http://ctb.kantl.be/download/appSearch_db.zip" target="_blank">appSearch_db.zip</a>, an eXist backup file containing all files discussed here. Just download and <a href="http://exist-db.org/backup.html" target="_blank">restore</a> this backup file in your eXist database.</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2011/04/20/venturing-into-versions-strategies-for-querying-a-tei-apparatus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>XQuery Unit testing in eXist-1.4</title>
		<link>http://rvdb.wordpress.com/2010/12/01/xquery-unit-testing-in-exist-1-4/</link>
		<comments>http://rvdb.wordpress.com/2010/12/01/xquery-unit-testing-in-exist-1-4/#comments</comments>
		<pubDate>Wed, 01 Dec 2010 12:06:48 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/2010/12/01/xquery-unit-testing-in-exist-1-4/</guid>
		<description><![CDATA[[UPDATE 2011-01-19]: As of revisions 13587 and 13589, the XQuery Unit Testing framework has been ported back from eXist-trunk to the eXist-1.4.x branch. While obsolescing the need for the XSLT stylesheet presented in this blog post, I&#8217;ll leave the latter here for the sake of documentation. eXist users who want to test XQueries in eXist-1.4 now [...]]]></description>
			<content:encoded><![CDATA[<p><strong>[UPDATE 2011-01-19]</strong>: As of revisions 13587 and 13589, the XQuery Unit Testing framework has been ported back from eXist-trunk to the eXist-1.4.x branch. Although this makes the XSLT stylesheet presented in this blog post obsolete, I&#8217;ll leave it here for the sake of documentation. eXist users who want to test XQueries in eXist-1.4 are now encouraged to use its built-in XQuery Unit Testing framework instead.</p>
<p><strong>[UPDATE 2011-01-05]</strong>: The XSLT stylesheet has been extended with the following missing features:</p>
<ul>
<li>[feature]: added @trace handling</li>
<li>[feature]: added &lt;xpath&gt; handling</li>
<li>[feature]: added &lt;store-files&gt; handling</li>
<li>[feature]: added context handling for util:eval()</li>
<li>[fix]: &lt;![CDATA[ ]]&gt; in output: spaces required&#8230;</li>
</ul>
<p><strong>[UPDATE 2010-12-09]</strong>: The XSLT stylesheet has been substantially reworked to produce:</p>
<ul>
<li>more legible XQuery code</li>
<li>more reliable XQuery code, taking into account serialization options, and deriving the most sensible highlight-matches settings where necessary</li>
</ul>
<p>Currently, I’m busy porting old XQuery code to the new Lucene-based full-text (FT) index and search capabilities of the latest version of the eXist XML database. In doing so, I’m hitting a couple of bugs in this area, which I’m trying to isolate, test and report as clearly as possible. This post discusses a way to use the same test files for both eXist-1.4 and eXist-trunk.</p>
<p><span id="more-115"></span></p>
<p>The addition of the <a href="http://demo.exist-db.org/testing/testing.xml#N10138" target="_blank">XQuery Unit testing framework</a> to the eXist-trunk code has proven a tremendous help in developing test cases. At the time of writing, however, this functionality was not yet available in eXist-1.4, making it cumbersome to (manually) set up a testing environment, execute different tests and compare their results with the expected results. Still, I wanted to be able to run the same test queries in both versions and see where they differ (which, in my case, would also help me decide which version to continue with, as my development schedule is tightening up).</p>
<p>In order to do so, I’ve created an XSLT stylesheet, <a href="http://www.kantl.be/ctb/download/UnitTest2XQuery.xsl" target="_blank">UnitTest2XQuery.xsl</a>, that converts XQuery Unit test XML files to XQuery code which can be executed natively in eXist-1.4. It should generate reports identical to those of the XQuery Unit testing framework in eXist-trunk, making it easy to quickly compare the results of both eXist versions for identical queries.</p>
<p>I’m not going into detail on the stylesheet here (nothing fancy, really), but will add some remarks on how to use it. Ideally, this stylesheet could be stored in the db (say, as ‘/db/UnitTest2XQuery.xsl’) and put to work as follows:</p>
<pre>let $test :=
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">TestSet</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">testName</span><span style="color:#0000ff;">&gt;</span>really simple test<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">testName</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">description</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>just a simple test, really<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">author</span><span style="color:#0000ff;">&gt;</span>Ron Van den Branden<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">author</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">description</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">setup</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">create</span>-<span style="color:#ff0000;">collection</span> <span style="color:#ff0000;">parent</span>=<span style="color:#0000ff;">"/db"</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">"coll"</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">store</span> <span style="color:#ff0000;">collection</span>=<span style="color:#0000ff;">"/db/coll"</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">"test.xml"</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">p</span> <span style="color:#ff0000;">att1</span>=<span style="color:#0000ff;">"val1"</span> <span style="color:#ff0000;">att2</span>=<span style="color:#0000ff;">"val2"</span><span style="color:#0000ff;">&gt;</span>this is a test document<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">p</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">store</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">setup</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">functions</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span>![CDATA[
        declare function local:echo($node) {
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">echo</span><span style="color:#0000ff;">&gt;</span>{$node/node()}<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">echo</span><span style="color:#0000ff;">&gt;</span>
        };
        ]]&gt;<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">functions</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">tearDown</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">remove</span>-<span style="color:#ff0000;">collection</span> <span style="color:#ff0000;">collection</span>=<span style="color:#0000ff;">"/db/coll"</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">tearDown</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">test</span> <span style="color:#ff0000;">output</span>=<span style="color:#0000ff;">"text"</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">task</span><span style="color:#0000ff;">&gt;</span>really simple test, text<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">task</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">code</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span>![CDATA[
            local:echo(collection('/db/coll')//p)/string()
            ]]<span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">code</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">expected</span><span style="color:#0000ff;">&gt;</span>this is a test document<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">expected</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">test</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">test</span> <span style="color:#ff0000;">output</span>=<span style="color:#0000ff;">"xml"</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">task</span><span style="color:#0000ff;">&gt;</span>really simple test, xml<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">task</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">code</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span>![CDATA[
            local:echo(collection('/db/coll')//p)
            ]]<span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">code</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">expected</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">echo</span><span style="color:#0000ff;">&gt;</span>this is a test document<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">echo</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">expected</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">test</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">TestSet</span><span style="color:#0000ff;">&gt;</span>

return util:eval(transform:transform($test, doc('/db/UnitTest2XQuery.xsl'), ()))</pre>
<p>Unfortunately, eXist’s transform:transform() function does ugly things to text that has been processed with disable-output-escaping. Instead of being output unescaped, the text is copied in its escaped form, wrapped in processing instructions:</p>
<pre><span style="color:#0000ff;">&lt;?</span>javax.xml.transform.disable-output-escaping<span style="color:#0000ff;">?&gt;</span>

        declare function local:echo($node) {
        &amp;lt;echo&amp;gt;{$node/node()}&amp;lt;/echo&amp;gt;
        };

<span style="color:#0000ff;">&lt;?</span>javax.xml.transform.enable-output-escaping<span style="color:#0000ff;">?&gt;</span></pre>
<p>Clearly, this can’t be interpreted as XQuery. Hence, the best way to use the stylesheet is to batch-transform XQuery Unit test files with an XSLT processor outside eXist, and then execute the resulting XQuery code with eXist.</p>
<h2>Conclusion</h2>
<p>I guess the most important conclusion is: grab the <a href="http://www.kantl.be/ctb/download/UnitTest2XQuery.xsl" target="_blank">UnitTest2XQuery.xsl</a> stylesheet and see if it helps you in developing XQuery code for eXist!</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2010/12/01/xquery-unit-testing-in-exist-1-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>As a matter of fac(e)t: (mimicking) faceted searching in eXist</title>
		<link>http://rvdb.wordpress.com/2010/10/06/mimicking-faceted-searching-in-exist/</link>
		<comments>http://rvdb.wordpress.com/2010/10/06/mimicking-faceted-searching-in-exist/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 12:03:49 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[faceted search]]></category>
		<category><![CDATA[XQuery]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/2010/10/06/mimicking-faceted-searching-in-exist/</guid>
		<description><![CDATA[Looking back, ever since I started developing search interfaces for XML text collections with the marvelous eXist XML database, I’ve been drawn to the concept of faceted search, long before I knew it was called that. The recent integration of Lucene indexing and searching capabilities into eXist (since version 1.4) holds promise for [...]]]></description>
			<content:encoded><![CDATA[<p>Looking back, ever since I started developing search interfaces for XML text collections with the marvelous <a href="http://exist-db.org" target="_blank">eXist XML database</a>, I’ve been drawn to the concept of <a href="http://en.wikipedia.org/wiki/Faceted_search" target="_blank">faceted search</a>, long before I knew it was called that. The recent integration of Lucene indexing and searching capabilities into eXist (since version 1.4) holds promise for efficient facet-oriented search features, such as the use of Lucene fields in search queries.</p>
<p><span id="more-99"></span></p>
<p>Although these features are not yet implemented, exchanges on the eXist-open mailing list indicate:</p>
<ul>
<li>
<div align="left">that their implementation is not far off (<a title="http://markmail.org/message/yee3q4a6x6lhrwph" href="http://markmail.org/message/yee3q4a6x6lhrwph" target="_blank">http://markmail.org/message/yee3q4a6x6lhrwph</a>)</div>
</li>
<li>
<div>how they could be used for implementing faceted searching (<a title="http://markmail.org/message/r56zbcfb3m5p64xz" href="http://markmail.org/message/r56zbcfb3m5p64xz" target="_blank">http://markmail.org/message/r56zbcfb3m5p64xz</a>).</div>
</li>
</ul>
<p>For now, eXist lacks the optimised features for fast faceted searching that are offered by, e.g., the Lucene-based <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a> search engine. Yet, some of eXist’s index-based features can already be used to add facets to XQuery search scripts. Key to this approach is the <a href="http://demo.exist-db.org/exist/functions/util/index-keys" target="_blank">util:index-keys()</a> function, which returns indexed terms along with some distribution statistics. Note that the discussion in this post doesn’t claim any originality: the basic idea is already implemented in the eXist <a href="http://demo.exist-db.org/biblio/" target="_blank">bibliographic demo</a> web application, to which this discussion adds a small mechanism for defining and looping over multiple facets in an XQuery script.</p>
<p>I’ll illustrate this approach with an example based on the Shakespeare XML example files that are shipped with eXist. This assumes that a) eXist-1.4 is installed, and b) the eXist examples have been set up with the <a href="http://localhost:8080/exist/admin/admin.xql?panel=setup" target="_blank">admin web application</a>. First, let’s add some more index definitions to the index configuration file that is stored in the database at ‘/db/system/config/db/shakespeare/collection.xconf’. Add two more <a href="http://www.exist-db.org/indexing.html#rangeidx" target="_blank">range index</a> definitions, for the elements named ‘SPEAKER’ and ‘TITLE’, so that the index configuration looks as follows:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">collection</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://exist-db.org/collection-config/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">index</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">fulltext</span> <span style="color:#ff0000;">default</span>=<span style="color:#0000ff;">&quot;none&quot;</span> <span style="color:#ff0000;">attributes</span>=<span style="color:#0000ff;">&quot;no&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;SPEECH&quot;</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">ignore</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;SPEAKER&quot;</span><span style="color:#0000ff;">/&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">text</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">text</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;TITLE&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">lucene</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">ngram</span> <span style="color:#ff0000;">qname</span>=<span style="color:#0000ff;">&quot;SPEAKER&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#008000;">&lt;!-- range indexes --&gt;</span>
        <span style="color:#008000;">&lt;!-- note: although path-based range indexes are liable
             to get deprecated (in favour of qname based ones),
             the following range indexes are path-based in order to
             avoid a bug with util:index-keys() in eXist trunk
        --&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">create</span> <span style="color:#ff0000;">path</span>=<span style="color:#0000ff;">&quot;//SPEAKER&quot;</span> <span style="color:#ff0000;">type</span>=<span style="color:#0000ff;">&quot;xs:string&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">create</span> <span style="color:#ff0000;">path</span>=<span style="color:#0000ff;">&quot;//TITLE&quot;</span> <span style="color:#ff0000;">type</span>=<span style="color:#0000ff;">&quot;xs:string&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">index</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">collection</span><span style="color:#0000ff;">&gt;</span></pre>
<p>In order to actually put these index definitions to use, the ‘/db/shakespeare’ collection needs to be reindexed with the <a href="http://www.exist-db.org/client.html" target="_blank">Java admin client</a>.</p>
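<p>If you’d rather trigger the reindexing from an XQuery script, a minimal sketch could look as follows. It assumes that your eXist version provides the xmldb:reindex() function in its XMLDB module (check the function documentation of your own installation), and that reindexing requires DBA rights; the admin password is a placeholder:</p>
<pre><font color="#0000ff" face="Courier New">(: hedged sketch: assumes xmldb:reindex() is available in this eXist version;
   reindexing needs DBA rights, hence the login; the password is a placeholder :)
if (xmldb:login('/db', 'admin', 'your-admin-password'))
then xmldb:reindex('/db/shakespeare')
else 'login failed: /db/shakespeare was not reindexed'</font></pre>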
<h3>A simple search</h3>
<p>First, let’s issue a simple search on the Shakespeare plays:</p>
<pre><font color="#0000ff" face="Courier New">let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
return $hits</font></pre>
<p>This performs a straightforward full-text search for &lt;SPEECH&gt; nodes that contain the term ‘lord’. The ft:query() function operates on the Lucene full-text index, which in this case was defined on all &lt;SPEECH&gt; elements. As the content of the &lt;SPEAKER&gt; elements has been excluded from the full-text index (see the index definition above), this search will return 272 &lt;SPEECH&gt; elements whose &lt;LINE&gt; children contain the term ‘lord’:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>HAMLET<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>Not so, my lord; I am too much i' the sun.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>HORATIO<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>The same, my lord, and your poor servant ever.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>MARCELLUS<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>My good lord--<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#008000;">&lt;!-- ... --&gt;</span></pre>
<h3>A simple facet with standard XQuery functions</h3>
<p>This simple search can return more than just the speeches holding the word ‘lord’. Suppose that, besides the actual speeches, we are interested in the different dramatic characters that utter these speeches. We could then add ‘speakers’ as a facet to the search. Speakers are encoded in a &lt;SPEAKER&gt; element per &lt;SPEECH&gt;. In the context of the previous XQuery snippet, they are easily accessible as $hits/SPEAKER.</p>
<p>Before implementing this in XQuery code, let’s consider what we’re after at this moment:</p>
<ul>
<li>all speeches containing the word ‘lord’ </li>
<li>all different speakers uttering those speeches </li>
</ul>
<p>This is easy enough with XQuery’s <a href="http://www.w3.org/TR/xpath-functions/#func-distinct-values" target="_blank">distinct-values()</a> function:</p>
<pre><font color="#0000ff" face="Courier New">let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
let $speakers :=
  for $a in distinct-values($hits/SPEAKER)
  return &lt;speaker&gt;{$a}&lt;/speaker&gt;
return &lt;results&gt;{
  &lt;speakers&gt;{$speakers}&lt;/speakers&gt;,
  &lt;hits&gt;{$hits}&lt;/hits&gt;
}&lt;/results&gt;</font></pre>
<p>These results could then be presented to the user in an intelligible way. Typically, the facets (‘speakers’, in this case) will be offered as a kind of search refinement alongside the actual search results. Often, each facet value is also accompanied by the number of search results that would remain if that value were used as an additional search filter. This information can be computed with standard XQuery functions as well, simply by counting how many times each distinct value of a facet occurs within the search results:</p>
<pre><font color="#0000ff" face="Courier New">let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
let $speakers :=
  for $a in distinct-values($hits/SPEAKER)
  return &lt;speaker freq=&quot;{count($hits/SPEAKER[. eq $a])}&quot;&gt;{$a}&lt;/speaker&gt;
return &lt;results&gt;{
  &lt;speakers&gt;{$speakers}&lt;/speakers&gt;,
  &lt;hits&gt;{$hits}&lt;/hits&gt;
}&lt;/results&gt;</font></pre>
<p>As the example illustrates, the construction of the $speakers facet now adds a @freq attribute that counts how many times each unique speaker name occurs among the speeches that contain the word ‘lord’:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speakers</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speaker</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;10&quot;</span><span style="color:#0000ff;">&gt;</span>LAERTES<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speaker</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speaker</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;30&quot;</span><span style="color:#0000ff;">&gt;</span>LORD POLONIUS<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speaker</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speaker</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;13&quot;</span><span style="color:#0000ff;">&gt;</span>HAMLET<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speaker</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speaker</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;48&quot;</span><span style="color:#0000ff;">&gt;</span>HORATIO<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speaker</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speakers</span><span style="color:#0000ff;">&gt;</span></pre>
<p>However, this approach soon hits its limits: the performance of such queries correlates directly with a) the size of the search space (collection size and number of search hits), and b) the number of facets. The standard XQuery function (‘distinct-values()’) and operator (‘eq’) used here don’t operate on eXist indexes, and hence quickly suffer from degrading performance as complexity increases. Therefore, the next section proposes an eXist-specific alternative.</p>
<h3>A simple facet with eXist-specific functions</h3>
<p>The eXist XML database tries to optimise query performance, both by giving standard XQuery functions and operators access to its efficient indexes, and by defining eXist-specific functions for querying those indexes. One such function is <a href="http://demo.exist-db.org/exist/functions/util/index-keys" target="_blank">util:index-keys($node, $start-value, $function-reference, $max-number-returned)</a>, which looks up the distinct indexed values for a given node set ($node) that start with a specific start value ($start-value), delegates the processing of this index information to a dedicated callback function ($function-reference), and returns at most a given number of index entries ($max-number-returned).</p>
<p>The information returned to the helper function consists of:</p>
<ul>
<li>the index key </li>
<li>the overall frequency of that index key within the specified node set ($node) </li>
<li>the number of documents in the specified node set ($node) in which the key occurs </li>
<li>the position of the index key among all keys returned (i.e. its ordinal number)</li>
</ul>
<p>Let’s illustrate this by reformulating the previous example. Note how in the facet generating part of the XQuery script the standard distinct-values() function is replaced with the eXist-specific util:index-keys() function.</p>
<pre><font color="#0000ff" face="Courier New">declare function local:term-callback($term as xs:string, $data as xs:int+) as element() {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
}; </font></pre>
<pre><font color="#0000ff" face="Courier New">let $callback := util:function(xs:QName(&quot;local:term-callback&quot;), 2)
let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
let $speakers := util:index-keys($hits/SPEAKER, '', $callback, 10000)
return &lt;results&gt;{
  &lt;speakers&gt;{$speakers}&lt;/speakers&gt;,
  &lt;hits&gt;{$hits}&lt;/hits&gt;
}&lt;/results&gt;</font></pre>
<p>This produces the same list of speakers (although this time in alphabetical order, the default for util:index-keys()), but now each speaker comes enriched with basic statistical index information, e.g.:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">speakers</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#008000;">&lt;!-- ... --&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;7&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;18&quot;</span><span style="color:#0000ff;">&gt;</span>LADY MACBETH<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;10&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;19&quot;</span><span style="color:#0000ff;">&gt;</span>LAERTES<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;5&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;20&quot;</span><span style="color:#0000ff;">&gt;</span>LENNOX<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;30&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;21&quot;</span><span style="color:#0000ff;">&gt;</span>LORD POLONIUS<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#008000;">&lt;!-- ... --&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">speakers</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This call to the util:index-keys() function works directly on eXist indexes, and avoids the expensive comparing and counting of speakers for every unique speaker name. Hence, when operating on range indexes (see below), the util:index-keys() function can serve as a ‘power’ version of the standard distinct-values() function.</p>
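<p>Because util:index-keys() returns its entries in index order, a facet list ranked by frequency needs one extra FLWOR step that re-sorts the callback’s &lt;term&gt; elements on their @freq attribute. A sketch, reusing the callback from above and passing the &lt;SPEAKER&gt; nodes of the search hits to util:index-keys():</p>
<pre><font color="#0000ff" face="Courier New">declare function local:term-callback($term as xs:string, $data as xs:int+) as element() {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
};

let $callback := util:function(xs:QName(&quot;local:term-callback&quot;), 2)
let $hits := collection('/db/shakespeare')//SPEECH[ft:query(., 'lord')]
(: the index keys come back alphabetically; re-sort them, most frequent first :)
let $speakers :=
  for $term in util:index-keys($hits/SPEAKER, '', $callback, 10000)
  order by xs:integer($term/@freq) descending, string($term)
  return $term
return &lt;speakers&gt;{$speakers}&lt;/speakers&gt;</font></pre>
<p>Note that @n still records each key’s position in the original, alphabetical sequence, so after re-sorting it no longer matches the displayed order.</p>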
<h3>There are more facets to a gem</h3>
<p>Now, let’s illustrate how the example query can be extended with more facets. Suppose that, besides the speakers, we are interested in the sections in which the search hits occur. In the Shakespeare documents, each text division has its own title:</p>
<table border="0" cellspacing="0" cellpadding="2" width="400">
<thead>
<tr>
<th valign="top" width="67" align="left">division</th>
<th valign="top" width="333" align="left">title element</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" width="67">play</td>
<td valign="top" width="333">/PLAY/TITLE</td>
</tr>
<tr>
<td valign="top" width="67">act</td>
<td valign="top" width="333">/PLAY/ACT/TITLE</td>
</tr>
<tr>
<td valign="top" width="67">scene</td>
<td valign="top" width="333">/PLAY/ACT/SCENE/TITLE</td>
</tr>
</tbody>
</table>
<p>Remember how earlier we defined a range index on the &lt;TITLE&gt; element? This can be put to use for all of these facets. In order to keep the code minimal and easily extensible to other facets, facets can be defined as a set of XML elements:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facets</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;scenes&quot;</span><span style="color:#0000ff;">&gt;</span>$hits/ancestor::SCENE/TITLE<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;acts&quot;</span><span style="color:#0000ff;">&gt;</span>$hits/ancestor::ACT/TITLE<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;plays&quot;</span><span style="color:#0000ff;">&gt;</span>$hits/ancestor::PLAY/TITLE<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;speakers&quot;</span><span style="color:#0000ff;">&gt;</span>$hits/SPEAKER<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facets</span><span style="color:#0000ff;">&gt;</span></pre>
<p>Each facet definition then consists of two pieces of information:</p>
<ul>
<li>facet path: an XPath expression for the facet, relative to the search hits (the text content of each &lt;facet&gt; element)</li>
<li>facet label: a string label used to identify the different facets in the search results (the value of the @label attribute of each &lt;facet&gt; element)</li>
</ul>
<p>The facet generation then loops over the defined facets, evaluates their XPath expressions with the eXist-specific <a href="http://demo.exist-db.org/exist/functions/util/eval" target="_blank">util:eval()</a> function, looks up the corresponding facet label, and generates the list of index entries for each facet:</p>
<pre><font color="#0000ff" face="Courier New">declare function local:term-callback($term as xs:string, $data as xs:int+) as element() {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
};</font></pre>
<pre><font color="#0000ff" face="Courier New">let $callback := util:function(xs:QName(&quot;local:term-callback&quot;), 2)
let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
(: declare facets as XPath expressions, relative to the search hits :)
let $facets :=
  &lt;facets&gt;
    &lt;facet label=&quot;scenes&quot;&gt;$hits/ancestor::SCENE/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;acts&quot;&gt;$hits/ancestor::ACT/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;plays&quot;&gt;$hits/ancestor::PLAY/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;speakers&quot;&gt;$hits/SPEAKER&lt;/facet&gt;
  &lt;/facets&gt;
return &lt;results&gt;
  &lt;facets&gt;{
    (: loop over facet XPaths, and evaluate them :)
    for $facet in $facets//facet
    let $vals := util:eval($facet)
    return &lt;facet&gt;{
      $facet/@label,
      util:index-keys($vals, '', $callback, 5)
    }&lt;/facet&gt;
  }&lt;/facets&gt;
  &lt;hits&gt;{$hits}&lt;/hits&gt;
&lt;/results&gt;</font></pre>
<p align="left">The result looks as follows: </p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">results</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facets</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;scenes&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>SCENE I.  A cavern. In the middle, a boiling cauldron.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>SCENE I.  A churchyard.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>SCENE I.  A room in POLONIUS' house.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>SCENE I.  A room in the castle.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;5&quot;</span><span style="color:#0000ff;">&gt;</span>SCENE I.  Dunsinane. Ante-room in the castle.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;acts&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>ACT I<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>ACT II<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>ACT III<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>ACT IV<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;5&quot;</span><span style="color:#0000ff;">&gt;</span>ACT V<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;plays&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>The Tragedy of Hamlet, Prince of Denmark<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>The Tragedy of Macbeth<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>The Tragedy of Romeo and Juliet<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">facet</span> <span style="color:#ff0000;">label</span>=<span style="color:#0000ff;">&quot;speakers&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>ATTENDANT<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>BALTHASAR<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;4&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>BANQUO<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>BERNARDO<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;5&quot;</span><span style="color:#0000ff;">&gt;</span>Both Murderers<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facet</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">facets</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">hits</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>LAERTES<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEAKER</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>My dread lord,<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>Your leave and favour to return to France;<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>From whence though willingly I came to Denmark,<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>To show my duty in your coronation,<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>Yet now, I must confess, that duty done,<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>My thoughts and wishes bend again toward France<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>And bow them to your gracious leave and pardon.<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">LINE</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">SPEECH</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#008000;">&lt;!-- ... --&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">hits</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">results</span><span style="color:#0000ff;">&gt;</span></pre>
<h3>Full text facets</h3>
<p>Besides the headings of the text divisions that contain the search hits, it could be interesting to add yet another facet, listing the individual words of the speeches containing the search results. This requires a slightly different approach than the facets so far. Notice that the index entries in the previous search results consist of whole phrases: ‘SCENE I. A cavern. In the middle, a boiling cauldron.’, ‘ACT I’. This is due to the way the nodes in those facets have been indexed: so far, the facets are constructed on the <em>range indexes</em> that have been defined on the &lt;TITLE&gt; and &lt;SPEAKER&gt; elements. If we defined a range index on the &lt;LINE&gt; elements as well and scanned it with the util:index-keys() function, we would retrieve the entire contents of each line instead of the individual words, which is clearly not very helpful for refining a search in a search interface.</p>
<p>This is where the <em>Lucene full text index</em> comes into play: it tokenises the text of the nodes for which it is defined, and indexes those tokens. The collection.xconf file above defines a Lucene full text index on the &lt;SPEECH&gt; nodes of the Shakespeare documents (while ignoring the &lt;SPEAKER&gt; elements). In practice, this means that the index is populated with all tokenised words of the &lt;LINE&gt; elements that make up each &lt;SPEECH&gt;. In order to scan this index with util:index-keys(), we have to be more specific about the index to be used. This is done with an optional fifth parameter that names the index. For the &lt;SPEECH&gt; elements, the keys of their full text index can be retrieved as follows:</p>
<pre><font color="#0000ff" face="Courier New">util:index-keys(//SPEECH, '', $callback, 5, 'lucene-index')</font></pre>
<p>Combined with the callback function defined elsewhere in the XQuery, this function call would then produce a list of index entries similar to this fragment: </p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;2&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>abate<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;2&quot;</span><span style="color:#0000ff;">&gt;</span>abatements<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;3&quot;</span><span style="color:#0000ff;">&gt;</span>abbey<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;3&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;4&quot;</span><span style="color:#0000ff;">&gt;</span>abhorred<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">freq</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">docs</span>=<span style="color:#0000ff;">&quot;1&quot;</span> <span style="color:#ff0000;">n</span>=<span style="color:#0000ff;">&quot;5&quot;</span><span style="color:#0000ff;">&gt;</span>abhors<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span></pre>
<p>Note that when no index is specified, util:index-keys() seems to default to the range index; only other indexes have to be named in its last argument. (On the other hand, there seems to be no way to name the range index explicitly in the fifth parameter: ‘range-index’ is not recognised.)</p>
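<p>To make the three numbers in these entries more concrete, the sketch below computes similar entries in plain XQuery instead of reading them from the index: it tokenises the &lt;LINE&gt; text of the hits and counts, for each word, its total number of occurrences (freq) and the number of distinct documents it occurs in (docs), numbering the entries in alphabetical order (n). This reading of freq and docs is an assumption based on the sample output. The function name local:word-facet() is hypothetical, the crude split on non-word characters does not reproduce what the configured Lucene analyzer does (stop-word removal, for instance), the group by clause requires XQuery 3.0, and scanning the text like this will be much slower than util:index-keys(). It is only meant to show what the index entries stand for.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "3.0";

(: Hypothetical helper, not part of eXist: approximates a full text facet by
   tokenising the LINE text of the hits instead of scanning the Lucene index :)
declare function local:word-facet($speeches as element(SPEECH)*,
                                  $max as xs:integer) as element(term)* {
  let $counted :=
    for $speech in $speeches
    (: crude tokenisation: lower-case, then split on runs of non-word characters :)
    for $word in tokenize(lower-case(string-join($speech/LINE, ' ')), '\W+')[. ne '']
    group by $word
    order by $word
    (: after grouping, $speech holds one SPEECH per occurrence of $word;
       the path step $speech/root(.) drops duplicates, so it counts documents :)
    return &lt;term freq="{count($speech)}" docs="{count($speech/root(.))}"&gt;{$word}&lt;/term&gt;
  (: keep the first $max entries and number them, like the n attribute above :)
  for $term at $n in subsequence($counted, 1, $max)
  return &lt;term&gt;{$term/@freq, $term/@docs, attribute n {$n}, string($term)}&lt;/term&gt;
};

(: example call in eXist, using the hits of the faceting examples :)
local:word-facet(collection('/db/shakespeare')//SPEECH[ft:query(., 'lord')], 5)</font></pre>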
<p>Wrapping up, this full text facet can be integrated into the sample XQuery, with the proviso that the four- or five-argument form of util:index-keys() must be used, depending on the index the facet operates on. In the following example, this is handled by adding a @type attribute with the value “FT” to the full text facet, and using that attribute to decide which index util:index-keys() should scan:</p>
<pre><font color="#0000ff" face="Courier New">declare function local:term-callback($term as xs:string, $data as xs:int+) as element() {
  &lt;term freq=&quot;{$data[1]}&quot; docs=&quot;{$data[2]}&quot; n=&quot;{$data[3]}&quot;&gt;{$term}&lt;/term&gt;
}; </font></pre>
<pre><font color="#0000ff" face="Courier New">let $callback := util:function(xs:QName(&quot;local:term-callback&quot;), 2)
let $coll := collection('/db/shakespeare')
let $hits := $coll//SPEECH[ft:query(., 'lord')]
(: declare facets as XPath expressions, relative to the search hits :)
let $facets :=
  &lt;facets&gt;
    &lt;facet label=&quot;scenes&quot;&gt;$hits/ancestor::SCENE/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;acts&quot;&gt;$hits/ancestor::ACT/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;plays&quot;&gt;$hits/ancestor::PLAY/TITLE&lt;/facet&gt;
    &lt;facet label=&quot;speakers&quot;&gt;$hits/SPEAKER&lt;/facet&gt;
    &lt;facet label=&quot;fulltext&quot; type=&quot;FT&quot;&gt;$hits&lt;/facet&gt;
  &lt;/facets&gt;
return &lt;results&gt;
  &lt;facets&gt;{
    (: loop over facet XPaths, and evaluate them :)
    for $facet in $facets//facet
    let $vals := util:eval($facet)
    let $index := if ($facet[@type eq 'FT']) then 'lucene-index' else ()
    return &lt;facet&gt;{
      $facet/@label,
      if ($index) then
        util:index-keys($vals, '', $callback, 5, $index)
      else
        util:index-keys($vals, '', $callback, 5)
    }&lt;/facet&gt;
  }&lt;/facets&gt;
  &lt;hits&gt;{$hits}&lt;/hits&gt;
&lt;/results&gt;</font></pre>
<p>At this stage, this XQuery script is only a basic proof of concept, but it illustrates how facets for certain information categories can be generated alongside search results. To reduce processing overhead, the $max-number-returned parameter of util:index-keys() (its fourth argument) can be kept low, as in these examples, so that only a limited number of entries is returned per facet in a first phase. An option could then be offered to see more (or all) entries of an individual facet in a second step.</p>
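<p>A possible shape for such a two-step approach is sketched below: the facet loop is wrapped in a function that takes the maximum number of entries as a parameter, so that a first request can ask for a few entries per facet and a follow-up request for many more entries of a single facet. The wrapper local:facet() and the value 100 are hypothetical; only util:function(), util:index-keys() and the callback are taken from the examples above, and the sketch has not been tested against a particular eXist version.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: the callback from the examples above :)
declare function local:term-callback($term as xs:string, $data as xs:int+) as element() {
  &lt;term freq="{$data[1]}" docs="{$data[2]}" n="{$data[3]}"&gt;{$term}&lt;/term&gt;
};

(: Hypothetical wrapper: builds one facet with at most $max entries, scanning
   the named index if $index is given and the default (range) index otherwise :)
declare function local:facet($label as xs:string, $nodes as node()*,
                             $max as xs:integer, $index as xs:string?) as element(facet) {
  let $callback := util:function(xs:QName("local:term-callback"), 2)
  return
    &lt;facet label="{$label}"&gt;{
      if ($index)
      then util:index-keys($nodes, '', $callback, $max, $index)
      else util:index-keys($nodes, '', $callback, $max)
    }&lt;/facet&gt;
};

let $hits := collection('/db/shakespeare')//SPEECH[ft:query(., 'lord')]
return (
  (: first step: a short preview of a facet :)
  local:facet('speakers', $hits/SPEAKER, 5, ()),
  (: second step, e.g. after the user asks for more: a longer list (100 is arbitrary) :)
  local:facet('fulltext', $hits, 100, 'lucene-index')
)</font></pre>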
<h3>Conclusion</h3>
<p>Future integration of efficient Lucene search capabilities into eXist probably holds more promise for implementing faceted search. In the meantime, some eXist-specific features (the different index types and util:index-keys()) already provide a way to add search facets to query results. Although the performance of the approach described in this post will probably decrease with very large collections, util:index-keys() seems to offer a reasonably stable option, performance-wise, for constructing search facets alongside a search.</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2010/10/06/mimicking-faceted-searching-in-exist/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>Full text queries in eXist: from Lucene to XML syntax</title>
		<link>http://rvdb.wordpress.com/2010/08/04/exist-lucene-to-xml-syntax/</link>
		<comments>http://rvdb.wordpress.com/2010/08/04/exist-lucene-to-xml-syntax/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 23:28:38 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[eXistdb]]></category>
		<category><![CDATA[XQuery]]></category>

		<guid isPermaLink="false">https://rvdb.wordpress.com/?p=73</guid>
		<description><![CDATA[[UPDATE 2011-08-09]: The lucene2xml scripts have been modified: [feature]: added a couple of further conditions in $lucene2xml, in order to benefit from unified &#60;exist:match&#62; markers for adjacent phrase terms: differentiate between phrase search: rewrite &#60;near slop=&#34;&#60;1&#34;&#62; to &#60;phrase&#62; proximity search: copy &#60;near slop=&#34;&#62;=1&#34;&#62; [fix]: improved treatment of escaped parentheses inside proximity search expressions Since version [...]]]></description>
			<content:encoded><![CDATA[<p>[<strong>UPDATE 2011-08-09]</strong>: The lucene2xml scripts have been modified:</p>
<ul>
<li>[feature]: added further conditions to $lucene2xml, so that adjacent phrase terms benefit from unified &lt;exist:match&gt; markers, by differentiating between
<ul>
<li>phrase search: rewrite &lt;near slop=&quot;&lt;1&quot;&gt; to &lt;phrase&gt; </li>
<li>proximity search: copy &lt;near slop=&quot;&gt;=1&quot;&gt; </li>
</ul>
</li>
<li>[fix]: improved treatment of escaped parentheses inside proximity search expressions </li>
</ul>
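<p>The phrase/proximity distinction in this update can be sketched as a small recursive transformation. This is not the actual lucene2xml code: the function name local:near-or-phrase() is hypothetical, and the sketch assumes that an empty or missing slop (as produced for a quoted phrase without ~N) should count as a phrase.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Hypothetical sketch of the rule above: a &lt;near&gt; with a slop below 1
   (or an empty/missing slop) becomes &lt;phrase&gt;, any other &lt;near&gt; is kept :)
declare function local:near-or-phrase($node as node()) as node() {
  typeswitch ($node)
    case element(near) return
      (: an empty or non-numeric slop is treated as 0, i.e. as a phrase :)
      let $slop :=
        if ($node/@slop castable as xs:double) then xs:double($node/@slop) else 0
      let $children :=
        for $child in $node/node() return local:near-or-phrase($child)
      return
        if ($slop lt 1)
        then element phrase { $node/@* except $node/@slop, $children }
        else element near { $node/@*, $children }
    case element() return
      (: rebuild other elements, processing their children recursively :)
      element { node-name($node) } {
        $node/@*, for $child in $node/node() return local:near-or-phrase($child)
      }
    default return $node
};

local:near-or-phrase(
  &lt;query&gt;&lt;near slop=""&gt;&lt;term&gt;dread&lt;/term&gt;&lt;term&gt;lord&lt;/term&gt;&lt;/near&gt;&lt;/query&gt;)
(: returns &lt;query&gt;&lt;phrase&gt;&lt;term&gt;dread&lt;/term&gt;&lt;term&gt;lord&lt;/term&gt;&lt;/phrase&gt;&lt;/query&gt; :)</font></pre>
<p>Other elements are rebuilt with their attributes and their children are processed recursively, so the function can be applied to a complete &lt;query&gt; element.</p>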
<p>Since version 1.4, the <a href="http://www.exist-db.org/" target="_blank">eXist</a> native XML database implements a Lucene-based full text index. The main Lucene-aware search function, <a href="http://demo.exist-db.org/exist/functions/lucene/query" target="_blank">ft:query()</a>, accepts queries expressed in two flavours:</p>
<ul>
<li>Lucene&#8217;s default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a> </li>
<li>eXist’s <a href="http://demo.exist-db.org/exist/lucene.xml#N10352" target="_blank">XML query syntax</a> </li>
</ul>
<p>The XML query syntax was explicitly designed to allow for more expressive queries than are possible with the Lucene syntax. Most notably, eXist has extensions for:</p>
<ul>
<li>fine-grained proximity searches with the &lt;near&gt; element (among other things, the possibility to specify that search terms may occur in any order) </li>
<li>regular expression searches with the &lt;regex&gt; element </li>
</ul>
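<p>As an illustration, the following query uses both extensions. It is a sketch that assumes the Lucene index on &lt;SPEECH&gt; and the /db/shakespeare collection from the faceting examples; the search terms, the slop value and the regular expression are arbitrary, and the ordered="no" attribute follows eXist's documentation of the &lt;near&gt; element (the XML query syntax page linked above lists the exact options).</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Sketch only: assumes the Lucene index on SPEECH and the /db/shakespeare
   collection used for the facets; all search values are arbitrary :)
let $speeches := collection('/db/shakespeare')//SPEECH
(: "dread" and "lord" near each other (slop 3), in either order :)
let $unordered :=
  &lt;query&gt;
    &lt;near slop="3" ordered="no"&gt;&lt;term&gt;dread&lt;/term&gt;&lt;term&gt;lord&lt;/term&gt;&lt;/near&gt;
  &lt;/query&gt;
(: any indexed token matching the regular expression fran.* :)
let $regex := &lt;query&gt;&lt;regex&gt;fran.*&lt;/regex&gt;&lt;/query&gt;
return (
  count($speeches[ft:query(., $unordered)]),
  count($speeches[ft:query(., $regex)])
)</font></pre>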
<p>This makes the XML syntax the more interesting option for developing a user search interface. A search interface could then allow users to input search queries in the (quite intuitive) Lucene fashion, while providing additional options for specifying extra search features (‘(un)ordered proximity search’, ‘regular expression search’). Behind the scenes, both pieces of user input (search query + additional parameters) can be translated to an XML expression of the search query.</p>
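<p>For the simplest case, a search string consisting of plain words, the translation of query and options could look like the sketch below. The function local:build-query() and its mode values ('ordered', 'unordered', 'regex') are hypothetical; Lucene operators, quotes and brackets are not handled at all, which is where a real parser is needed.</p>
<pre><font color="#0000ff" face="Courier New">xquery version "1.0";

(: Hypothetical helper: translates a plain list of words plus a search-form
   option into eXist's XML query syntax; Lucene operators are not handled :)
declare function local:build-query($input as xs:string, $mode as xs:string,
                                   $slop as xs:integer) as element(query) {
  let $words := tokenize(normalize-space($input), ' ')
  return
    &lt;query&gt;{
      (: 'regex' treats the whole input as one regular expression;
         otherwise the words become the terms of a proximity query :)
      if ($mode eq 'regex')
      then &lt;regex&gt;{normalize-space($input)}&lt;/regex&gt;
      else
        &lt;near slop="{$slop}"&gt;{
          (: only add @ordered when unordered matching was requested :)
          if ($mode eq 'unordered') then attribute ordered {'no'} else (),
          for $word in $words return &lt;term&gt;{$word}&lt;/term&gt;
        }&lt;/near&gt;
    }&lt;/query&gt;
};

local:build-query('dread lord', 'unordered', 3)
(: returns &lt;query&gt;&lt;near slop="3" ordered="no"&gt;&lt;term&gt;dread&lt;/term&gt;&lt;term&gt;lord&lt;/term&gt;&lt;/near&gt;&lt;/query&gt; :)</font></pre>
<p>In eXist, the result could then be passed straight to ft:query(), e.g. //SPEECH[ft:query(., local:build-query('dread lord', 'unordered', 3))].</p>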
<p><span id="more-73"></span></p>
<p>Obviously, the first step of such a translation involves parsing a query in Lucene syntax and transforming it into its XML syntax equivalent.</p>
<p>This seems feasible in XQuery using a couple of eXist-specific extension functions. I’ll briefly sketch the approach below:</p>
<ol>
<li>translate a Lucene search string to an intermediate string mimicking the XML syntax, with some additions for later parsing of boolean operators </li>
<li>parse the intermediate XML search string as XML with <a href="http://demo.exist-db.org/exist/functions/util/parse" target="_blank">util:parse()</a> </li>
<li>transform the intermediary structures in the search query to full-fledged boolean expressions </li>
</ol>
<p>The first step, transforming a Lucene search string to an intermediate string mimicking the XML syntax, can be done with an XQuery function that merely performs string replacements, calling itself recursively on the result of each replacement. Let’s call it local:parse-lucene():</p>
<pre><span style="color:#0000ff;">declare</span> <span style="color:#0000ff;">function</span> <span style="color:#0000ff;">local</span>:parse-lucene($string) {
  (: replace all symbolic booleans with lexical counterparts :)
  <span style="color:#0000ff;">if</span> (matches($string, '<span style="color:#8b0000;">[^\\](\|{2}|&amp;amp;{2}|!) </span>')) <span style="color:#0000ff;">then</span>
    let $rep := replace(
                  replace(
                    replace($string, '<span style="color:#8b0000;">&amp;amp;{2} </span>', '<span style="color:#8b0000;">AND </span>')
                  , '<span style="color:#8b0000;">\|{2} </span>', '<span style="color:#8b0000;">OR </span>')
                , '<span style="color:#8b0000;">! </span>', '<span style="color:#8b0000;">NOT </span>')
    <span style="color:#0000ff;">return</span> <span style="color:#0000ff;">local</span>:parse-lucene($rep)
  (: replace all booleans with '<span style="color:#8b0000;">&lt;AND/&gt;|&lt;OR/&gt;|&lt;NOT/&gt;</span>' :)
  <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> (matches($string, '<span style="color:#8b0000;">[^&lt;](AND|OR|NOT) </span>')) <span style="color:#0000ff;">then</span>
    let $rep := replace($string,
                        '<span style="color:#8b0000;">(AND|OR|NOT) </span>',
                        '<span style="color:#8b0000;">&lt;$1/&gt;</span>')
    <span style="color:#0000ff;">return</span> <span style="color:#0000ff;">local</span>:parse-lucene($rep)
  (: replace all '<span style="color:#8b0000;">+</span>' modifiers with '<span style="color:#8b0000;">&lt;AND/&gt;</span>' :)
  <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> (matches($string, '<span style="color:#8b0000;">(^|[^\w&quot;''])\+[\w&quot;''(]</span>')) <span style="color:#0000ff;">then</span>
    let $rep := replace($string,
                        '<span style="color:#8b0000;">(^|[^\w&quot;''])\+([\w&quot;''(])</span>',
                        '<span style="color:#8b0000;">$1&lt;AND type=_+_/&gt;$2</span>')
    <span style="color:#0000ff;">return</span> <span style="color:#0000ff;">local</span>:parse-lucene($rep)
  (: replace all '<span style="color:#8b0000;">-</span>' modifiers with '<span style="color:#8b0000;">&lt;NOT/&gt;</span>' :)
  <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> (matches($string, '<span style="color:#8b0000;">(^|[^\w&quot;''])-[\w&quot;''(]</span>')) <span style="color:#0000ff;">then</span>
    let $rep := replace($string,
                        '<span style="color:#8b0000;">(^|[^\w&quot;''])-([\w&quot;''(])</span>',
                        '<span style="color:#8b0000;">$1&lt;NOT type=_-_/&gt;$2</span>')
    <span style="color:#0000ff;">return</span> <span style="color:#0000ff;">local</span>:parse-lucene($rep)
  (: replace round brackets <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=with&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">with</a> '<span style="color:#8b0000;">&lt;bool&gt;&lt;/bool&gt;</span>' :)
  <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=else&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">else</a> <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=if&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">if</a> (matches($string, '<span style="color:#8b0000;">(^|\W|&gt;)\(.*?\)(\^(\d+))?(&lt;|\W|$)</span>')) <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=then&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">then</a>
    let $rep :=
      (: <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=add&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">add</a> @boost attribute <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=when&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">when</a> string ends <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=in&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">in</a> ^\d :)
      <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=if&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">if</a> (matches($string, '<span style="color:#8b0000;">(^|\W|&gt;)\(.*?\)(\^(\d+))(&lt;|\W|$)</span>')) <a style="color:#0000ff;" href="http://search.microsoft.com/default.asp?so=RECCNT&amp;siteid=us%2Fdev&amp;p=1&amp;nq=NEW&amp;qu=then&amp;IntlSearch=&amp;boolean=PHRASE&amp;ig=01&amp;i=09&amp;i=99">then</a>
        replace($string,
                '<span style="color:#8b0000;">(^|\W|&gt;)\((.*?)\)(\^(\d+))(&lt;|\W|$)</span>',
                '<span style="color:#8b0000;">$1&lt;bool boost=_$4_&gt;$2&lt;/bool&gt;$5</span>')
      else
        replace($string,
                '<span style="color:#8b0000;">(^|\W|&gt;)\((.*?)\)(&lt;|\W|$)</span>',
                '<span style="color:#8b0000;">$1&lt;bool&gt;$2&lt;/bool&gt;$3</span>')
    return local:parse-lucene($rep)
  (: replace quoted phrases with '<span style="color:#8b0000;">&lt;near slop=&quot;&quot;&gt;&lt;/near&gt;</span>' :)
  else if (matches($string, '<span style="color:#8b0000;">(^|\W|&gt;)(&#034;|&#039;).*?\2([~^]\d+)?(&lt;|\W|$)</span>')) then
    let $rep :=
      (: add @boost attribute when phrase ends in ^\d (the boost suffix is required here, otherwise every phrase would get @boost) :)
      if (matches($string, '<span style="color:#8b0000;">(^|\W|&gt;)(&#034;|&#039;).*?\2([\^]\d+)(&lt;|\W|$)</span>')) then
        replace($string,
                '<span style="color:#8b0000;">(^|\W|&gt;)(&#034;|&#039;)(.*?)\2([~^](\d+))?(&lt;|\W|$)</span>',
                '<span style="color:#8b0000;">$1&lt;near boost=_$5_&gt;$3&lt;/near&gt;$6</span>')
      (: add @slop attribute in other cases :)
      else
        replace($string,
                '<span style="color:#8b0000;">(^|\W|&gt;)(&#034;|&#039;)(.*?)\2([~^](\d+))?(&lt;|\W|$)</span>',
                '<span style="color:#8b0000;">$1&lt;near slop=_$5_&gt;$3&lt;/near&gt;$6</span>')
    return local:parse-lucene($rep)
  (: wrap fuzzy search strings in '<span style="color:#8b0000;">&lt;fuzzy min-similarity=&quot;&quot;&gt;&lt;/fuzzy&gt;</span>' :)
  else if (matches($string, '<span style="color:#8b0000;">[\w-[&lt;&gt;]]+?~[\d.]*</span>')) then
    let $rep := replace($string,
                        '<span style="color:#8b0000;">([\w-[&lt;&gt;]]+?)~([\d.]*)</span>',
                        '<span style="color:#8b0000;">&lt;fuzzy min-similarity=_$2_&gt;$1&lt;/fuzzy&gt;</span>')
    return local:parse-lucene($rep)
  (: wrap resulting string in '<span style="color:#8b0000;">&lt;query&gt;&lt;/query&gt;</span>' :)
  else concat('<span style="color:#8b0000;">&lt;query&gt;</span>',
              replace(normalize-space($string), '<span style="color:#8b0000;">_</span>', '<span style="color:#8b0000;">&quot;</span>'),
               '<span style="color:#8b0000;">&lt;/query&gt;</span>')
};</pre>
<p><span style="font-size:xx-small;">[<strong>NOTE</strong>: The single and double quotation marks in the regular expressions passed to matches() and replace() should be escaped with their respective character entities. Unfortunately, posting them to an HTML blog destroys this escaping, and doubly escaping the ampersands of the character entities only resulted in… those being escaped as well! Should you want to try this function, you’ll have to make sure to escape these characters yourself.]</span></p>
<p>The function local:parse-lucene() thus returns a string representing an intermediary XML version of the query. For example, it translates the Lucene search string ‘(fillet OR &quot;mal(ic)e done&quot;~1) AND snake^4’ into:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">query</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>fillet <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">OR</span><span style="color:#0000ff;">/&gt;</span><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">near</span> <span style="color:#ff0000;">slop</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>mal(ic)e done<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">near</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">AND</span><span style="color:#0000ff;">/&gt;</span>snake^4
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">query</span><span style="color:#0000ff;">&gt;</span></pre>
<p>This string must be processed further to get all boolean operators (and the occurrence indicators of their operands) right. Because boolean operators pair terms and hence (re)group them, this conversion is best done on the XML structure of the preprocessed string rather than on the string itself. Therefore, the string is first parsed into XML with eXist’s <a href="http://demo.exist-db.org/exist/functions/util/parse" target="_blank">util:parse()</a> function.</p>
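<p>Coming back to the escaping note above: for illustration, this is roughly how the quoted-phrase test of local:parse-lucene() could look once its quotation marks are escaped. It is a minimal sketch, assuming XQuery’s predefined entity references &amp;quot; and &amp;apos; inside the single-quoted string literal (doubling the apostrophe as '' would be an alternative); the $string value is just the example input used throughout this post, and for it the expression should return true:</p>
<pre>(: sketch: the quotation marks in the regex are written as entity references,
   so they no longer clash with the single quotes delimiting the string literal;
   $string is only an example input :)
let $string := '(fillet OR &quot;mal(ic)e done&quot;~1) AND snake^4'
return matches($string, '(^|\W|&gt;)(&amp;quot;|&amp;apos;).*?\2([~^]\d+)?(&lt;|\W|$)')</pre>
<p>The same escaped group, (&amp;quot;|&amp;apos;), would replace the quotation marks in the other matches() and replace() calls of the quoted-phrase branch.</p>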
<h2>Option 1: conversion to final XML with XSLT</h2>
<p>Since I’m most proficient in XSLT, I’ve implemented a first proof of concept for this conversion in an XSLT stylesheet that takes care of the boolean operators. This stylesheet can be applied to the intermediary XML search structure with the <a href="http://demo.exist-db.org/exist/functions/transform/transform" target="_blank">transform:transform()</a> function:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span>
     <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">xsl</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/1999/XSL/Transform&quot;</span>
     <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">xs</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/2001/XMLSchema&quot;</span>
     <span style="color:#ff0000;">exclude</span>-<span style="color:#ff0000;">result</span>-<span style="color:#ff0000;">prefixes</span>=<span style="color:#0000ff;">&quot;#all&quot;</span> <span style="color:#ff0000;">version</span>=<span style="color:#0000ff;">&quot;2.0&quot;</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">output</span> <span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;yes&quot;</span><span style="color:#0000ff;">/&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;query&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;@*&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;node()&quot;</span><span style="color:#0000ff;">/&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;*[(following-sibling::*[1]|preceding-sibling::*[1])
       [self::AND or self::OR or self::NOT]]&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">variable</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;name&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;(self::phrase|self::near)[not(@slop &amp;gt; 0)]&quot;</span><span style="color:#0000ff;">&gt;</span>phrase<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">value</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;name()&quot;</span><span style="color:#0000ff;">/&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">variable</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">element</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;{{$name}}&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;occur&quot;</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;preceding-sibling::*[1][self::AND]&quot;</span><span style="color:#0000ff;">&gt;</span>must<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;preceding-sibling::*[1][self::NOT]&quot;</span><span style="color:#0000ff;">&gt;</span>not<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;following-sibling::*[1]
                   [self::AND or self::OR or self::NOT][not(@type)]&quot;</span><span style="color:#0000ff;">&gt;</span>should<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>should<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">element</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;query/text()[normalize-space()]|bool/text()[normalize-space()]&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">variable</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;current&quot;</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;.&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">for</span>-<span style="color:#ff0000;">each</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;tokenize(., '\s+')[normalize-space()]&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#008000;">&lt;!-- here is the place for further differentiation between
           term / wildcard / regex elements --&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;occur&quot;</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;position() = 1 and $current/preceding-sibling::*[1]
                 [self::AND]&quot;</span><span style="color:#0000ff;">&gt;</span>must<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;position() = 1 and $current/preceding-sibling::*[1]
                 [self::NOT]&quot;</span><span style="color:#0000ff;">&gt;</span>not<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;position() = 1 and $current/following-sibling::*[1]
                     [self::AND or self::OR or self::NOT][not(@type)]&quot;</span><span style="color:#0000ff;">&gt;</span>should<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>should<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">if</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;matches(., '(.*?)(\^(\d+))(\W|$)')&quot;</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;boost&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">value</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;replace(., '(.*?)(\^(\d+))(\W|$)', '$3')&quot;</span><span style="color:#0000ff;">/&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">attribute</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">if</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">value</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;normalize-space(replace(., '(.*?)(\^(\d+))(\W|$)', '$1'))&quot;</span><span style="color:#0000ff;">/&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">for</span>-each<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;AND|OR|NOT&quot;</span> <span style="color:#ff0000;">priority</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">/&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;near/bool&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">value</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;concat('(', ., ')')&quot;</span><span style="color:#0000ff;">/&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;*&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">variable</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;name&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span> <span style="color:#ff0000;">test</span>=<span style="color:#0000ff;">&quot;(self::phrase|self::near)[not(@slop &amp;gt; 0)]&quot;</span><span style="color:#0000ff;">&gt;</span>phrase<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">when</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">value</span>-<span style="color:#ff0000;">of</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;name()&quot;</span><span style="color:#0000ff;">/&gt;</span><span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">otherwise</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">choose</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">variable</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">element</span> <span style="color:#ff0000;">name</span>=<span style="color:#0000ff;">&quot;{{$name}}&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">element</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span> <span style="color:#ff0000;">match</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span> <span style="color:#ff0000;">priority</span>=<span style="color:#0000ff;">&quot;-1&quot;</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">apply</span>-<span style="color:#ff0000;">templates</span> <span style="color:#ff0000;">select</span>=<span style="color:#0000ff;">&quot;@*|node()&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">copy</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">template</span><span style="color:#0000ff;">&gt;</span>

<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">xsl</span>:<span style="color:#800000;">stylesheet</span><span style="color:#0000ff;">&gt;</span></pre>
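<p>Suppose (for quick prototyping’s sake) this XSLT stylesheet is stored in a variable $lucene2xml, embedded in the XQuery as a direct element constructor. That is also why its attribute value templates use doubled curly braces ({{$name}}): inside an XQuery element constructor, {{ and }} produce literal braces, so the XSLT processor receives {$name}. Now all we have to do in the main query is process the input $string with local:parse-lucene(), parse its output with util:parse(), and transform the result with transform:transform():</p>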
<pre>let $string := '<span style="color:#8b0000;">(fillet OR &quot;mal(ic)e done&quot;~1) AND snake^4</span>'
let $luceneParse := local:parse-lucene($string)
let $luceneXML := util:parse($luceneParse)
<span style="color:#0000ff;">return</span> transform:transform($luceneXML, $lucene2xml, ())</pre>
<p>For the input string ‘(fillet OR &quot;mal(ic)e done&quot;~1) AND snake^4’, this produces the following XML query:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">query</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">bool</span> <span style="color:#ff0000;">occur</span>=<span style="color:#0000ff;">&quot;must&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">occur</span>=<span style="color:#0000ff;">&quot;must&quot;</span><span style="color:#0000ff;">&gt;</span>fillet<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">near</span> <span style="color:#ff0000;">occur</span>=<span style="color:#0000ff;">&quot;should&quot;</span> <span style="color:#ff0000;">slop</span>=<span style="color:#0000ff;">&quot;1&quot;</span><span style="color:#0000ff;">&gt;</span>mal(ic)e done<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">near</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">term</span> <span style="color:#ff0000;">occur</span>=<span style="color:#0000ff;">&quot;must&quot; <span style="color:#ff0000;">boost</span>=<span style="color:#0000ff;">&quot;4&quot;</span></span><span style="color:#0000ff;">&gt;</span>snake<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">term</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">bool</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">query</span><span style="color:#0000ff;">&gt;</span></pre>
<h2>Option 2: conversion to final XML with XQuery function</h2>
<p>Since the XSLT stylesheet described above is quite simple, it can easily be expressed as an XQuery function as well. Performance aside (I can’t make any sensible claims about it), a pure XQuery solution is probably the more elegant option. Therefore, I’ve prepared an alternative version that implements the stylesheet’s templates as a recursive XQuery function, using typeswitch to dispatch on node type:</p>
<pre><span style="color:#0000ff;">declare</span> <span style="color:#0000ff;">function</span> local:lucene2xml($node) {
  typeswitch ($node)
    <span style="color:#0000ff;">case</span> element(query) <span style="color:#0000ff;">return</span>
      element { node-name($node) } {
        element bool {
          $node/node()/local:lucene2xml(.)
        }
      }
    <span style="color:#0000ff;">case</span> element(AND) <span style="color:#0000ff;">return</span> ()
    <span style="color:#0000ff;">case</span> element(OR) <span style="color:#0000ff;">return</span> ()
    <span style="color:#0000ff;">case</span> element(NOT) <span style="color:#0000ff;">return</span> ()
    <span style="color:#0000ff;">case</span> element(bool) <span style="color:#0000ff;">return</span>
      <span style="color:#0000ff;">if</span> ($node/parent::near) <span style="color:#0000ff;">then</span>
        concat(&quot;<span style="color:#8b0000;">(</span>&quot;, $node, &quot;<span style="color:#8b0000;">)</span>&quot;)
      <span style="color:#0000ff;">else</span> element {node-name($node)} {
        $node/@*,
        $node/node()/local:lucene2xml(.)
      }
    <span style="color:#0000ff;">case</span> element() <span style="color:#0000ff;">return</span>
      let $name :=
        <span style="color:#0000ff;">if</span> (($node/self::phrase|$node/self::near)[not(@slop &gt; 0)]) <span style="color:#0000ff;">then</span>
          '<span style="color:#8b0000;">phrase</span>'
        <span style="color:#0000ff;">else</span> node-name($node)
      <span style="color:#0000ff;">return</span>
        element { $name } {
          $node/@*,
          <span style="color:#0000ff;">if</span> (($node/following-sibling::*[1]|
               $node/preceding-sibling::*[1])
              [self::AND <span style="color:#0000ff;">or</span> self::OR <span style="color:#0000ff;">or</span> self::NOT]) <span style="color:#0000ff;">then</span>
            attribute occur {
              <span style="color:#0000ff;">if</span> ($node/preceding-sibling::*[1][self::AND]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">must</span>'
              <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> ($node/preceding-sibling::*[1][self::NOT]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">not</span>'
              <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> ($node/following-sibling::*[1]
                       [self::AND <span style="color:#0000ff;">or</span> self::OR <span style="color:#0000ff;">or</span> self::NOT][not(@type)]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">should</span>'
              <span style="color:#0000ff;">else</span> '<span style="color:#8b0000;">should</span>'
            }
          <span style="color:#0000ff;">else</span> (),
          $node/node()/local:lucene2xml(.)
        }
    <span style="color:#0000ff;">case</span> text() <span style="color:#0000ff;">return</span>
      <span style="color:#0000ff;">if</span> ($node/parent::*[self::query <span style="color:#0000ff;">or</span> self::bool]) <span style="color:#0000ff;">then</span>
        <span style="color:#0000ff;">for</span> $tok <span style="color:#0000ff;">at</span> $p <span style="color:#0000ff;">in</span> tokenize($node, '<span style="color:#8b0000;">\s+</span>')[normalize-space()]
        (: here is the place for further differentiation
           between term / wildcard / regex elements :)
        (: using regex-regex detection (?):
           matches($string, '<span style="color:#8b0000;">((^|[^\\])[.?*+()\[\]\\^]|\$$)</span>') :)
        let $el-name := '<span style="color:#8b0000;">term</span>'
        <span style="color:#0000ff;">return</span> element { $el-name } {
          attribute occur {
            <span style="color:#0000ff;">if</span> ($p = 1 <span style="color:#0000ff;">and</span> $node/preceding-sibling::*[1]
                [self::AND]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">must</span>'
            <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> ($p = 1 <span style="color:#0000ff;">and</span> $node/preceding-sibling::*[1]
                [self::NOT]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">not</span>'
            <span style="color:#0000ff;">else</span> <span style="color:#0000ff;">if</span> ($p = 1 <span style="color:#0000ff;">and</span> $node/following-sibling::*[1]
                [self::AND <span style="color:#0000ff;">or</span> self::OR <span style="color:#0000ff;">or</span> self::NOT][not(@type)]) <span style="color:#0000ff;">then</span> '<span style="color:#8b0000;">should</span>'
            <span style="color:#0000ff;">else</span> '<span style="color:#8b0000;">should</span>'
          },
          <span style="color:#0000ff;">if</span> (matches($tok, '<span style="color:#8b0000;">(.*?)(\^(\d+))(\W|$)</span>')) <span style="color:#0000ff;">then</span>
            attribute boost {
              replace($tok, '<span style="color:#8b0000;">(.*?)(\^(\d+))(\W|$)</span>', '<span style="color:#8b0000;">$3</span>')
            }
          <span style="color:#0000ff;">else</span> (),
          normalize-space(replace($tok, '<span style="color:#8b0000;">(.*?)(\^(\d+))(\W|$)</span>', '<span style="color:#8b0000;">$1</span>'))
        }
      <span style="color:#0000ff;">else</span>
        normalize-space($node)
    <span style="color:#0000ff;">default</span> <span style="color:#0000ff;">return</span>
      $node
};</pre>
<p>Instead of the XSLT transformation in the example above, this XQuery function can be called as the last step in the main query:</p>
<pre>let $string := '<span style="color:#8b0000;">(fillet OR &quot;mal(ic)e done&quot;~1) AND snake^4</span>'
let $luceneParse := local:parse-lucene($string)
let $luceneXML := util:parse($luceneParse)
<span style="color:#0000ff;">return</span> local:lucene2xml($luceneXML/node())</pre>
<p>The output is (and should be) the same as above.</p>
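<p>The least obvious step in local:lucene2xml() is probably the boost handling in its text() branch. As a minimal sketch, the hypothetical helper local:boost-term() below (not part of the original script) isolates that step: it reuses the same regular expression to split a token such as snake^4 into the bare term and its boost factor, leaving the @occur logic out. It only relies on standard functions (matches(), replace(), normalize-space()), so it should also run outside eXist.</p>
<pre>(: Hypothetical helper, not part of the original script: isolates the
   boost handling of the text() branch in local:lucene2xml(). :)
declare function local:boost-term($tok as xs:string) as element(term) {
  (: group 1 = the term itself, group 3 = the digits after the caret :)
  let $re := '(.*?)(\^(\d+))(\W|$)'
  return
    element term {
      (: only add @boost when the token carries a ^n suffix :)
      if (matches($tok, $re)) then
        attribute boost { replace($tok, $re, '$3') }
      else (),
      (: strip the ^n suffix, keeping only the term text :)
      normalize-space(replace($tok, $re, '$1'))
    }
};</pre>
<p>With this helper, local:boost-term('snake^4') yields &lt;term boost="4"&gt;snake&lt;/term&gt;, while a token without a caret, such as 'fillet', yields a plain &lt;term&gt;fillet&lt;/term&gt;.</p>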
<h2>Conclusion</h2>
<p>So much for this proof of concept. What’s still missing are the extra search parameters:</p>
<ul>
<li>wrap regex searches in &lt;regex&gt; instead of &lt;term&gt; </li>
<li>specify the ordered status of a proximity search in an @order attribute for &lt;near&gt; </li>
</ul>
<p>To achieve this, these options could be passed as parameters to the XSLT stylesheet or the XQuery function. Both already contain a comment at the place where the first option (regex searches) should be processed. In the current approach, the @occur attribute could also be added earlier, in the local:parse-lucene() function. Finally, the best place for inserting the &lt;fuzzy&gt; tag needs reconsideration: it may be more logical to treat it on a par with the other ‘atomic’ search components &lt;term&gt;, &lt;regex&gt;, and &lt;wildcard&gt;.</p>
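<p>On the XQuery side, such a parameter could look roughly like the following minimal sketch. The helper local:el-name(), its $mode parameter and the 'regex' mode value are all assumptions, not part of the script: it picks the element name for a single token and could take the place of the hard-coded let $el-name := 'term' binding in local:lucene2xml(), with $mode passed down as an extra function parameter. As an illustration of the term / wildcard distinction mentioned in the code comments, it also treats an unescaped * or ? as a wildcard.</p>
<pre>(: Hypothetical helper, not part of the original script: choose the
   element name for one token. $mode is an assumed caller-supplied
   option; 'regex' turns every token into a regex search. :)
declare function local:el-name($tok as xs:string, $mode as xs:string)
    as xs:string {
  if ($mode = 'regex') then
    'regex'
  (: an unescaped * or ? marks a Lucene wildcard term :)
  else if (matches($tok, '(^|[^\\])[*?]')) then
    'wildcard'
  else
    'term'
};</pre>
<p>For instance, local:el-name('sna*', 'term') returns 'wildcard', local:el-name('sn\*ke', 'term') returns 'term' because the asterisk is escaped, and local:el-name('sn.ke', 'regex') returns 'regex'.</p>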
<p>However, eXist currently limits regex and wildcard searches to a boolean context (&lt;bool&gt;), where they can occur instead of &lt;term&gt;. Allowing them inside &lt;near&gt; (or &lt;phrase&gt;) will first have to be considered and implemented. I’ll wait for the outcome of this implementation before finishing the script, which will then either allow &lt;regex&gt; and &lt;wildcard&gt; anywhere, or restrict them to more limited contexts. To be continued…</p>
<p>In the meantime, a wrapped-up version of this proof-of-concept XQuery can be downloaded:</p>
<ul>
<li><a href="http://www.kantl.be/ctb/download/lucene2xml_xslt.xq" target="_blank">here</a> for the XSLT version </li>
<li><a href="http://www.kantl.be/ctb/download/lucene2xml_xquery.xq" target="_blank">here</a> for the XQuery version </li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2010/08/04/exist-lucene-to-xml-syntax/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>I&#8217;m so glad&#8230;</title>
		<link>http://rvdb.wordpress.com/2010/07/28/im-so-glad/</link>
		<comments>http://rvdb.wordpress.com/2010/07/28/im-so-glad/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 13:01:53 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[XSL FO]]></category>

		<guid isPermaLink="false">http://rvdb.wordpress.com/?p=67</guid>
		<description><![CDATA[&#8230;my previous post has finally grown out-of-date! FOP-1.0 has been released, which fixes the nasty bug where footnotes inside lists and tables got swallowed. There&#8217;s still an issue with overlapping content for footnotes inside columns, but I can live with that&#8230; Hence, I could as well delete my previous entry but will leave it here [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230;my previous <a href="http://rvdb.wordpress.com/2008/03/07/rendering-footnotes-in-tables-and-lists-with-fop/">post</a> has finally grown out-of-date!</p>
<p><a href="http://xmlgraphics.apache.org/fop/1.0/index.html">FOP-1.0</a> has been released, which fixes the nasty bug where footnotes inside lists and tables got swallowed. There&#8217;s still an issue with overlapping content for footnotes inside columns, but I can live with that&#8230;</p>
<p>Hence, I could just as well delete my previous entry, but I&#8217;ll leave it here for documentation&#8217;s sake.</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2010/07/28/im-so-glad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>
	</item>
		<item>
		<title>Rendering footnotes in tables and lists with FOP</title>
		<link>http://rvdb.wordpress.com/2008/03/07/rendering-footnotes-in-tables-and-lists-with-fop/</link>
		<comments>http://rvdb.wordpress.com/2008/03/07/rendering-footnotes-in-tables-and-lists-with-fop/#comments</comments>
		<pubDate>Fri, 07 Mar 2008 15:23:21 +0000</pubDate>
		<dc:creator>rvdb</dc:creator>
				<category><![CDATA[XML]]></category>
		<category><![CDATA[XSL FO]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://rvdb.wordpress.com/?p=35</guid>
		<description><![CDATA[[UPDATE: Meanwhile, FOP-1.0 has been released, which fixes the bug that informed this post. The workaround described below thus is only relevant for users of FOP versions 0.92 to 0.95. For the happiest FOPping experience, stop reading here and grab your copy of FOP-1.0!] [...or skip the discussion and just download the files] The reason [...]]]></description>
			<content:encoded><![CDATA[<p>[<strong>UPDATE</strong>: Meanwhile, <a href="http://xmlgraphics.apache.org/fop/1.0/index.html" target="_blank">FOP-1.0</a> has been released, which fixes the bug that prompted this post. The workaround described below is therefore only relevant to users of FOP versions 0.92 to 0.95. For the happiest FOPping experience, stop reading here and grab your copy of <a href="http://xmlgraphics.apache.org/fop/1.0/index.html" target="_blank">FOP-1.0</a>!]</p>
<p>[...or skip the discussion and just download the <a href="http://www.kantl.be/ctb/download/notetest.zip">files</a>]</p>
<h2>The reason</h2>
<p>Over the past couple of years, I&#8217;ve gained some experience working with XML and related standards (XSLT, XSL-FO, XQuery). Part of our professional document production chain involves rendering PDF output from XML sources. I&#8217;ve been a big fan of Apache&#8217;s open-source <a href="http://xmlgraphics.apache.org/fop/index.html" target="_blank">FOP</a> processor ever since its now ancient <a href="http://archive.apache.org/dist/xmlgraphics/fop/binaries/fop-0.20.5-bin.zip" target="_blank">version 0.20.5</a>. Although the FOP code has long since been substantially revised and improved, the versions up to <a href="http://xmlgraphics.apache.org/fop/0.95/index.html" target="_blank">version 0.95</a> were haunted by one serious bug, which kept me from switching to an up-to-date version of FOP: footnotes inside lists or table cells got swallowed in the PDF output.</p>
<p>On the other hand, FOP&#8217;s <a href="http://xmlgraphics.apache.org/fop/compliance.html" target="_blank">XSL-FO compliance</a> rate has risen substantially in recent versions, which prompted me to look for a way of dealing with this nasty show-stopper. Of course, I hope the FOP developers will be able to resolve this issue soon. In the meantime, I think I&#8217;ve found a way of circumventing (or at least alleviating) the problem at the stylesheet level, rather than at the Java code level. Moreover, I think this approach might help other users as well, and other users might help improve it where it falls short.</p>
<p><span id="more-35"></span></p>
<p>Hence this initial blog post: a mild blend of self-documentation and altruism. It will be quite technical and specific, but I hope to get the message across. At least I&#8217;ll try, by:</p>
<ul>
<li>starting from a simple, stable example. For (my own) convenience&#8217;s sake, I&#8217;ve crafted a <a href="http://www.tei-c.org/ns/1.0" target="_blank">TEI P5</a> example structure, since that&#8217;s where my expertise lies.</li>
<li>illustrating intermediate steps with (pointers to) XSL-FO code examples</li>
<li>illustrating the results with screenshots of the corresponding PDF output</li>
</ul>
<p>I&#8217;ve filed this blog post under XML, XSL-FO, and XSLT. Although I&#8217;ll focus on the (theoretical) XSL-FO side of the solution, I&#8217;ll also provide a link to a <a href="http://rvdb.wordpress.com/2008/03/07/rendering-footnotes-in-tables-and-lists-with-fop#download">zip package</a> containing sample XML documents and an XSLT stylesheet illustrating the final stage of this solution. Some problems remain for which I don&#8217;t have an immediate solution, so I welcome any comments.</p>
<h2>The problem</h2>
<p>Simply put, FOP (0.92-0.95) had trouble rendering fo:footnote objects occurring within fo:table or fo:list-block areas, an issue that is formally documented as a <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=37579" target="_blank">bug</a>. At the time of writing, the comments in the bug tracker suggested that a) the bug wouldn&#8217;t be resolved anytime soon and b) there was no known workaround. Consider, for example, this standard TEI XML list containing a footnote:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#800000;">list</span> <span style="color:#ff0000;">xmlns</span>=<span style="color:#0000ff;">&quot;http://www.tei-c.org/ns/1.0&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[1]
    <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">note</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[1]/note[1]<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">note</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[2]<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[3]<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">item</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#800000;">list</span><span style="color:#0000ff;">&gt;</span></pre>
<p>When this list is transformed into the corresponding standard XSL-FO structure:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">fo</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/1999/XSL/Format&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">distance</span>-<span style="color:#ff0000;">between</span>-<span style="color:#ff0000;">starts</span>=<span style="color:#0000ff;">&quot;50pt&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">label</span>-<span style="color:#ff0000;">separation</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
    <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;from-parent(start-indent) + 5pt&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[1]
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;8pt&quot;</span> <span style="color:#ff0000;">vertical</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;super&quot;</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-<span style="color:#ff0000;">body</span>
              <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span> <span style="color:#ff0000;">space</span>-<span style="color:#ff0000;">after</span>=<span style="color:#0000ff;">&quot;0.5em&quot;</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span>
              <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span> <span style="color:#ff0000;">text</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;start&quot;</span>
              <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">style</span>=<span style="color:#0000ff;">&quot;normal&quot;</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">weight</span>=<span style="color:#0000ff;">&quot;normal&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span><span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
                  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
                  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[1]/note[1]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
                <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
              <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-body<span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[2]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1: list[1]/item[3]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span></pre>
<p>&#8230;the PDF output will show the footnote marker in the first list item, but <strong>NOT</strong> the footnote body at the bottom of the page! The same happens when tables are involved:</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="200"><a href="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-list2.jpg"><img style="border-width:0;" alt="footnoteproblem_list" src="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-list-thumb2.jpg?w=175&#038;h=244" width="175" height="244" /></a></td>
<td valign="top" width="200"><a href="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-table4.jpg"><img style="border-width:0;" alt="footnoteproblem_table" src="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-table-thumb2.jpg?w=174&#038;h=244" width="174" height="244" /></a></td>
</tr>
<tr>
<td valign="top" width="200"><strong>footnote problem in a list</strong></td>
<td valign="top" width="200"><strong>footnote problem in a table</strong></td>
</tr>
</tbody>
</table>
</div>
<p>Of course, this is quite inconvenient: all goes well as long as the input documents don&#8217;t contain any footnotes inside tables or lists. But who can (or wants to) guarantee that?</p>
<h2>Proposal 1: &quot;relative endnotes&quot;</h2>
<p>One way to avoid this problem is to generate the fo:footnote formatting objects for those footnotes <strong>outside</strong> the areas of their containing lists and tables. The key to this approach lies in the following characteristic of the <a href="http://www.w3.org/TR/xsl/#fo_footnote" target="_blank">fo:footnote</a> formatting object:</p>
<blockquote>
<p>The fo:footnote formatting object does not generate any areas. The fo:footnote formatting object returns the areas generated and returned by its child fo:inline formatting object.</p>
</blockquote>
<p>This means that a fo:footnote only produces in-line areas through the contents of its fo:inline footnote marker. Consequently, if this marker is left empty, the fo:footnote will only generate the out-of-line block area of its fo:footnote-body at the bottom of the page.</p>
<p>For tables and lists containing footnotes, this suggests the following approach:</p>
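<p>As a minimal illustration (a hand-written sketch; the footnote text is a placeholder), the following fo:footnote has an empty marker. According to the rule quoted above, it should add nothing to the surrounding line and only place its body at the bottom of the page:</p>
<pre>&lt;fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;
  &lt;fo:footnote&gt;
    &lt;!-- empty marker: no in-line areas are returned --&gt;
    &lt;fo:inline/&gt;
    &lt;!-- out-of-line area, placed at the bottom of the page --&gt;
    &lt;fo:footnote-body&gt;
      &lt;fo:block&gt;1 Placeholder footnote text&lt;/fo:block&gt;
    &lt;/fo:footnote-body&gt;
  &lt;/fo:footnote&gt;
&lt;/fo:block&gt;</pre>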
<ol>
<li>Process all contents as usual, except for footnotes. For those, instead of outputting a fo:footnote formatting object, just generate a fo:inline &quot;dummy&quot; marker and nothing more.</li>
<li>After the affected table or list, create a separate fo:block containing all of its footnotes. For each footnote, generate a complete fo:footnote structure, but leave its fo:inline footnote marker empty.</li>
</ol>
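<p>The two steps above could be sketched in XSLT 1.0 roughly as follows. This is a minimal, hypothetical sketch, not the stylesheet from the zip package. It assumes TEI P5 input and handles only <code>tei:list</code> (tables would need analogous templates). It also leaves out fo:root and page setup, reuses the formatting properties of the standard XSL-FO example above, and uses <code>footnote</code> as an arbitrary mode name.</p>
<pre>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"&gt;

  &lt;!-- Render the list as usual; the outermost list then (step 2)
       emits a separate block holding the real fo:footnote objects. --&gt;
  &lt;xsl:template match="tei:list"&gt;
    &lt;fo:list-block provisional-distance-between-starts="50pt"
        provisional-label-separation="10pt"
        start-indent="from-parent(start-indent) + 5pt"&gt;
      &lt;xsl:apply-templates select="tei:item"/&gt;
    &lt;/fo:list-block&gt;
    &lt;xsl:if test="not(ancestor::tei:list) and .//tei:note"&gt;
      &lt;fo:block&gt;
        &lt;xsl:apply-templates select=".//tei:note" mode="footnote"/&gt;
      &lt;/fo:block&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="tei:item"&gt;
    &lt;fo:list-item&gt;
      &lt;fo:list-item-label end-indent="label-end()"&gt;
        &lt;fo:block&gt;•&lt;/fo:block&gt;
      &lt;/fo:list-item-label&gt;
      &lt;fo:list-item-body start-indent="body-start()"&gt;
        &lt;fo:block&gt;&lt;xsl:apply-templates/&gt;&lt;/fo:block&gt;
      &lt;/fo:list-item-body&gt;
    &lt;/fo:list-item&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Step 1: inside a list, a note only leaves a "dummy" marker. --&gt;
  &lt;xsl:template match="tei:note[ancestor::tei:list]"&gt;
    &lt;fo:inline font-size="8pt" vertical-align="super"&gt;
      &lt;xsl:number level="any" count="tei:note"/&gt;
    &lt;/fo:inline&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Step 2: the complete footnote, with an empty fo:inline marker. --&gt;
  &lt;xsl:template match="tei:note" mode="footnote"&gt;
    &lt;fo:footnote&gt;
      &lt;fo:inline/&gt;
      &lt;fo:footnote-body font-size="10pt" space-after="0.5em"
          start-indent="0px" end-indent="0px" text-align="start"
          font-style="normal" font-weight="normal"&gt;
        &lt;fo:list-block&gt;
          &lt;fo:list-item&gt;
            &lt;fo:list-item-label end-indent="label-end()"&gt;
              &lt;fo:block&gt;&lt;xsl:number level="any" count="tei:note"/&gt;&lt;/fo:block&gt;
            &lt;/fo:list-item-label&gt;
            &lt;fo:list-item-body start-indent="body-start()"&gt;
              &lt;fo:block&gt;&lt;xsl:apply-templates/&gt;&lt;/fo:block&gt;
            &lt;/fo:list-item-body&gt;
          &lt;/fo:list-item&gt;
        &lt;/fo:list-block&gt;
      &lt;/fo:footnote-body&gt;
    &lt;/fo:footnote&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</pre>
<p>Only the outermost list collects the notes, so footnotes in nested lists are not emitted twice. Both the dummy marker and the footnote label use <code>xsl:number level="any"</code>, so their numbers should match.</p>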
<p>Applied to the TEI list example above, this would generate the following XSL-FO fragment:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span> <span style="color:#ff0000;">xmlns</span>:<span style="color:#ff0000;">fo</span>=<span style="color:#0000ff;">&quot;http://www.w3.org/1999/XSL/Format&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">distance</span>-<span style="color:#ff0000;">between</span>-<span style="color:#ff0000;">starts</span>=<span style="color:#0000ff;">&quot;50pt&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">label</span>-<span style="color:#ff0000;">separation</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
    <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;from-parent(start-indent) + 5pt&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[1]
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;8pt&quot;</span> <span style="color:#ff0000;">vertical</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;super&quot;</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[2]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[3]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;8pt&quot;</span> <span style="color:#ff0000;">vertical</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;super&quot;</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
        <span style="color:#ff0000;">space</span>-<span style="color:#ff0000;">after</span>=<span style="color:#0000ff;">&quot;0.5em&quot;</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span>
        <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span> <span style="color:#ff0000;">text</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;start&quot;</span>
        <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">style</span>=<span style="color:#0000ff;">&quot;normal&quot;</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">weight</span>=<span style="color:#0000ff;">&quot;normal&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[1]/note[1]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span></pre>
<p>&#8230;and look: when rendered to PDF, this time the &#8216;footnote&#8217; is present (although technically, it&#8217;s rather a &quot;relative endnote&quot; to its containing list)!</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="200"><a href="http://rvdb.files.wordpress.com/2008/03/relativeendnote-list2.jpg"><img style="border-width:0;" alt="relativeendnote_list" src="http://rvdb.files.wordpress.com/2008/03/relativeendnote-list-thumb2.jpg?w=174&#038;h=244" width="174" height="244" /></a></td>
<td valign="top" width="200"><a href="http://rvdb.files.wordpress.com/2008/03/relativeendnote-table2.jpg"><img style="border-width:0;" alt="relativeendnote_table" src="http://rvdb.files.wordpress.com/2008/03/relativeendnote-table-thumb2.jpg?w=174&#038;h=244" width="174" height="244" /></a></td>
</tr>
<tr>
<td valign="top" width="200"><strong>relative endnote in a list</strong></td>
<td valign="top" width="200"><strong>relative endnote in a table</strong></td>
</tr>
</tbody>
</table>
</div>
<h3>Evaluation</h3>
<p>This approach, inspired by other <a href="http://www.dpawson.co.uk/xsl/sect3/fofixedposn.html#d12878e43" target="_blank">solutions to vertical alignment issues</a>, seems quite elegant (it is simple) and efficient (it is powerful enough for nested tables / lists). However, it comes with two catches:</p>
<ul>
<li>In theory, it works best for short lists or tables that don&#8217;t span multiple pages. For longer ones, all footnote bodies will end up after their containing table / list, as a kind of endnotes to that table / list. </li>
<li>In practice, there is a case where FOP seems to choke: when the number of footnotes in a list or table grows too large, FOP hangs. </li>
</ul>
<p>With these provisos, this &quot;relative endnote&quot; approach might strike a fine balance between an efficient solution for most cases and an acceptable compromise for longer lists or tables. However, the edge case where FOP runs into trouble with long tables / lists containing many footnotes leaves me uneasy.</p>
<h2>Proposal 2: &quot;relative footnotes&quot;</h2>
<p>This approach takes the reasoning one step further. It starts from 2 observations:</p>
<ol>
<li>As seen in the &quot;relative endnote&quot; approach, fo:footnote formatting objects with empty fo:inline footnote markers can be inserted without generating any extra whitespace between blocks. </li>
<li>Like other block-level formatting objects, tables and lists can be stacked on top of each other without intervening whitespace, as if they formed a single whole, provided their space, margin and padding properties are set appropriately. </li>
</ol>
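<p>The first observation can be illustrated with a minimal sketch in Python, using the standard library's xml.etree.ElementTree rather than the XSLT stylesheet discussed in this post. It builds a single fo:block of fo:footnote objects whose fo:inline markers are left empty; according to the observation above, such a block adds no whitespace between the surrounding blocks, while FOP still renders the footnote bodies. The helper names fo() and relative_endnote_block() and the (number, text) input format are hypothetical; the property values are copied from the XSL-FO examples in this post.</p>

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialise elements with the usual fo: prefix


def fo(tag, parent=None, text=None, **props):
    """Create an fo:* element (hypothetical helper).

    Underscores in keyword names become hyphens: font_size -> font-size.
    """
    attrib = {name.replace("_", "-"): value for name, value in props.items()}
    qname = "{%s}%s" % (FO_NS, tag)
    if parent is None:
        el = ET.Element(qname, attrib)
    else:
        el = ET.SubElement(parent, qname, attrib)
    el.text = text
    return el


def relative_endnote_block(notes):
    """Wrap (number, text) pairs in one fo:block of fo:footnote objects.

    Each fo:footnote gets an *empty* fo:inline citation, so the block adds
    no visible text of its own; only the footnote bodies are rendered.
    """
    block = fo("block")
    for number, text in notes:
        footnote = fo("footnote", block)
        fo("inline", footnote, font_size="8pt", vertical_align="super")  # empty marker
        body = fo("footnote-body", footnote, font_size="10pt", space_after="0.5em",
                  start_indent="0px", end_indent="0px", text_align="start")
        item = fo("list-item", fo("list-block", body))
        fo("block", fo("list-item-label", item, end_indent="label-end()"), text=str(number))
        fo("block", fo("list-item-body", item, start_indent="body-start()"), text=text)
    return block


endnotes = relative_endnote_block([(1, "Case 1': list[1]/item[1]/note[1]")])
print(ET.tostring(endnotes, encoding="unicode"))
```

<p>Keeping the marker empty is the whole trick: the visible superscript number lives in the table cell or list item, and the fo:footnote itself only carries the body.</p>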
<p>It follows that, if we can simulate footnotes in tables / lists with endnotes that are invisible as such (i.e., they don&#8217;t generate extra inline areas below the affected tables / lists), we can just as well have endnotes per list item / table row. In other words: if tables or lists containing notes can be split up into atomic chunks, each chunk can be followed by the relative endnotes it contains. Since lists and tables are composed of horizontal areas (list items and table rows), we could treat each of these separately: wrap each one in a list or table of its own, containing just that single item or row, and output its footnotes as relative endnotes to that list or table.</p>
<p>For the previous example, this would generate 3 separate lists, each containing a single item; the first one is followed by a &quot;relative endnote&quot; block:</p>
<pre><span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span> <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">distance</span>-<span style="color:#ff0000;">between</span>-<span style="color:#ff0000;">starts</span>=<span style="color:#0000ff;">&quot;50pt&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">label</span>-<span style="color:#ff0000;">separation</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
    <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;from-parent(start-indent) + 5pt&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[1]
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;8pt&quot;</span> <span style="color:#ff0000;">vertical</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;super&quot;</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">inline</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;8pt&quot;</span> <span style="color:#ff0000;">vertical</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;super&quot;</span><span style="color:#0000ff;">/&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">size</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span> <span style="color:#ff0000;">space</span>-<span style="color:#ff0000;">after</span>=<span style="color:#0000ff;">&quot;0.5em&quot;</span>
        <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;0px&quot;</span> <span style="color:#ff0000;">text</span>-<span style="color:#ff0000;">align</span>=<span style="color:#0000ff;">&quot;start&quot;</span>
        <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">style</span>=<span style="color:#0000ff;">&quot;normal&quot;</span> <span style="color:#ff0000;">font</span>-<span style="color:#ff0000;">weight</span>=<span style="color:#0000ff;">&quot;normal&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span><span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>1<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
            <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[1]/note[1]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
          <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
        <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span>-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">footnote</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span> <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">distance</span>-<span style="color:#ff0000;">between</span>-<span style="color:#ff0000;">starts</span>=<span style="color:#0000ff;">&quot;50pt&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">label</span>-<span style="color:#ff0000;">separation</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
    <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;from-parent(start-indent) + 5pt&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[2]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">block</span> <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">distance</span>-<span style="color:#ff0000;">between</span>-<span style="color:#ff0000;">starts</span>=<span style="color:#0000ff;">&quot;50pt&quot;</span>
    <span style="color:#ff0000;">provisional</span>-<span style="color:#ff0000;">label</span>-<span style="color:#ff0000;">separation</span>=<span style="color:#0000ff;">&quot;10pt&quot;</span>
    <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;from-parent(start-indent) + 5pt&quot;</span><span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">label</span> <span style="color:#ff0000;">end</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;label-end()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>•<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-label<span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-<span style="color:#ff0000;">item</span>-<span style="color:#ff0000;">body</span> <span style="color:#ff0000;">start</span>-<span style="color:#ff0000;">indent</span>=<span style="color:#0000ff;">&quot;body-start()&quot;</span><span style="color:#0000ff;">&gt;</span>
      <span style="color:#0000ff;">&lt;</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>Case 1': list[1]/item[3]<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">block</span><span style="color:#0000ff;">&gt;</span>
    <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item-body<span style="color:#0000ff;">&gt;</span>
  <span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-item<span style="color:#0000ff;">&gt;</span>
<span style="color:#0000ff;">&lt;/</span><span style="color:#c71585;">fo</span>:<span style="color:#800000;">list</span>-block<span style="color:#0000ff;">&gt;</span></pre>
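<p>The splitting step for a single-level list can be sketched in Python with the standard library's xml.etree.ElementTree (an illustration, not the XSLT stylesheet used for this post). The Item class is a hypothetical stand-in for the source XML. Each item becomes its own single-item fo:list-block; an item with notes gets a superscript citation in its text plus a trailing block of relative endnotes with empty markers. The footnote-body properties are trimmed for brevity.</p>

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag, parent=None, text=None, **props):
    """Create an fo:* element; font_size=... becomes font-size=..."""
    attrib = {k.replace("_", "-"): v for k, v in props.items()}
    qname = "{%s}%s" % (FO_NS, tag)
    el = ET.Element(qname, attrib) if parent is None else ET.SubElement(parent, qname, attrib)
    el.text = text
    return el


@dataclass
class Item:  # hypothetical model of a source list item
    text: str
    notes: List[str] = field(default_factory=list)


def add_list_item(list_block, label, text):
    """Append label + body to list_block; return the body's fo:block."""
    item = fo("list-item", list_block)
    fo("block", fo("list-item-label", item, end_indent="label-end()"), text=label)
    return fo("block", fo("list-item-body", item, start_indent="body-start()"), text=text)


def relative_footnotes(items, first_number=1):
    """Split a list into single-item fo:list-blocks, each followed by its notes."""
    chunks, number = [], first_number
    for it in items:
        lst = fo("list-block", provisional_distance_between_starts="50pt",
                 provisional_label_separation="10pt",
                 start_indent="from-parent(start-indent) + 5pt")
        text_block = add_list_item(lst, "\u2022", it.text)
        chunks.append(lst)
        if it.notes:
            endnotes = fo("block")
            for note in it.notes:
                # visible citation in the item text ...
                fo("inline", text_block, text=str(number), font_size="8pt", vertical_align="super")
                # ... and an fo:footnote with an empty marker after the one-item list
                footnote = fo("footnote", endnotes)
                fo("inline", footnote, font_size="8pt", vertical_align="super")
                body = fo("footnote-body", footnote, font_size="10pt", space_after="0.5em")
                add_list_item(fo("list-block", body), str(number), note)
                number += 1
            chunks.append(endnotes)
    return chunks


example = [Item("Case 1': list[1]/item[1]", ["Case 1': list[1]/item[1]/note[1]"]),
           Item("Case 1': list[1]/item[2]"),
           Item("Case 1': list[1]/item[3]")]
print([chunk.tag.split("}")[1] for chunk in relative_footnotes(example)])
# ['list-block', 'block', 'list-block', 'list-block']
```

<p>For the three items of the example, the chunks come out in the same order as in the XSL-FO listing: a one-item list, its endnote block, and two more one-item lists.</p>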
<p>Of course, this approach is easy enough for single-level lists or tables, but nested ones require further consideration. For these, it is not enough to wrap each list item in its own single-item list: its complete superstructure must be mimicked as well. Otherwise, indentation will be lost completely and it will be impossible to get the horizontal and vertical alignment straight.</p>
<p>Therefore, a complete treatment for &quot;relative footnotes&quot; should:</p>
<ol>
<li>For each table / list with footnotes, defer further processing to its child rows / items. </li>
<li>For each table row / list item, reconstruct the complete superstructure (up to the last table / list) in which it occurs. </li>
<li>For each step in this reconstruction, only output the corresponding XSL-FO structure without text contents. Only for the last step (i.e., the row / item under consideration), output the text contents. </li>
<li>If the table row / list item contains footnotes, output these as relative endnotes in a block after the table / list. </li>
<li>If the table row / list item contains further nested tables or lists, repeat steps 1 to 5 for each of these. </li>
</ol>
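<p>For nested lists, the five steps could be sketched roughly as follows. This is a simplified Python illustration under several assumptions: it handles nested lists only (no tables, padding or borders), applies the technique to every list rather than only to lists with footnotes (step 1), and uses a hypothetical Item model whose children field holds a nested list. Each chunk rebuilds the outer list levels with empty labels, so that the nested item keeps its indentation, and its notes follow as relative endnotes.</p>

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag, parent=None, text=None, **props):
    """Create an fo:* element; start_indent=... becomes start-indent=..."""
    attrib = {k.replace("_", "-"): v for k, v in props.items()}
    qname = "{%s}%s" % (FO_NS, tag)
    el = ET.Element(qname, attrib) if parent is None else ET.SubElement(parent, qname, attrib)
    el.text = text
    return el


@dataclass
class Item:  # hypothetical model: list item with notes and a nested list
    text: str
    notes: List[str] = field(default_factory=list)
    children: List["Item"] = field(default_factory=list)


def item_body(list_block, label=None):
    """Add a list-item and return its list-item-body (label=None: empty label)."""
    item = fo("list-item", list_block)
    fo("block", fo("list-item-label", item, end_indent="label-end()"), text=label)
    return fo("list-item-body", item, start_indent="body-start()")


def new_list(parent=None):
    return fo("list-block", parent, provisional_distance_between_starts="20pt")


def chunks(items, depth=0, numbers=None) -> Iterator[ET.Element]:
    if numbers is None:
        numbers = itertools.count(1)  # footnote numbers shared across all levels
    for it in items:
        # Steps 2-3: rebuild `depth` outer levels as empty structure only ...
        root = container = new_list()
        for _ in range(depth):
            container = new_list(item_body(container))
        # ... and output text contents only for the item under consideration.
        text_block = fo("block", item_body(container, "\u2022"), text=it.text)
        # Step 4: the item's notes become relative endnotes after the chunk.
        endnotes = fo("block") if it.notes else None
        for note in it.notes:
            n = next(numbers)
            fo("inline", text_block, text=str(n), vertical_align="super")
            footnote = fo("footnote", endnotes)
            fo("inline", footnote)  # empty marker
            fo("block", fo("footnote-body", footnote), text=f"{n}. {note}")
        yield root
        if endnotes is not None:
            yield endnotes
        # Step 5: repeat for the item's nested list, one level deeper.
        yield from chunks(it.children, depth + 1, numbers)


doc = [Item("a", ["note on a"], [Item("a.1", ["note on a.1"]), Item("a.2")]), Item("b")]
print([c.tag.split("}")[1] for c in chunks(doc)])
# ['list-block', 'block', 'list-block', 'block', 'list-block', 'list-block']
```

<p>The chunk for item a.1 is two list-blocks deep, with an empty label at the outer level; stacked without spacing, the chunks should line up as one list, which is exactly where the extra verbosity discussed below comes from.</p>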
<p>This reconstruction is much easier with an XSLT 2.0 stylesheet, thanks to its native ability to store intermediate results (temporary trees) in variables and process them further. Proper treatment of padding settings, and especially of table borders, requires some thought, but since that&#8217;s mainly an XSLT matter, I won&#8217;t go into details here. For brevity&#8217;s sake, I won&#8217;t give an XSL-FO code example either (things soon get very verbose), but provide a screenshot of a complex case instead:</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="400"><a href="http://rvdb.files.wordpress.com/2008/03/relativefootnote-complex1.jpg"><img style="border-width:0;" alt="relativefootnote_complex" src="http://rvdb.files.wordpress.com/2008/03/relativefootnote-complex-thumb1.jpg?w=244&#038;h=173" width="244" height="173" /></a></td>
</tr>
<tr>
<td valign="top" width="400"><strong>relative footnotes with mixed nesting tables and lists</strong></td>
</tr>
</tbody>
</table>
</div>
<h3>Evaluation</h3>
<p>From a theoretical angle, this &quot;relative footnote&quot; approach is appealing because it&#8217;s more complete than the &quot;relative endnote&quot; approach and avoids the problem FOP seems to have when the number of footnotes in lists or tables grows too large.</p>
<p>On the other hand, it complicates processing substantially, since dealing with nested lists and tables requires careful handling of padding and border properties. Moreover, it generates a large amount of XSL-FO: each table row / list item mirrors its complete superstructure, which grows more complex and verbose as the nesting level increases. This concern can be reduced with two minimisation strategies:</p>
<ol>
<li>only apply the &quot;relative footnote&quot; approach to tables and lists containing footnotes, and treat all others regularly </li>
<li>for<a title="optimisation2" name="optimisation2"></a> tables / lists containing footnotes, only apply the &quot;relative footnotes&quot; technique to those rows / items with footnotes, and group the others in regular tables / lists </li>
</ol>
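<p>The second optimisation boils down to grouping: runs of consecutive rows / items without footnotes can stay together in one regular table or list, while each row / item with footnotes gets a chunk of its own. A minimal Python sketch of that planning step, assuming a hypothetical Row type, can use itertools.groupby. Note that a table without any footnotes then comes out as a single regular group, which covers the first optimisation as well.</p>

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Iterator, List, Tuple


@dataclass
class Row:  # hypothetical stand-in for a table row / list item
    text: str
    notes: List[str] = field(default_factory=list)


def plan_chunks(rows: List[Row]) -> Iterator[Tuple[bool, List[Row]]]:
    """Yield (relative, rows) pairs.

    relative=True: a single row with footnotes, to be wrapped in its own
    one-row table followed by its relative endnotes.
    relative=False: a run of note-free rows, kept in one regular table.
    """
    for has_notes, run in groupby(rows, key=lambda row: bool(row.notes)):
        run = list(run)
        if has_notes:
            for row in run:
                yield True, [row]
        else:
            yield False, run


rows = [Row("r1"), Row("r2", ["n1"]), Row("r3", ["n2", "n3"]), Row("r4"), Row("r5")]
for relative, group in plan_chunks(rows):
    print(relative, [row.text for row in group])
# False ['r1']
# True ['r2']
# True ['r3']
# False ['r4', 'r5']
```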
<p>With such optimisations, this approach looks like a promising solution to FOP&#8217;s problems with displaying footnotes in tables and lists. However, another problem looms, one that exposes the line-based nature of this technique as a limitation.</p>
<h2>Relative, all too relative</h2>
<p>The &quot;relative footnote&quot; approach goes a long way towards circumventing the problem, but can&#8217;t cope with a further level of complexity: consider the case where one table row contains multiple nesting tables or lists side by side.</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="400"><a href="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problemarea1.jpg"><img style="border-width:0;" alt="relativefootnote_problemarea" src="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problemarea-thumb1.jpg?w=174&#038;h=244" width="174" height="244" /></a></td>
</tr>
<tr>
<td valign="top" width="400"><strong>parallel nesting tables</strong></td>
</tr>
</tbody>
</table>
</div>
<p>Because of its line-based nature, the &quot;relative footnote&quot; approach can&#8217;t cope well when <strong>multiple levels</strong> of parallel lines are involved, as is the case with parallel nesting tables. Here, alignment is the problem: within the containing table row (row 2 in this case), both cells align with each other. Inside their nesting tables, however, rows align independently of their parallel counterparts, depending on the number of rows and the length of their text contents. A strict line-based approach would force all rows at all levels to align with each other, producing unwanted whitespace, as in the following example (mocked up in Word, because I&#8217;m not sure whether implementing it in the XSLT stylesheet is worth the trouble):</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="400"><a href="http://rvdb.files.wordpress.com/2008/03/relativefootnote-imperfect.jpg"><img style="border-width:0;" alt="relativefootnote_imperfect" src="http://rvdb.files.wordpress.com/2008/03/relativefootnote-imperfect-thumb.jpg?w=190&#038;h=244" width="190" height="244" /></a></td>
</tr>
<tr>
<td valign="top" width="400"><strong>parallel nesting tables with improper alignment</strong></td>
</tr>
</tbody>
</table>
</div>
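<p>The cost of a strict line-based alignment can be made concrete with a toy model (an illustration only, not part of the stylesheet). Each row of two parallel nesting tables gets an abstract height in lines, and the model assumes that line-based processing makes the i-th rows of both tables share the height of the taller one. It then compares the resulting height of the containing row with the height that independent alignment would give.</p>

```python
from itertools import zip_longest
from typing import List, Tuple


def containing_row_height(*tables: List[int]) -> Tuple[int, int]:
    """Toy model: each table is a list of natural row heights (in lines).

    Returns (independent, line_based):
    - independent: the containing row is as tall as the tallest nested table;
    - line_based: the i-th rows of all tables are forced to share the height
      of the tallest i-th row (a missing row counts as height 0).
    """
    independent = max(sum(rows) for rows in tables)
    line_based = sum(max(heights) for heights in zip_longest(*tables, fillvalue=0))
    return independent, line_based


# Left table: rows of 1 and 3 lines; right table: rows of 2, 1 and 1 lines.
print(containing_row_height([1, 3], [2, 1, 1]))  # (4, 6): two lines of unwanted whitespace
```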
<p>This is where I&#8217;m stuck at the moment: I don&#8217;t see clearly how parallel nesting tables should be treated. It <strong>is</strong> possible to treat parallel nesting tables correctly in isolation, but integrating them into a strictly line-based approach will destroy their independent alignment.</p>
<p>Without modification, the &quot;relative footnote&quot; XSLT stylesheet will render the previous example as follows:</p>
<div style="text-align:center;">
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="400"><a href="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problem1.jpg"><img style="border-width:0;" alt="relativefootnote_problem" src="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problem-thumb1.jpg?w=174&#038;h=244" width="174" height="244" /></a></td>
</tr>
<tr>
<td valign="top" width="400"><strong>parallel nesting tables processed sequentially</strong></td>
</tr>
</tbody>
</table>
</div>
<p>While this output reflects the stylesheet&#8217;s logic correctly, it is clearly sub-optimal: parallel nesting tables are processed one after the other, producing complete superstructure tables for each row. I don&#8217;t have any clear ideas yet, but some position-related properties might be worth investigating:</p>
<ul>
<li>if the width of the outer tables containing parallel nesting tables were reduced to the required column width, <em>floating</em> the latter to the right-hand side of the former might restore their side-by-side layout. Unfortunately, floating properties are not yet supported by FOP. </li>
<li>if both tables could be &#8216;laid over&#8217; each other with absolute positioning, this too might restore their side-by-side layout. My current knowledge of the absolute-position property doesn&#8217;t even allow me to predict whether this route is worth investigating. </li>
</ul>
<p>However undesirable from a theoretical point of view, I currently consider this the end point of my quest, given that tables combining footnotes with parallel nesting tables might be rare in the wild. Anyway, I hope to have found a solution before I actually have to tackle this case (or, rather, that FOP will soon have its footnote handling properly fixed). In the meantime, I welcome any comments!</p>
<h2>Wrapping up: conclusion and files</h2>
<p>When generating PDF output from XML documents with FOP 0.92+, a serious drawback is that footnotes occurring in tables or lists are omitted. In this post, 2 possible strategies were explored to circumvent the problem:</p>
<ol>
<li>&quot;relative endnotes&quot; approach: convert footnotes in tables or lists to endnotes to the affected table or list<br />
<table border="1" cellspacing="0" cellpadding="2" width="402">
<tbody>
<tr>
<td valign="top" width="200"><strong>pro</strong></td>
<td valign="top" width="200"><strong>contra</strong></td>
</tr>
<tr>
<td valign="top" width="200">simple</td>
<td valign="top" width="200">limited: footnotes appear as relative endnotes</td>
</tr>
<tr>
<td valign="top" width="200">&#160;</td>
<td valign="top" width="200">FOP hangs when the number of footnotes grows too large</td>
</tr>
</tbody>
</table>
</li>
<li>&quot;relative footnotes&quot; approach: convert tables or lists with footnotes to stacks of atomic tables or lists with relative endnotes<br />
<table border="1" cellspacing="0" cellpadding="2" width="402">
<tbody>
<tr>
<td valign="top" width="200"><strong>pro</strong></td>
<td valign="top" width="200"><strong>contra</strong></td>
</tr>
<tr>
<td valign="top" width="200">footnotes appear as genuine footnotes</td>
<td valign="top" width="200">verbose, complex</td>
</tr>
<tr>
<td valign="top" width="200">&#160;</td>
<td valign="top" width="200">conceptual problem with parallel nesting tables</td>
</tr>
</tbody>
</table>
</li>
</ol>
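<p>To make the &quot;relative endnotes&quot; transformation more concrete, here is a minimal sketch of it in Python. It is not the XSLT used in this post; it only illustrates the idea under assumed, hypothetical element names (<code>table</code>/<code>list</code> containers, <code>note</code> footnotes, and newly introduced <code>noteref</code> and <code>endnotes</code> elements). Each note inside a container is replaced by a numbered reference, and the notes themselves are moved into an endnotes block placed right after that container, with numbering restarting per container.</p>

```python
import xml.etree.ElementTree as ET

def relative_endnotes(root, container_tags=("table", "list"), note_tag="note"):
    """Turn notes inside tables/lists into endnotes relative to that container.

    Hypothetical vocabulary (adapt to your own source format):
    <table>/<list> containers, <note> footnotes. Each note is replaced in place
    by <noteref n="k"/>; the notes are numbered from 1 per container and moved
    into an <endnotes> element inserted directly after the container.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def inside_container(el):
        # True if some ancestor of el is itself a table or list.
        p = parent_of.get(el)
        while p is not None:
            if p.tag in container_tags:
                return True
            p = parent_of.get(p)
        return False

    # Only handle outermost containers; their notes include those of nested ones.
    containers = [el for el in root.iter()
                  if el.tag in container_tags and not inside_container(el)]
    for container in containers:
        notes = list(container.iter(note_tag))
        if not notes:
            continue
        endnotes = ET.Element("endnotes")
        for k, note in enumerate(notes, start=1):
            parent = parent_of[note]
            ref = ET.Element("noteref", n=str(k))
            # Move the text following the note onto the reference.
            ref.tail, note.tail = note.tail, None
            parent[list(parent).index(note)] = ref
            note.set("n", str(k))
            endnotes.append(note)
        # Place the endnotes block directly after the container
        # (assumes the container is not the document root).
        outer = parent_of[container]
        endnotes.tail, container.tail = container.tail, None
        outer.insert(list(outer).index(container) + 1, endnotes)
    return root
```

<p>Notes outside tables and lists are left alone, so they can still be rendered as ordinary footnotes; a stylesheet would then only need a template that renders each <code>endnotes</code> block beneath its table or list.</p>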
<h3><a name="download">Get the files</a></h3>
<p>XML and XSLT files illustrating the &quot;relative footnotes&quot; approach can be found <a href="http://www.kantl.be/ctb/download/notetest.zip" target="_blank">here</a>. This zip file contains the following files:</p>
<ol>
<li>notetest.xml: a sample XML file containing 6 cases with reference solutions </li>
<li>notetest2.xml: a less carefully compiled sample XML file </li>
<li>notetest.xsl: an XSLT stylesheet demonstrating the &quot;relative footnotes&quot; technique </li>
</ol>
<p>The XSLT stylesheet applies the &quot;relative footnotes&quot; technique only to tables or lists that contain footnotes, but it doesn&#8217;t yet apply any of the internal optimisations suggested <a href="#optimisation2">above</a> to them.</p>
<p>The stylesheet keeps the sub-optimal treatment of parallel nesting tables as a stub for further work.</p>
<p>Any comments are much appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://rvdb.wordpress.com/2008/03/07/rendering-footnotes-in-tables-and-lists-with-fop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c8b0c311ab53babab94cb9d250f50308?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">rvdb</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-list-thumb2.jpg" medium="image">
			<media:title type="html">footnoteproblem_list</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/footnoteproblem-table-thumb2.jpg" medium="image">
			<media:title type="html">footnoteproblem_table</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativeendnote-list-thumb2.jpg" medium="image">
			<media:title type="html">relativeendnote_list</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativeendnote-table-thumb2.jpg" medium="image">
			<media:title type="html">relativeendnote_table</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativefootnote-complex-thumb1.jpg" medium="image">
			<media:title type="html">relativefootnote_complex</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problemarea-thumb1.jpg" medium="image">
			<media:title type="html">relativefootnote_problemarea</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativefootnote-imperfect-thumb.jpg" medium="image">
			<media:title type="html">relativefootnote_imperfect</media:title>
		</media:content>

		<media:content url="http://rvdb.files.wordpress.com/2008/03/relativefootnote-problem-thumb1.jpg" medium="image">
			<media:title type="html">relativefootnote_problem</media:title>
		</media:content>
	</item>
	</channel>
</rss>
