From KWIC display to KWIC(er) processing with eXist

The eXist XML database has a dedicated XQuery module for displaying search results in a fixed context window, a visualization commonly known as a KeyWord In Context (KWIC) view. Each search result is presented with its preceding and following text (referred to below as the left and right context):

<p>
    <span class="previous">... s effect, sir; after what flourish your </span>
    <span class="hi">nature</span>
    <span class="following"> will.</span>
</p>

This format invites further processing that exploits its particular features, such as sorting the search results by their left or right contexts, or even by the nth word preceding or following the search term. The XML representation of the KWIC results makes this much easier, since each of the three parts is isolated in its own XML element. However, while eXist’s current KWIC display module (as it is consistently called) does its job of presenting a KWIC display, in my opinion it is too display-oriented:

  • it performs poorly on large result sets and/or wide context widths; this matters for further processing, since sorting requires the entire result set to be computed first
  • (though this is nitpicking) its output is presentational HTML; while this is irrelevant from a processing point of view, I would prefer a semantically more ‘neutral’ format and defer presentational formatting to a later display phase

This post addresses both objections and presents alternatives. The last section discusses ways of processing these KWIC results.
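
To make the idea of sorting on the nth context word concrete, here is a minimal sketch in Python rather than XQuery, so it says nothing about how eXist or its KWIC module work internally. It assumes the KWIC results arrive as `<p>` elements with `previous`, `hi` and `following` spans, as in the example above. The `<results>` wrapper, the function names (`parse_kwic`, `nth_word`, `sort_kwic`) and the second and third sample rows are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# First row is the example from this post; the other two rows and the
# <results> wrapper are illustrative assumptions, not real eXist output.
SAMPLE = """<results>
  <p><span class="previous">... s effect, sir; after what flourish your </span><span class="hi">nature</span><span class="following"> will.</span></p>
  <p><span class="previous">one touch of </span><span class="hi">nature</span><span class="following"> makes the whole world kin</span></p>
  <p><span class="previous">hold the mirror up to </span><span class="hi">nature</span><span class="following"> to show virtue</span></p>
</results>"""


def parse_kwic(xml_text):
    """Return a list of (left, keyword, right) tuples, one per <p> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for p in root.iter("p"):
        # Map each span's class attribute to its text content.
        parts = {s.get("class"): (s.text or "") for s in p.findall("span")}
        rows.append((parts.get("previous", ""),
                     parts.get("hi", ""),
                     parts.get("following", "")))
    return rows


def nth_word(context, n, side):
    """Return the nth word counted away from the keyword, or '' if absent."""
    words = re.findall(r"\w+", context.lower())
    if side == "left":
        words.reverse()  # the word nearest to the keyword comes first
    return words[n - 1] if 0 < n <= len(words) else ""


def sort_kwic(rows, n=1, side="right"):
    """Sort rows by the nth word of the left or right context."""
    index = 0 if side == "left" else 2
    return sorted(rows, key=lambda row: nth_word(row[index], n, side))


if __name__ == "__main__":
    for left, keyword, right in sort_kwic(parse_kwic(SAMPLE), n=1, side="left"):
        print(f"{left.strip():>45} [{keyword}] {right.strip()}")
```

Because every part of a result sits in its own element, the sort key is a simple function of one tuple field. The sketch also shows why the whole result set has to be available before the first sorted row can be shown.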


As a matter of fac(e)t: (mimicking) faceted searching in eXist

Ever since I set out to develop search interfaces for XML text collections with the marvelous eXist XML database, I have, in hindsight, been drawn to the concept of faceted search, long before I knew what it was called. The recent integration of Lucene indexing and searching capabilities into eXist (since version 1.4) holds promise for efficient facet-oriented search features, such as integrating Lucene fields in search queries.
