I have always been fond of Sitecore’s use of Lucene for indexing. I often use Lucene for relational operations, which avoids iterating over the complete content tree.
The way Sitecore uses Lucene for indexing hasn’t been all that well documented, but earlier this year Sitecore released some documentation describing how items are indexed. The guide is OK: it describes how indexes are configured and how indexing is performed in an understandable way.
However, I often find that I need documentation on how to extract data from the indexes. Even Lucene’s own online documentation on this is somewhat limited. A couple of weeks ago I found and skimmed the book “Lucene in Action”. I can recommend it; it describes the different queries and analyzers really well.
Still, there is a lack of Sitecore documentation on how to extract data through Sitecore’s wrapper API for Lucene. In particular, I could use some documentation on the Sitecore.Data.Indexing namespace, which seems to hold a lot of functionality, including for extracting data. What exists is limited to a few snippets and blog entries.
However, the lack of documentation wasn’t an issue until Sitecore 6, because it was possible to use the tool Luke. With Luke you could browse indexes, try out query strings and so on. When developing something that uses Lucene, this tool has been essential.
In Sitecore 6 the indexing uses compression, which makes Luke fail whenever you browse or search an index, as .NET compression isn’t compatible with Java compression. Whether or not to compress the indexes hasn’t been implemented as a setting in the web.config; it is hardcoded into the IndexData class, so it is not possible to disable.
I have asked Sitecore Support to register a change request for implementing a setting, but until then, developing functionality that uses the indexes is based on guesswork and the use of Reflector… So Sitecore, please hurry!