Switching to Lucene 3.6

The Lucene project released version 3.6 today. The release mainly brings improvements in text processing, features from which KorAP does not profit very much. In addition, several bugs have been fixed, full Java 7 support has been introduced, and the finite state transducers (FSTs) used for certain queries have been improved.

Here’s a list of new features copied from the release announcement:

  • In addition to Java 5 and Java 6, this release has now full Java 7 support (minimum JDK 7u1 required).
  • TypeTokenFilter filters tokens based on their TypeAttribute.
  • Fixed offset bugs in a number of CharFilters, Tokenizers and TokenFilters that could lead to exceptions during highlighting.
  • Added phonetic encoders: Metaphone, Soundex, Caverphone, Beider-Morse, etc.
  • CJKBigramFilter and CJKWidthFilter replace CJKTokenizer.
  • Kuromoji morphological analyzer tokenizes Japanese text, producing both compound words and their segmentation.
  • Static index pruning (Carmel pruning) removes postings with low within-document term frequency.
  • QueryParser now interprets ‘*’ as an open end for range queries.
  • FieldValueFilter excludes documents missing the specified field.
  • CheckIndex and IndexUpgrader allow you to specify the specific FSDirectory implementation to use with the new -dir-impl command-line option.
  • FSTs can now do reverse lookup (by output) in certain cases and can be packed to reduce their size. There is now a method to retrieve top N shortest paths from a start node in an FST.
  • New WFSTCompletionLookup suggester supports finer-grained ranking for suggestions.
  • FST based suggesters now use an offline (disk-based) sort, instead of in-memory sort, when pre-sorting the suggestions.
  • ToChildBlockJoinQuery joins in the opposite direction (parent down to child documents).
  • New query-time joining is more flexible (but less performant) than index-time joins.
  • Added HTMLStripCharFilter to strip HTML markup.
  • Security fix: Better prevention of virtual machine SIGSEGVs when using MMapDirectory: Code using cloned IndexInputs of already closed indexes could possibly crash VM, allowing DoS attacks to your application.
  • Many bug fixes.
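
To make the CJKBigramFilter item above a little more concrete, here is a minimal sketch of the bigram idea in plain Java. It does not use Lucene and is not Lucene's implementation: the class name BigramSketch and the method bigrams are made up for illustration, and the handling of a lone character (emitted as a single token) is an assumption of this sketch. The input in main is the five-character string 日本語処理, written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping bigram tokenization.
// This is NOT Lucene's CJKBigramFilter, only a sketch of the idea.
public class BigramSketch {

    // Returns overlapping tokens of two code points each.
    // Assumption of this sketch: a lone code point is returned as a
    // single token, and empty input yields no tokens.
    public static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            out.add(new String(cps, 0, 1));
            return out;
        }
        // Slide a window of two code points over the text.
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Five CJK characters, given as Unicode escapes.
        System.out.println(bigrams("\u65E5\u672C\u8A9E\u51E6\u7406"));
    }
}
```

For the five-character input, the sketch yields four overlapping tokens: 日本, 本語, 語処 and 処理. Indexing such bigrams avoids the need for a dictionary-based segmenter, at the price of a larger term dictionary.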

