Version 0.3 of the data set contains a couple of minor changes to the XML specification. In addition, the TreeTagger foundry has been re-created with TreeTagger version 3.2, which fixes issues with the tokenization.
Here’s the link: http://korap.ids-mannheim.de/files/WPD.rootbasett_0.3.tar.bz2
Today, the Lucene team announced the release of Lucene version 4.0. We have been working on migrating our Lucene-based code to Lucene 4.0 since the alpha was released in July this year. Many thanks to all the Lucene developers for another great piece of open source software!
The proceedings of the Konvens 2012 conference (The 11th Conference on Natural Language Processing) are now online, including the paper “Using information retrieval technology for a corpus analysis platform”, which was written within the KorAP project.
We are happy to report that yesterday we submitted a paper titled “Using Information Retrieval Technology for a Corpus Analysis Platform” to Konvens 2012 (The 11th Conference on Natural Language Processing)!
- Evaluating Query Languages for a Corpus Processing System and
- The New IDS Corpus Analysis Platform: Challenges and Prospects