To familiarize users with the current KorAP interface, we have made a new selection of tutorial videos available:
Furthermore, the frontend has been updated to version 1.084, which fixes several stability issues and introduces layout changes.
A demonstration video, presented at the LTC conference in Poznań (Poland) from 7 to 9 December 2013:
KorAP web interface demonstration
Today we reached a preliminary feature freeze of the Lucene-based backend of KorAP. It now provides a good portion of the functionality of Cosmas II, is close to fully supporting the Poliqarp Query Language, and introduces several novel features for corpus querying.
Most of the missing features are already provided by the second backend, which is based on a Neo4j graph database. Implementing them in the Lucene-based backend is scheduled for early next year and marks the final milestone before the alpha testing phase of the KorAP project begins.
In the coming weeks we will prepare the frontend to support all new backend functionality and start working on the distribution capabilities of the index.
Shortly after the release of Lucene 4.1, KorAP has been updated to the new version. The update went smoothly, as none of the API changes affected the KorAP applications.
Version 0.3 of the data set contains a few minor changes to the XML specification. Furthermore, the TreeTagger foundry has been regenerated with TreeTagger version 3.2, which fixes issues with the tokenization.
Here’s the link: http://korap.ids-mannheim.de/files/WPD.rootbasett_0.3.tar.bz2
Today, the Lucene team announced the release of Lucene version 4.0. We have been migrating our Lucene-based code to Lucene 4.0 since the alpha was released in July this year. Many thanks to all the Lucene developers for another great piece of open source software!
The proceedings of the Konvens 2012 conference (The 11th Conference on Natural Language Processing) are now online, including the paper “Using information retrieval technology for a corpus analysis platform”, which was written within the KorAP project.
We are happy to report that yesterday we submitted a paper titled “Using Information Retrieval Technology for a Corpus Analysis Platform” to Konvens 2012 (The 11th Conference on Natural Language Processing)!
Once again, thanks to the patience and careful eye of Eliza Margaretha, we have spotted a bug in the data set we released. In some places, the TreeTagger foundry was affected by a bug in the conversion of certain special quotation marks from UTF-8 to Latin-1 encoding.
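The kind of conversion problem described above can be illustrated with a short Python sketch. The post does not name the affected characters; for illustration only, we assume typographic German quotation marks such as „ (U+201E) and “ (U+201C), which lie outside the Latin-1 range U+0000 to U+00FF. The helper `to_latin1` is hypothetical and is not part of KorAP's actual conversion pipeline.

```python
# Hypothetical example: the characters below are an assumption, not taken
# from the affected KorAP data.
text = "\u201eKorpus\u201c"  # „Korpus“


def to_latin1(s: str, errors: str = "strict") -> bytes:
    """Hypothetical helper: encode a string as Latin-1 (ISO-8859-1)."""
    return s.encode("latin-1", errors=errors)


# Strict encoding fails: U+201E and U+201C are above U+00FF.
try:
    to_latin1(text)
except UnicodeEncodeError as err:
    print("strict:", err.reason)  # ordinal not in range(256)

# Lossy encoding silently replaces both quotation marks with "?".
print("replace:", to_latin1(text, errors="replace"))  # b'?Korpus?'

# Windows-1252, often confused with Latin-1, does contain these characters.
print("cp1252:", text.encode("cp1252"))  # b'\x84Korpus\x93'
```

The sketch shows why such bugs can go unnoticed: with a lossy error handler the conversion succeeds, but the quotation marks are replaced, which can in turn affect downstream processing such as tagging.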
It has been brought to our attention that our LREC paper comparing three corpus query languages requires several corrections. We apologize to our readers and review the corrections below.