Public DeReKo access

KorAP

We have just relaunched KorAP, which now provides a large subset of DeReKo (mostly data from the W archive of COSMAS II, comprising more than 11 million documents). The data is annotated with part-of-speech information from CoreNLP, MarMoT, OpenNLP and TreeTagger, additional morphological features from MarMoT, lemma annotations from TreeTagger, constituency annotations from CoreNLP and dependency annotations from Malt.

To grant access to the restricted corpora, we are currently fixing a critical bug in the integration with the COSMAS II user management. KorAP is therefore temporarily not accessible from outside the IDS until we have finished the integration.

Rabbid – Rapid Application Development Environment released on GitHub!

Rabbid - Recherche- und Analyse-Basis für Belegstellen in Diskursen

We are happy to announce the open source release of Rabbid (“Recherche- und Analyse-Basis für Belegstellen in Diskursen”). Rabbid is a standalone rapid application development environment for KorAP and is used in production for the creation and management of collections of textual examples in the area of discourse analysis and discourse lexicography.

The development of Rabbid was a joint effort by the KorAP project and Dr. Ruth Mell of the Demokratiediskurs 1918-1925 project at the Institute for the German Language in Mannheim.

Unlike KorAP, Rabbid provides only a limited set of search operators for small, non-annotated corpora.

You can download Rabbid from GitHub. Rabbid is free software published under the BSD-2 License.

Rabbid - Screenshots

Kalamar – User Frontend released on GitHub!

Mojolicious-based Frontend to KorAP

We are happy to announce the open source release of Kalamar, the Mojolicious-based frontend for KorAP!

Kalamar is written in Perl and JavaScript, acts as a proof-of-concept for the KorAP API, and provides, among other features, …

  • aligned KWIC views,
  • multiple highlighting,
  • table views of morphological annotations,
  • tree views of hierarchical annotations,
  • localization,
  • a language-independent query helper for multiple tag sets,
  • and embedded, interactive documentation!

Screenshots

Expect more features to come! You can already use Kalamar from inside the IDS and download the sources from GitHub.

EDIT: The IDS instance of KorAP is currently not accessible from outside the IDS.

KoralQuery at the QueryVis Workshop in Vilnius

KoralQuery, the general Corpus Query Protocol used for inter-component communication in KorAP, was presented on May 11th at the workshop on Innovative Corpus Query and Visualization Tools (QueryVis). The workshop was part of the 20th Nordic Conference of Computational Linguistics (Nodalida) in Vilnius, Lithuania. The proceedings are already available.

We would like to thank the reviewers and organizers for a great workshop!

Please cite this work as:
Joachim Bingel, Nils Diewald (2015). KoralQuery – a General Corpus Query Protocol, Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania.

Krill – Lucene-based Search Backend released on GitHub!

A Corpusdata Retrieval Index using Lucene for Look-Ups

We are happy to announce the open source release of Krill, the Lucene-based search backend for KorAP! Krill is the reference implementation for KoralQuery, covering most of the protocol's features, including …

  • Fulltext search
  • Token-based annotation search
  • Span-based annotation search
  • Distance search
  • Positional search
  • Nested queries

… and many more!

You can download Krill from GitHub. Feedback and contributions are very welcome!
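
For readers unfamiliar with the term, the following is a minimal sketch of what a "distance search" means conceptually: finding one token that is followed by another within a given number of token positions. It is a plain-Python illustration over a list of strings, not Krill's implementation and not the KoralQuery API; the function name `distance_search` and its parameters are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def distance_search(tokens: List[str], first: str, second: str,
                    min_dist: int = 1, max_dist: int = 3) -> List[Tuple[int, int]]:
    """Toy distance search (hypothetical helper, not part of Krill).

    Returns all (i, j) position pairs where tokens[i] == first,
    tokens[j] == second and min_dist <= j - i <= max_dist.
    """
    matches = []
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Only look at positions inside the allowed distance window.
        start = i + min_dist
        end = min(i + max_dist, len(tokens) - 1)
        for j in range(start, end + 1):
            if tokens[j] == second:
                matches.append((i, j))
    return matches


tokens = "der alte Baum und der junge Baum".split()
# "der" followed by "Baum" at a distance of one or two tokens
print(distance_search(tokens, "der", "Baum", 1, 2))  # [(0, 2), (4, 6)]
```

A real index-based backend would evaluate such constraints on stored token positions rather than scanning the text linearly; the sketch only shows the matching condition itself.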

KorAP alpha open for testing

The test version of KorAP is now open for IDS-internal access only, through the “demo” profile, linked right below the “Login” button, at http://korap.ids-mannheim.de/app

Please note that, for the time being, we can only serve a limited portion of DeReKo data, available to non-authenticated users. The preferred browser to use is Firefox. The frontend has also been tested with Safari. Old versions of Internet Explorer will surely not work well with it.

As you can imagine, we are already aware of the most critical bugs at the moment, but we welcome all feedback, big and small. Bug report forms are available from inside the test version by clicking the “Kontakt” link.

Proudly,

the KorAP Team

Preliminary Feature Freeze of the Lucene Backend

Today we reached a preliminary feature freeze of the Lucene-based backend of KorAP. We now provide a good portion of the functionality of COSMAS II, are close to fully supporting the Poliqarp Query Language, and have introduced some quite nice novel features for corpus querying.
Although most of the missing features are already provided by the second backend (using a Neo4j graph database), their implementation in the Lucene backend is scheduled for early next year, marking the final milestone before the alpha testing phase of the KorAP project begins.
In the coming weeks we will prepare the frontend to support all of the new backend functionality and start working on the distribution capabilities of the index.