Thanks to the work of Bastian Beyer last year, continued and expanded first by Carsten and now by Joachim, we are in the process of adding a new foundry (annotation set), produced by the Mate parser.
We use Mate for several reasons:
- we wanted to be able to release dependency annotations to the public,
- Mate was recommended to us as a reliable parser,
- it can produce interesting annotation layers for German (on top of the standard segmentation-based annotations, we also get two kinds of dependency annotations and semantic role annotations); the two kinds of dependencies are quite valuable as a reference for the ISO CQLF work done by Andreas, Elena and myself,
- it is also a bit of a challenge because unlike the other annotation tools that we use, Mate does not come with its own tokenization tool, and thus, theoretically, can be made to use any of our existing tokenization layers. This forces us to tighten some aspects of our data model, e.g. to force the presence of the element that encodes the tokenization layer in each instance of foundry metadata information, and to enable it to act as a soft link for cases where the given foundry has to rely on tokenization information external to it. For now, our Mate foundries will use the conservative tokenization layers of the Base foundry, in this way strengthening the concept of the Base foundry even further.
These are the forthcoming events relating directly or indirectly to the KorAP project:
- on July 8-10, we will hold a workshop on the perspectives for KorAP in Sopot, Poland;
- on July 16th, at Digital Humanities 2013 in Lincoln, NE, we will offer a tutorial on “Methods for Data Querying”;
- on July 25th, at Corpus Linguistics 2013 in Lancaster, UK, we will present a paper on “Robust corpus architecture: a new look at virtual collections and data access”;
- on October 1, we will host a workshop on “Perspectives on querying TEI-annotated data”, co-located with the TEI Conference in Rome, Italy.
Last month, with sadness, we said goodbye to Carsten, who has moved to another job and another country — best of luck, Carsten, and thank you for the wonderful job that you have done for KorAP!
We have also temporarily “lost” Elena, for an utterly joyful personal reason — we’re keeping our fingers crossed and waiting for the good news! Elena is scheduled to come back to work on the frontend and to provide her very valuable assistance in the development of ISO CQLF, at the beginning of October.
It is with utmost pleasure that I would like to welcome three new members of our team, who have joined the project at the beginning of June:
- Piotr Pęzik, who has taken it upon himself to develop an alternative backend (recall: KorAP is a modular beast), based on a nifty amalgam of Lucene and Neo4J,
- Nils Diewald, who has taken over a large part of Carsten’s backend duties as well as part of Elena’s frontend responsibilities,
- Joachim Bingel, who has co-operated with us in the past and is now returning to temporarily fill in for part of Elena’s role.
And since I’m on the topic of roles and responsibilities, let me mention the ongoing contributions of Michael Hanl, who works, among other things, on the issues of authorization, data access, and user management.
Altogether, I’m honoured to be working (or, in Carsten’s case, to have worked) with such a splendid team — thank you, everyone! 🙂
It has been brought to our attention that our LREC paper on the comparison of three corpus query languages requires several corrections, which we review below, with apologies to the readers.
We’ve been looking for a solution to this for a while, so perhaps the one we’ve ended up using, while not fully optimal, is worth sharing.
In an attempt to make brainstorming easier, we’ve been looking for mind-mapping software that supports concurrent editing, something like a cross between Freeplane and Google Docs. One semi-commercial product that advertises itself as having that capability turned out not to have it, and we decided to use Freeplane over DropBox [that last link is actually a bonus link that will give both a new DB user and myself extra 500MB of shared space; I’m trying to be clever, see…].