Thanks to the work of Bastian Beyer last year, continued and expanded first by Carsten and now by Joachim, we are in the process of adding a new foundry (annotation set) that holds the output of the Mate parser.
We use Mate for several reasons:
- we wanted to be able to release dependency annotations to the public,
- Mate was recommended to us as a reliable parser,
- it can produce interesting annotation layers for German: in addition to the standard segmentation-based annotations, we also get two kinds of dependency annotations and semantic role annotations. The two kinds of dependencies are valuable as a reference for the ISO CQLF work done by Andreas, Elena and myself.
- it is also a bit of a challenge because, unlike the other annotation tools that we use, Mate does not come with its own tokenization tool and thus, theoretically, can be made to use any of our existing tokenization layers. This forces us to tighten some aspects of our data model: the element that encodes the tokenization layer must now be present in each instance of foundry metadata, and it must be able to act as a soft link for cases where the given foundry has to rely on tokenization information external to it. For now, our Mate foundries will use the conservative tokenization layers of the Base foundry, which strengthens the concept of the Base foundry even further.
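
To make the last point more concrete, here is a minimal sketch of the tightened model. It is an illustration only, not our actual metadata format: the class names, field names, and the layer name `tokens_conservative` are all made up for this example. It shows a foundry whose tokenization entry is mandatory and either describes a local layer or points, as a soft link, to a layer held by another foundry.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Tokenization:
    """Tokenization entry of a foundry's metadata (hypothetical shape)."""
    layer: str                    # name of the tokenization layer
    source: Optional[str] = None  # None: layer is local; otherwise the name
                                  # of the foundry that holds it (soft link)


@dataclass(frozen=True)
class Foundry:
    """Foundry metadata; `tokenization` has no default, so it is mandatory."""
    name: str
    tokenization: Tokenization


def resolve_tokenization(
    foundry: Foundry, registry: Dict[str, Foundry]
) -> Tuple[str, str]:
    """Return (foundry name, layer name) telling where the tokens live."""
    tok = foundry.tokenization
    if tok.source is None:
        # The foundry ships its own tokenization layer.
        return (foundry.name, tok.layer)
    if tok.source not in registry:
        # A soft link is only a name, so it can dangle; report that clearly.
        raise LookupError(
            f"foundry {foundry.name!r} links to unknown foundry {tok.source!r}"
        )
    return (tok.source, tok.layer)


# Illustrative names only: Base holds the layer, Mate merely links to it.
base = Foundry("base", Tokenization("tokens_conservative"))
mate = Foundry("mate", Tokenization("tokens_conservative", source="base"))
registry = {f.name: f for f in (base, mate)}

print(resolve_tokenization(base, registry))
print(resolve_tokenization(mate, registry))
```

In this sketch both calls resolve to the same layer in the Base foundry, which is the sense in which a soft link lets Mate rely on tokenization information external to it.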