KorAP: Corpus Data
KorAP is developed as the main access point to DeReKo, succeeding COSMAS II in that role. KorAP is not tied to any specific corpus, however; it is, for example, also used for the Romanian national corpus CoRoLa.
In KorAP, corpus texts can carry arbitrary metadata, some of which can be used to define subcorpora (so-called virtual corpora).
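To illustrate the idea of a virtual corpus as a metadata-based selection of texts, here is a minimal Python sketch. It is not KorAP's implementation or API: the `Text` class, the `virtual_corpus` function, and the metadata keys (`textType`, `pubYear`) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Text:
    """A corpus text with arbitrary metadata (hypothetical model)."""
    text_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def virtual_corpus(texts: List[Text], **criteria: Any) -> List[Text]:
    """Select the texts whose metadata matches every given key/value pair."""
    return [
        t for t in texts
        if all(t.metadata.get(key) == value for key, value in criteria.items())
    ]


# Made-up example texts; each may carry a different set of metadata fields.
texts = [
    Text("t1", {"textType": "news", "pubYear": 2015}),
    Text("t2", {"textType": "fiction", "pubYear": 2015}),
    Text("t3", {"textType": "news", "pubYear": 2018, "author": "unknown"}),
]

news = virtual_corpus(texts, textType="news")
print([t.text_id for t in news])  # ['t1', 't3']
```

The selection is described by criteria rather than stored as a copy of the texts, which is what makes the subcorpus "virtual" in this sketch.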
KorAP also supports an arbitrary number of annotations from different sources (called foundries), each of which can provide different layers.
Annotations of the following kinds are supported:
- Tokens: annotations associated with single tokens (e.g. words or numbers)
- Spans: annotations on a sequence of words or nodes (e.g. sentences, phrases, constituency annotations)
- Relations: annotations of relations between tokens or spans (e.g. dependency annotations)
- Attributes: attribute information for tokens, spans, or relations (e.g. attributes of HTML elements)
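The annotation kinds described in this section can be pictured as a small data model. The following Python sketch is only an illustration and does not reflect KorAP's internal representation: the class names, the field names, the foundry name `demo`, and the layer names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Token:
    """Annotation on a single token."""
    position: int                 # index of the token in the text
    foundry: str                  # annotation source
    layer: str                    # annotation layer within that source
    value: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    """Annotation on a sequence of tokens."""
    start: int                    # first token position, inclusive
    end: int                      # last token position, exclusive
    foundry: str
    layer: str
    value: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """Annotation linking two tokens or spans."""
    source: Union[Token, Span]
    target: Union[Token, Span]
    foundry: str
    layer: str
    value: str
    attributes: Dict[str, str] = field(default_factory=dict)


def covered(span: Span, tokens: List[Token]) -> List[Token]:
    """Return the token annotations that lie inside the span."""
    return [t for t in tokens if span.start <= t.position < span.end]


# A made-up three-token text ("The cat sleeps") with placeholder labels.
tokens = [
    Token(0, "demo", "pos", "DET"),
    Token(1, "demo", "pos", "NOUN"),
    Token(2, "demo", "pos", "VERB"),
]
sentence = Span(0, 3, "demo", "sentence", "s")
noun_phrase = Span(0, 2, "demo", "constituency", "NP", {"head": "1"})
subject = Relation(tokens[2], tokens[1], "demo", "dependency", "nsubj")

print([t.value for t in covered(noun_phrase, tokens)])  # ['DET', 'NOUN']
```

In this sketch, attributes are modeled as an optional key/value map on each of the other three kinds, mirroring the description that attribute information can be attached to tokens, spans, or relations.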