KorAP: Annotations

KorAP provides access to multiple levels of annotation originating from multiple resources, so-called foundries.

The base foundry is available for all corpora and acts as a common ground for document structure annotation in the layer s.

s: Document structure supporting the spans: <base/s=s> for sentences, <base/s=p> for paragraphs, and <base/s=t> for the text span.

<base/s=s>

DeReKo annotations provide the following layer for the dereko prefix:

startsWith(<dereko/s=s>, Fragestunde)

CoreNLP annotations provide the following layers for the corenlp prefix:

p: Part-of-speech information is written in capital letters and is based on STTS
c: Constituency information follows the annotations of the negr@ corpus.
ne: Contains named entities like I-PER, I-ORG etc.
ne_hgc_175m_600: See ne above
ne_dewac_175m_600: See ne above

[corenlp/ne_dewac_175m_600=I-ORG]

TreeTagger annotations provide the following layers for the tt prefix:

l: Noun lemmas are capitalized; all other lemmas are written in lower case. Compounds stay intact (e.g. Normalbedingung).
p: All part-of-speech information is written in capital letters and is based on STTS.

[tt/p=ADV]

Malt annotations provide the following layer for the malt prefix:

tt/p="PPOSAT" ->malt/d[func="DET"] node

OpenNLP annotations provide the following layer for the opennlp prefix:

p: All part-of-speech information is written in capital letters and is based on STTS

[opennlp/p=PDAT]

Marmot annotations provide the following layers for the marmot prefix:

p: Part-of-speech information is written in capital letters and is based on STTS
m: Includes information about case (acc ...), degree (pos), gender (fem ...) etc.

[marmot/m=degree:sup & marmot/p=ADJA]
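
The bracketed token queries in this section all follow the pattern [foundry/layer=value], and several terms can be combined with & inside one token. As a minimal sketch of that pattern, the following Python snippet assembles such query strings. The helper names term and token are hypothetical and are not part of KorAP; the snippet only builds strings and does not contact any KorAP service.

```python
def term(foundry, layer, value):
    """Return one 'foundry/layer=value' term (hypothetical helper, not a KorAP API)."""
    return f"{foundry}/{layer}={value}"


def token(*terms):
    """Wrap one or more terms in square brackets, joined with ' & '."""
    if not terms:
        raise ValueError("at least one term is required")
    return "[" + " & ".join(terms) + "]"


# A single-term token, as in the TreeTagger example.
print(token(term("tt", "p", "ADV")))

# Two terms on the same token, as in the Marmot example.
print(token(term("marmot", "m", "degree:sup"), term("marmot", "p", "ADJA")))
```

The two calls reproduce the TreeTagger and Marmot example queries shown above.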

For queries on specific layers that do not name a foundry, KorAP provides default foundries. The default foundries apply to the following layers:

In the Lucene backend, the orth layer can only be bound to a specific foundry, as only one tokenization is supported.
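
The idea of a default foundry can be illustrated with a small sketch: a term that names only a layer is prefixed with a foundry looked up in a table, while a term that already names a foundry is left alone. The function name qualify, the table ILLUSTRATIVE_DEFAULTS, and the foundry name somefoundry are assumptions made for this example; they do not reflect KorAP's actual defaults or its implementation.

```python
def qualify(term, defaults):
    """Prefix a 'layer=value' term with a foundry from `defaults` if it has none.

    Hypothetical helper for illustration; not part of KorAP.
    """
    key, sep, value = term.partition("=")
    if not sep:
        raise ValueError(f"not a layer=value term: {term!r}")
    if "/" in key:
        # The term already names a foundry, so it is returned unchanged.
        return term
    if key not in defaults:
        raise KeyError(f"no default foundry for layer {key!r}")
    return f"{defaults[key]}/{key}={value}"


# Placeholder mapping: 'somefoundry' is NOT a real KorAP default.
ILLUSTRATIVE_DEFAULTS = {"p": "somefoundry"}

print(qualify("p=ADV", ILLUSTRATIVE_DEFAULTS))     # layer only: a foundry is added
print(qualify("tt/p=ADV", ILLUSTRATIVE_DEFAULTS))  # foundry given: unchanged
```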