KorAP: Annotations
KorAP provides access to multiple levels of annotations originating from multiple resources, so-called foundries.
Base Foundry
The base foundry is available for all corpora and acts as a common ground for document structure annotation in the layer s.
- s
- Document structure supporting the spans: <base/s=s> for sentences, <base/s=p> for paragraphs, and <base/s=t> for the text span.
<base/s=s>
DeReKo (dereko)
DeReKo annotations provide the following layer for the dereko prefix:
- s
- Document structure as encoded in the I5 text document.
startsWith(<dereko/s=s>, Fragestunde)
CoreNLP (corenlp)
CoreNLP annotations provide the following layers for the corenlp prefix:
- p
- Part-of-speech information is written in capital letters and is based on STTS
- c
- Constituency information follows the annotations of the negr@ corpus.
- ne
- Contains named entities like I-PER, I-ORG etc.
- ne_hgc_175m_600
- Named entities, as described for ne above.
- ne_dewac_175m_600
- Named entities, as described for ne above.
[corenlp/ne_dewac_175m_600=I-ORG]
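The example queries on this page address annotations in the form foundry/layer=value. As a rough illustration of that structure, the following Python sketch splits a single term into its three parts. The names `Term` and `parse_term` are hypothetical and not part of KorAP, and the pattern covers only the simple terms shown here, not the full query languages.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    """One annotation term; foundry is None when the term omits it."""
    foundry: Optional[str]
    layer: str
    value: str


# Simplified pattern (an assumption): optional "foundry/", then "layer=value".
_TERM_RE = re.compile(
    r"(?:(?P<foundry>[^/=\s]+)/)?(?P<layer>[^/=\s]+)=(?P<value>\S+)"
)


def parse_term(text: str) -> Term:
    """Split a single term such as '[tt/p=ADV]' into foundry, layer and value."""
    inner = text.strip()
    # Drop the surrounding token brackets if present.
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1].strip()
    match = _TERM_RE.fullmatch(inner)
    if match is None:
        raise ValueError(f"not a single foundry/layer=value term: {text!r}")
    return Term(match["foundry"], match["layer"], match["value"])


print(parse_term("[corenlp/ne_dewac_175m_600=I-ORG]"))
```

Terms combined with operators such as & are deliberately rejected by this sketch; handling them would require a real query parser.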
TreeTagger (tt)
TreeTagger annotations provide the following layers for the tt prefix:
- l
- All non-noun lemmas are written in lower case; noun lemmas are capitalized. Compounds stay intact (e.g. Normalbedingung).
- p
- All part-of-speech information is written in capital letters and is based on STTS
[tt/p=ADV]
Malt (malt)
Malt annotations provide the following layer for the malt prefix:
- d
- Dependency information
tt/p="PPOSAT" ->malt/d[func="DET"] node
OpenNLP (opennlp)
OpenNLP annotations provide the following layer for the opennlp prefix:
- p
- All part-of-speech information is written in capital letters and is based on STTS
[opennlp/p=PDAT]
Marmot (marmot)
Marmot annotations provide the following layers for the marmot prefix:
- p
- Part-of-speech information is written in capital letters and is based on STTS
- m
- Includes information about case (acc, ...), degree (pos), gender (fem, ...), etc.
[marmot/m=degree:sup & marmot/p=ADJA]
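A token query like the one above combines several foundry/layer=value constraints inside one pair of brackets. The following Python sketch assembles such a string from its parts. The helper name `token_expr` is hypothetical and not part of KorAP, and the sketch does no escaping or validation of values.

```python
def token_expr(*constraints):
    """Join (foundry, layer, value) triples into one bracketed token expression.

    All constraints are combined with '&', so they apply to the same token.
    """
    if not constraints:
        raise ValueError("at least one constraint is required")
    terms = [f"{foundry}/{layer}={value}" for foundry, layer, value in constraints]
    return "[" + " & ".join(terms) + "]"


# Rebuilds the Marmot example: superlative attributive adjectives.
print(token_expr(("marmot", "m", "degree:sup"), ("marmot", "p", "ADJA")))
# prints: [marmot/m=degree:sup & marmot/p=ADJA]
```

Keeping the foundry explicit in every triple makes it visible which tool produced each constraint, which matters when several foundries offer the same layer name, such as p.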
Default Foundries
For queries on specific layers without given foundries, KorAP provides default foundries. The default foundries apply to the following layers:
- orth: opennlp
- lemma: tt
- pos: tt
In the Lucene backend, the orth layer can only be bound to a specific foundry, as only one tokenization is supported.
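The fallback to default foundries can be pictured as a simple lookup. The following Python sketch uses the three defaults listed in this section. The names `DEFAULT_FOUNDRIES` and `resolve_foundry` are hypothetical, and how KorAP resolves defaults internally is not shown here.

```python
# Default foundries per layer, copied from the list in this section.
DEFAULT_FOUNDRIES = {
    "orth": "opennlp",
    "lemma": "tt",
    "pos": "tt",
}


def resolve_foundry(layer, foundry=None):
    """Return the explicit foundry if one is given, else the layer's default."""
    if foundry is not None:
        return foundry
    if layer not in DEFAULT_FOUNDRIES:
        raise ValueError(f"no default foundry listed for layer {layer!r}")
    return DEFAULT_FOUNDRIES[layer]


print(resolve_foundry("pos"))            # prints: tt
print(resolve_foundry("pos", "marmot"))  # prints: marmot
```

An explicit foundry always wins in this sketch; the defaults only apply when a query names a layer on its own.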