KorAP: Query Languages

KorAP supports multiple query languages, of which Poliqarp currently has the best support. Poliqarp was originally developed to query the Polish National Corpus and closely resembles the query languages of the IMS Open Corpus Workbench (CWB) and Sketch Engine.

Example Queries

Poliqarp: Find all occurrences of the lemma "Baum" as annotated by the default foundry.

[base=Baum]

Poliqarp: Find all sequences of three to five consecutive adjectives, as annotated by TreeTagger.

[tt/p=ADJA]{3,5}
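
The repetition operator can be pictured with a small sketch in Python. It is only an illustration of the idea, not KorAP's implementation: the function name adjective_runs and the flat list of tags are hypothetical, and it assumes that every window of three to five consecutive adjectives counts as a match. How KorAP actually reports overlapping matches is not stated here.

```python
from typing import List, Tuple


def adjective_runs(tags: List[str], tag: str = "ADJA",
                   lo: int = 3, hi: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets, end exclusive, of every window
    of lo..hi consecutive tokens that all carry `tag`.

    Hypothetical helper: it mimics the idea behind [tt/p=ADJA]{3,5}
    on a plain list of part-of-speech tags.
    """
    matches = []
    for start in range(len(tags)):
        length = 0
        # Extend the window while tokens keep matching, up to `hi` tokens.
        while (start + length < len(tags) and length < hi
               and tags[start + length] == tag):
            length += 1
            if length >= lo:
                matches.append((start, start + length))
    return matches


# Hypothetical tag sequence: an article, four adjectives, a noun.
tags = ["ART", "ADJA", "ADJA", "ADJA", "ADJA", "NN"]
print(adjective_runs(tags))
```

With four adjectives in a row, the sketch reports two windows of length three and one of length four.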

Cosmas-II: Find all occurrences of the words "der" and "Baum" that are at most 5 tokens apart. The order is not relevant.

der /w5 Baum
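
As a rough illustration of what the distance operator expresses, the following Python sketch pairs up occurrences of two words that lie at most five token positions apart, in either order. The function name within_distance is hypothetical, and the sketch assumes exact, case-sensitive word matching and that distance is the difference between token positions. KorAP may treat these details differently.

```python
from typing import List, Tuple


def within_distance(tokens: List[str], a: str, b: str,
                    max_dist: int = 5) -> List[Tuple[int, int]]:
    """Return (i, j) with tokens[i] == a and tokens[j] == b such that the
    two positions are at most `max_dist` tokens apart, in either order.

    Hypothetical helper illustrating the idea behind: der /w5 Baum
    """
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [j for j, t in enumerate(tokens) if t == b]
    return [(i, j) for i in pos_a for j in pos_b
            if 0 < abs(i - j) <= max_dist]


# Hypothetical token sequence for demonstration.
tokens = "der alte Baum und der Hund".split()
print(within_distance(tokens, "der", "Baum"))
```

In the example, "Baum" is paired both with the "der" before it and with the "der" after it, which reflects that the order is not relevant.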

Poliqarp+: Find all nominal phrases, as annotated by CoreNLP, that contain a token annotated as an adverb by OpenNLP whose TreeTagger part-of-speech tag matches the regular expression "A.*", that is, starts with an "A".

contains(<corenlp/c=NP>,{[opennlp/p=ADV & tt/p="A.*"]})
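
The combination of a span, an exact token constraint and a regular expression constraint can be sketched in Python as follows. This is a simplified model under stated assumptions, not KorAP's data model: tokens are represented as hypothetical dictionaries keyed by "foundry/layer", spans as (start, end) offsets with an exclusive end, and the function name spans_containing is made up for this illustration. Highlighting of the matched token is ignored.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical token model: annotation key -> value, e.g. {"tt/p": "ADV"}.
Token = Dict[str, str]


def spans_containing(spans: List[Tuple[int, int]], tokens: List[Token],
                     exact: Dict[str, str],
                     patterns: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return the spans that contain at least one token satisfying every
    exact constraint and every regular expression constraint."""
    hits = []
    for start, end in spans:
        for tok in tokens[start:end]:
            exact_ok = all(tok.get(k) == v for k, v in exact.items())
            # fullmatch: the pattern must cover the whole annotation value.
            regex_ok = all(re.fullmatch(rx, tok.get(k, "")) is not None
                           for k, rx in patterns.items())
            if exact_ok and regex_ok:
                hits.append((start, end))
                break  # one matching token is enough for this span
    return hits


# Hypothetical annotations for seven tokens and two nominal phrases.
tokens = [
    {"opennlp/p": "ART", "tt/p": "ART"},
    {"opennlp/p": "ADV", "tt/p": "ADV"},
    {"opennlp/p": "ADJA", "tt/p": "ADJA"},
    {"opennlp/p": "NN", "tt/p": "NN"},
    {"opennlp/p": "VVFIN", "tt/p": "VVFIN"},
    {"opennlp/p": "ART", "tt/p": "ART"},
    {"opennlp/p": "NN", "tt/p": "NN"},
]
np_spans = [(0, 4), (5, 7)]
print(spans_containing(np_spans, tokens,
                       exact={"opennlp/p": "ADV"},
                       patterns={"tt/p": "A.*"}))
```

Only the first nominal phrase is returned, because it is the only one with a token that satisfies both constraints at once.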

Poliqarp+: Find all sentences, as annotated by the base foundry, that start with a token in present tense, as annotated by Marmot, followed by the lemma "die", as annotated by the default foundry. Highlight both terms of the sequence.

startswith(<base/s=s>, {1:[marmot/m=tense:pres]}{2:[base=die]})

Poliqarp+: Find all sequences of an article followed by three to four adjectives and a noun, as annotated by the TreeTagger foundry, that end a sentence. Highlight all parts of the sequence.

focus(3:endswith(<base/s=s>,{3:[tt/p=ART]{1:{2:[tt/p=ADJA]{3,4}}[tt/p=NN]}}))

Annis: Find all sequences of two adjacent tokens annotated as adverbs by the default foundry.

pos="ADV" & pos="ADV" & #1 . #2

Annis: Find all determiner relations with the label DET, as annotated by MALT, whose sources are attributive possessive pronouns as annotated by TreeTagger.

tt/p="PPOSAT" ->malt/d[func="DET"] node

CQL: Find all occurrences of the sequence "der alte Mann".

"der alte Mann"