KorAP: Query Languages
KorAP supports multiple query languages; of these, Poliqarp currently has the most complete support. Poliqarp is very similar to the query languages of the IMS Open Corpus Workbench (CWB) and the Sketch Engine. It was originally developed to query the Polish National Corpus.
Example Queries
Poliqarp: Find all occurrences of the lemma "Baum" as annotated by the default foundry.
[base=Baum]
Poliqarp: Find all sequences of three to five consecutive adjectives as annotated by TreeTagger.
[tt/p=ADJA]{3,5}
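To make the repetition operator concrete, the following Python sketch lists every span of three to five consecutive ADJA tags in a list of part-of-speech tags. It is an illustration only: the function name `repeated_tag_spans` and the sample tags are hypothetical, and reporting every overlapping span is an assumption about match semantics, not a description of how KorAP evaluates the query.

```python
def repeated_tag_spans(tags, tag, min_len, max_len):
    """Return (start, end) pairs, end exclusive, where `tag` occurs
    min_len to max_len times in a row (hypothetical helper)."""
    spans = []
    for start in range(len(tags)):
        length = 0
        # Extend the run token by token while the tag keeps matching.
        while (start + length < len(tags)
               and tags[start + length] == tag
               and length < max_len):
            length += 1
            if length >= min_len:
                spans.append((start, start + length))
    return spans


# Hypothetical tag sequence: an article, four adjectives, a noun.
tags = ["ART", "ADJA", "ADJA", "ADJA", "ADJA", "NN"]
print(repeated_tag_spans(tags, "ADJA", 3, 5))
# → [(1, 4), (1, 5), (2, 5)]
```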
Cosmas-II: Find all occurrences of the words "der" and "Baum" that are at most 5 tokens apart. The order of the two words is not relevant.
der /w5 Baum
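The unordered distance condition can be sketched in Python as follows. The helper `within_distance` and the sample sentence are hypothetical, and counting distance as the difference between token positions (adjacent tokens have distance 1) is an assumption made for this illustration.

```python
def within_distance(tokens, first, second, max_dist):
    """Return index pairs (i, j) where `first` is at position i, `second`
    at position j, and the two are at most max_dist tokens apart,
    in either order (hypothetical helper)."""
    firsts = [i for i, tok in enumerate(tokens) if tok == first]
    seconds = [j for j, tok in enumerate(tokens) if tok == second]
    return [(i, j) for i in firsts for j in seconds
            if 0 < abs(i - j) <= max_dist]


# Hypothetical sample sentence, split on whitespace.
tokens = "der alte Baum steht neben der Kirche".split()
print(within_distance(tokens, "der", "Baum", 5))
# → [(0, 2), (5, 2)]
```

The second pair shows the order independence: "der" at position 5 follows "Baum" at position 2 and still counts as a match.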
Cosmas-II: Find all sequences of a word starting with "d" (using a wildcard), followed by an adjective as annotated in the mate foundry, followed by the word "Baum" (ignoring case), where the sequence is inside a sentence element annotated by the default foundry.
Be aware: Minor incompatibilities with implemented languages may be announced with warnings.
d* MORPH(mate/p=ADJA) $Baum #IN #ELEM(s)
Poliqarp+: Find all nominal phrases as annotated by CoreNLP that contain a token annotated as an adverb by OpenNLP and, by TreeTagger, with a part-of-speech tag starting with "A" (matched using a regular expression).
contains(<corenlp/c=NP>,{[opennlp/p=ADV & tt/p="A.*"]})
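As an illustration of the regular expression part of this query, the Python sketch below shows which tags from a small sample the pattern `A.*` would select. The helper `tags_matching` and the sample tag list are hypothetical, and the sketch assumes that the pattern has to match the whole tag rather than a substring.

```python
import re


def tags_matching(pattern, tags):
    """Return the tags that the pattern matches in full (hypothetical helper)."""
    return [tag for tag in tags if re.fullmatch(pattern, tag)]


# Hypothetical sample of part-of-speech tags.
sample_tags = ["ADJA", "ADV", "ART", "NN", "PPOSAT", "VVFIN"]
print(tags_matching("A.*", sample_tags))
# → ['ADJA', 'ADV', 'ART']
```

Under that assumption, "PPOSAT" is not selected even though it contains an "A", because the pattern is anchored at the start of the tag.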
Poliqarp+: Find all sentences as annotated by the base foundry that start with a sequence of one token in present tense as annotated by Marmot and the lemma "die" annotated by the default foundry. Highlight both terms of the sequence.
startswith(<base/s=s>, {1:[marmot/m=tense:pres]}{2:[base=die]})
Poliqarp+: Find all sequences of an article, followed by three to four adjectives and a noun, as annotated by the TreeTagger foundry, that end a sentence. Highlight all parts of the sequence.
focus(3:endswith(<base/s=s>,{3:[tt/p=ART]{1:{2:[tt/p=ADJA]{3,4}}[tt/p=NN]}}))
Annis: Find all occurrences of a sequence of two tokens annotated as adverbs by the default foundry.
pos="ADV" & pos="ADV" & #1 . #2
Annis: Find all determiner relations with the label DET as annotated by MALT, where the relation sources are attributive possessive pronouns as annotated by TreeTagger.
tt/p="PPOSAT" ->malt/d[func="DET"] node
CQL: Find all occurrences of the sequence "der alte Mann".
"der alte Mann"