KorAP: FCSQL
FCS-QL is a query language developed specifically to accommodate advanced search in the CLARIN Federated Content Search (FCS), which allows searching through annotated data. Accordingly, FCS-QL is primarily intended to represent queries involving annotation layers such as part-of-speech and lemma. The FCS-QL grammar closely resembles that of Poliqarp, since it is largely based on Poliqarp/CQP.
In FCS-QL, foundries are called qualifiers. A qualifier (foundry) and a layer are separated by a colon; for example, the lemma layer of TreeTagger is written tt:lemma. KorAP supports the following annotation layers for FCS-QL:
- text: surface text
- lemma: lemmatisation
- pos: part-of-speech
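The qualifier and layer naming described above can be sketched in a few lines of Python. This is only an illustration, not part of KorAP: the helper name `layer_ref` and the `SUPPORTED_LAYERS` set are hypothetical, and the set simply mirrors the three layers listed above.

```python
# Layers listed above as supported by KorAP for FCS-QL.
SUPPORTED_LAYERS = {"text", "lemma", "pos"}


def layer_ref(layer, qualifier=None):
    """Return 'layer' or 'qualifier:layer' (hypothetical helper, not a KorAP API)."""
    if layer not in SUPPORTED_LAYERS:
        raise ValueError(f"unsupported layer: {layer!r}")
    # Without a qualifier the default foundry is used.
    return f"{qualifier}:{layer}" if qualifier else layer


print(layer_ref("lemma", "tt"))  # tt:lemma
print(layer_ref("pos"))          # pos
```

Omitting the qualifier leaves the choice of foundry to the server default, which matches how unqualified layers are used in the examples in this document.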
Simple queries
Querying simple terms
"Semmel"
Querying regular expressions
"gie(ss|ß)en"
Querying case-insensitive terms
"essen"/c
Complex queries
Querying using layers
Querying a simple term using the layer for surface text
[text = "Semmel"]
[text = "essen"/c]
Querying adverbs from the default foundry
[pos="ADV"]
Querying using qualifiers (foundries)
Querying adverbs annotated by OpenNLP
[opennlp:pos="ADV"]
Querying tokens with a lemma from TreeTagger
[tt:lemma = "leben"]
Querying using boolean operators
All tokens with lemma "leben" that are also finite verbs
[tt:lemma ="leben" & pos="VVFIN"]
All tokens with lemma "leben" that are also finite verbs or perfect participles
[tt:lemma ="leben" & (pos="VVFIN" | pos="VVPP")]
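When such token expressions are generated by a program, the boolean structure can be composed from small string helpers. The following is a minimal sketch; the function names `attr`, `all_of`, `any_of` and `token` are hypothetical and not part of KorAP or FCS. It assumes values contain no double quotes and emits no spaces around `=`, a spacing that also appears in the examples in this document.

```python
def attr(layer, value, qualifier=None):
    """Build a single layer test such as tt:lemma="leben"."""
    name = f"{qualifier}:{layer}" if qualifier else layer
    return f'{name}="{value}"'


def all_of(*exprs):
    """Join expressions with the boolean AND operator (&)."""
    return " & ".join(exprs)


def any_of(*exprs):
    """Join expressions with the boolean OR operator (|), grouped in parentheses."""
    return "(" + " | ".join(exprs) + ")"


def token(expr):
    """Wrap an expression in square brackets to form one token."""
    return f"[{expr}]"


query = token(
    all_of(
        attr("lemma", "leben", qualifier="tt"),
        any_of(attr("pos", "VVFIN"), attr("pos", "VVPP")),
    )
)
print(query)  # [tt:lemma="leben" & (pos="VVFIN" | pos="VVPP")]
```

Always parenthesising the OR group keeps the intended grouping explicit when it is combined with `&`.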
Sequence queries
Combining two terms in a sequence query
[opennlp:pos="ADJA"] "leben"
Empty token
As in Poliqarp, an empty token is written [] and matches any token. Because it would produce an excessive number of results, an empty token cannot be used on its own, but only in combination with other tokens, for instance in a sequence query.
[] "Wolke"
Negation
Like the empty token, negation cannot be used on its own because of the excessive number of results. However, it can be used in a sequence query.
[pos != "ADJA"] "Buch"
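A client that builds queries could check this restriction before sending anything to the server. The sketch below is one possible approach under stated assumptions: the names `Segment`, `EMPTY`, `term`, `not_equal` and `sequence` are hypothetical, and the rule it enforces (at least one segment that is neither empty nor negated) is an interpretation of the text above, not KorAP's actual validation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str        # FCS-QL text of this segment
    anchoring: bool  # False for [] and for negated tokens


# The empty token matches any token, so it cannot anchor a query.
EMPTY = Segment("[]", anchoring=False)


def term(word):
    """A plain surface term such as "Wolke"."""
    return Segment(f'"{word}"', anchoring=True)


def not_equal(layer, value):
    """A negated token such as [pos != "ADJA"]."""
    return Segment(f'[{layer} != "{value}"]', anchoring=False)


def sequence(*segments):
    """Join segments into a sequence query, rejecting unanchored queries."""
    # Assumption: one ordinary token is enough to make the query acceptable.
    if not any(s.anchoring for s in segments):
        raise ValueError("query needs a token that is neither empty nor negated")
    return " ".join(s.text for s in segments)


print(sequence(EMPTY, term("Wolke")))                    # [] "Wolke"
print(sequence(not_equal("pos", "ADJA"), term("Buch")))  # [pos != "ADJA"] "Buch"
```

Calling `sequence(EMPTY)` or `sequence(not_equal("pos", "ADJA"))` raises a `ValueError` in this sketch, mirroring the rule that neither form may be used independently.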
Querying using quantifiers
Quantifiers indicate how often a term is repeated; for instance, they can be used to search for exactly two consecutive occurrences of "die".
"die" {2}
Quantifiers are also useful for finding any tokens near other specific tokens, for instance two to three occurrences of any token between "wir" and "leben".
"wir" []{2,3} "leben"
Querying a term within a sentence
"Boot" within s