% layout 'main', title => 'KorAP: FCS-QL';

%= page_title

<p>FCS-QL is a query language specifically developed to accommodate advanced search in
  <%= ext_link_to 'CLARIN Federated Content Search (FCS)', "https://www.clarin.eu/content/federated-content-search-clarin-fcs" %>
  that allows searching through annotated data.
  Accordingly, FCS-QL is primarily intended to represent queries involving annotation layers
  such as part-of-speech and lemma. The FCS-QL grammar is fairly similar to <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %> because it is
  heavily based on Poliqarp/CQP.</p>

<p>In FCS-QL, foundries are called qualifiers. A foundry and a layer are
  joined by a colon; for example, the lemma layer of TreeTagger is represented as
  <code>tt:lemma</code>. KorAP supports the following annotation layers for FCS-QL:</p>

<dl>
  <dt>text</dt>
  <dd>surface text</dd>
  <dt>lemma</dt>
  <dd>lemmatisation</dd>
  <dt>pos</dt>
  <dd>part-of-speech</dd>
</dl>

<section id="simple-queries">
  <h3>Simple queries</h3>
  <p>Querying simple terms</p>
  %= doc_query fcsql => '"Semmel"', cutoff => 1

  <p>Querying regular expressions</p>
  %= doc_query fcsql => '"gie(ss|ß)en"', cutoff => 1

  <p>Querying case-insensitive terms</p>
  %= doc_query fcsql => '"essen"/c', cutoff => 1
</section>

<section id="complex-queries">
  <h3>Complex queries</h3>

  <h4>Querying using layers</h4>

  <p>Querying a simple term using the layer for surface text</p>
  %= doc_query fcsql => '[text = "Semmel"]', cutoff => 1
  %= doc_query fcsql => '[text = "essen"/c]', cutoff => 1

  <p>Querying adverbs from the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query fcsql => '[pos="ADV"]', cutoff => 1


  <h4>Querying using qualifiers (foundries)</h4>

  <p>Querying adverbs annotated by OpenNLP</p>
  %= doc_query fcsql => '[opennlp:pos="ADV"]', cutoff => 1

  <p>Querying tokens with a lemma from TreeTagger</p>
  %= doc_query fcsql => '[tt:lemma = "leben"]', cutoff => 1


  <h4>Querying using boolean operators</h4>

  <p>All tokens with lemma <code>&quot;leben&quot;</code> that are also finite verbs</p>
  %= doc_query fcsql => '[tt:lemma ="leben" & pos="VVFIN"]', cutoff => 1

  <p>All tokens with lemma <code>&quot;leben&quot;</code> that are also either finite verbs or perfect participles</p>
  %= doc_query fcsql => '[tt:lemma ="leben" & (pos="VVFIN" | pos="VVPP")]', cutoff => 1


  <h4>Sequence queries</h4>

  <p>Combining two terms in a sequence query</p>
  %= doc_query fcsql => '[opennlp:pos="ADJA"] "leben"', cutoff => 1


  <h4>Empty token</h4>
  <p>As in <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %>, an empty token is written as <code>[]</code>
    and matches any token. Because it would yield an
    excessive number of results, the empty token cannot be used on its own, only in
    combination with other tokens, for instance in a sequence query.</p>
  %= doc_query fcsql => '[] "Wolke"', cutoff => 1


  <h4>Negation</h4>
  <p>Like the empty token, negation cannot be used on its own because it would yield an
    excessive number of results. However, it can be used in a sequence query.</p>
  %= doc_query fcsql => '[pos != "ADJA"] "Buch"', cutoff => 1


  <h4>Querying using quantifiers</h4>
  <p>Quantifiers indicate repetition of a term; for instance, they can be used to search for
    exactly two consecutive occurrences of <code>&quot;die&quot;</code>.</p>
  %= doc_query fcsql => '"die" {2}', cutoff => 1

  <p>Quantifiers are also useful for finding any tokens near other
    specific tokens, for instance two to three occurrences of any token between <code>&quot;wir&quot;</code> and
    <code>&quot;leben&quot;</code>.</p>
  %= doc_query fcsql => '"wir" []{2,3} "leben"', cutoff => 1


  <h4>Querying a term within a sentence</h4>
  %= doc_query fcsql => '"Boot" within s', cutoff => 1

</section>