% layout 'main', title => 'KorAP: FCSQL';

%= page_title

<p>FCS-QL is a query language developed to support advanced search in
  <%= ext_link_to 'CLARIN Federated Content Search (FCS)', "https://www.clarin.eu/content/federated-content-search-clarin-fcs" %>,
  which allows searching through annotated data.
  Accordingly, FCS-QL is primarily intended to express queries involving annotation layers
  such as part-of-speech and lemma. The FCS-QL grammar is similar to <%= embedded_link_to 'Poliqarp', 'ql', 'poliqarp-plus' %>, since it
  draws heavily on Poliqarp/CQP.</p>

<p>In FCS-QL, foundries are called qualifiers. A qualifier and a layer are
  joined by a colon; for example, the lemma layer of TreeTagger is written as
  <code>tt:lemma</code>. KorAP supports the following annotation layers for FCS-QL:</p>

<dl>
  <dt>text</dt>
  <dd>surface text</dd>
  <dt>lemma</dt>
  <dd>lemmatisation</dd>
  <dt>pos</dt>
  <dd>part-of-speech</dd>
</dl>

<section id="simple-queries">
  <h3>Simple queries</h3>
  <p>Querying simple terms</p>
  %= doc_query fcsql => '"Semmel"', cutoff => 1

  <p>Querying regular expressions</p>
  %= doc_query fcsql => '"gie(ss|ß)en"', cutoff => 1

  <p>Querying case-insensitive terms</p>
  %= doc_query fcsql => '"essen"/c', cutoff => 1
</section>

<section id="complex-queries">
  <h3>Complex queries</h3>

  <h4>Querying using layers</h4>

  <p>Querying a simple term using the layer for surface text</p>
  %= doc_query fcsql => '[text = "Semmel"]', cutoff => 1
  %= doc_query fcsql => '[text = "essen"/c]', cutoff => 1

  <p>Querying adverbs from the <%= embedded_link_to 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query fcsql => '[pos="ADV"]', cutoff => 1


  <h4>Querying using qualifiers (foundries)</h4>

  <p>Querying adverbs annotated by OpenNLP</p>
  %= doc_query fcsql => '[opennlp:pos="ADV"]', cutoff => 1

  <p>Querying tokens with a lemma from TreeTagger</p>
  %= doc_query fcsql => '[tt:lemma = "leben"]', cutoff => 1


  <h4>Querying using boolean operators</h4>

  <p>All tokens with lemma <code>&quot;leben&quot;</code> that are also finite verbs</p>
  %= doc_query fcsql => '[tt:lemma ="leben" & pos="VVFIN"]', cutoff => 1

  <p>All tokens with lemma <code>&quot;leben&quot;</code> that are also finite verbs or perfect participles</p>
  %= doc_query fcsql => '[tt:lemma ="leben" & (pos="VVFIN" | pos="VVPP")]', cutoff => 1


  <h4>Sequence queries</h4>

  <p>Combining two terms in a sequence query</p>
  %= doc_query fcsql => '[opennlp:pos="ADJA"] "leben"', cutoff => 1


  <h4>Empty token</h4>
  <p>As in <%= embedded_link_to 'Poliqarp', 'ql', 'poliqarp-plus' %>, an empty token is written as <code>[]</code>
    and matches any token. Because it would yield an
    excessive number of results, an empty token cannot be used on its own, only in
    combination with other tokens, for instance in a sequence query.</p>
  %= doc_query fcsql => '[] "Wolke"', cutoff => 1


  <h4>Negation</h4>
  <p>Like the empty token, negation cannot be used on its own because it would yield an
    excessive number of results. It can, however, be used in a sequence query.</p>
  %= doc_query fcsql => '[pos != "ADJA"] "Buch"', cutoff => 1


  <h4>Querying using quantifiers</h4>
  <p>Quantifiers indicate repetition of a term. For instance, they can be used to search for
    exactly two consecutive occurrences of <code>&quot;die&quot;</code>.</p>
  %= doc_query fcsql => '"die" {2}', cutoff => 1

  <p>Quantifiers are also useful for finding arbitrary tokens near
    specific tokens, for instance two to three occurrences of any token between <code>&quot;wir&quot;</code> and
    <code>&quot;leben&quot;</code>.</p>
  %= doc_query fcsql => '"wir" []{2,3} "leben"', cutoff => 1
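
  <p>The FCS-QL specification also defines the shorthand quantifiers <code>+</code> (one or more),
    <code>*</code> (zero or more) and <code>?</code> (optional). As a sketch, assuming KorAP accepts these
    shorthands in the same way as the brace forms above, one or more attributive adjectives before
    <code>&quot;Buch&quot;</code> might be queried as follows.</p>
  %# Assumption: shorthand quantifier support in KorAP is not confirmed on this page
  %= doc_query fcsql => '[pos="ADJA"]+ "Buch"', cutoff => 1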


  <h4>Querying a term within a sentence</h4>
  %= doc_query fcsql => '"Boot" within s', cutoff => 1

</section>