% layout 'main', title => 'KorAP: FCSQL';

<h2 id="tutorial-top">FCSQL</h2>

<p>FCS-QL is a query language developed to support advanced search in
<%= doc_ext_link_to 'CLARIN Federated Content Search (FCS)', "https://www.clarin.eu/content/federated-content-search-clarin-fcs" %>,
which allows searching through annotated data.
Accordingly, FCS-QL is primarily intended to express queries involving annotation layers
such as part-of-speech and lemma. The FCS-QL grammar is fairly similar to <%= doc_link_to 'Poliqarp', 'ql', 'poliqarp-plus' %> because it is
largely based on Poliqarp/CQP.</p>

<p>In FCS-QL, foundries are called qualifiers. A foundry and a layer are combined with a
colon as separator; for example, the lemma layer of TreeTagger is written as
<code>tt:lemma</code>. KorAP supports the following annotation layers for FCS-QL:</p>

<dl>
<dt>text</dt>
<dd>surface text</dd>
<dt>lemma</dt>
<dd>lemmatisation</dd>
<dt>pos</dt>
<dd>part-of-speech</dd>
</dl>

<section id="simple-queries">
<h3>Simple queries</h3>
<p>Querying simple terms</p>
%= doc_query fcsql => '"Semmel"', cutoff => 1

<p>Querying regular expressions</p>
%= doc_query fcsql => '"gie(ss|ß)en"', cutoff => 1

<p>Querying case-insensitive terms</p>
%= doc_query fcsql => '"essen"/c', cutoff => 1
</section>

<section id="complex-queries">
<h3>Complex queries</h3>

<h4>Querying using layers</h4>

<p>Querying a simple term using the layer for surface text</p>
%= doc_query fcsql => '[text = "Semmel"]', cutoff => 1
%= doc_query fcsql => '[text = "essen"/c]', cutoff => 1

<p>Querying adverbs from the <%= doc_link_to 'default foundry', 'data', 'annotation' %>.</p>
%= doc_query fcsql => '[pos="ADV"]', cutoff => 1


<h4>Querying using qualifiers (foundries)</h4>

<p>Querying adverbs annotated by OpenNLP</p>
%= doc_query fcsql => '[opennlp:pos="ADV"]', cutoff => 1

<p>Querying tokens with a lemma from TreeTagger</p>
%= doc_query fcsql => '[tt:lemma = "leben"]', cutoff => 1


<h4>Querying using boolean operators</h4>

<p>All tokens with lemma <code>"leben"</code> that are also finite verbs</p>
%= doc_query fcsql => '[tt:lemma ="leben" & pos="VVFIN"]', cutoff => 1

<p>All tokens with lemma <code>"leben"</code> that are also finite verbs or perfect participles</p>
%= doc_query fcsql => '[tt:lemma ="leben" & (pos="VVFIN" | pos="VVPP")]', cutoff => 1


<h4>Sequence queries</h4>

<p>Combining two terms in a sequence query</p>
%= doc_query fcsql => '[opennlp:pos="ADJA"] "leben"', cutoff => 1


<h4>Empty token</h4>
<p>As in <%= doc_link_to 'Poliqarp', 'ql', 'poliqarp-plus' %>, an empty token is written as <code>[]</code>
and matches any token. Because it would produce an
excessive number of results, an empty token cannot be used on its own, only in
combination with other tokens, for instance in a sequence query.</p>
%= doc_query fcsql => '[] "Wolke"', cutoff => 1


<h4>Negation</h4>
<p>Like the empty token, negation cannot be used on its own because of the
excessive number of results. However, it can be used in a sequence query.</p>
%= doc_query fcsql => '[pos != "ADJA"] "Buch"', cutoff => 1


<h4>Querying using quantifiers</h4>
<p>Quantifiers indicate repetition of a term; for instance, they can be used to search for
exactly two consecutive occurrences of <code>"die"</code>.</p>
%= doc_query fcsql => '"die" {2}', cutoff => 1

<p>Quantifiers are also useful for searching for any tokens near other
specific tokens, for instance two to three occurrences of any token between <code>"wir"</code> and
<code>"leben"</code>.</p>
%= doc_query fcsql => '"wir" []{2,3} "leben"', cutoff => 1


<h4>Querying a term within a sentence</h4>
%= doc_query fcsql => '"Boot" within s', cutoff => 1

</section>