% layout 'main', title => 'KorAP: Annis QL';

%= page_title

<p><%= ext_link_to 'ANNIS Query Language (Annis QL or AQL)', "https://corpus-tools.org/annis/aql.html" %>
  is the query language of the <%= ext_link_to 'ANNIS corpus search system', "https://corpus-tools.org/annis/" %>,
  designed particularly to deal with complex linguistic corpora with multiple
  annotation layers (e.g. morphology) and various annotation types (e.g. attribute-value
  pairs, relations). Conceptually, an AQL query searches for node elements and the edges
  between them, where a node element can be a token or an attribute-value pair.</p>

<p>KorAP supports the following keywords, which use the <%= embedded_link_to 'default foundries', 'data', 'annotation' %>:</p>
<dl>
  <dt><code>node</code></dt>
  <dd>a node element</dd>
  <dt><code>tok</code></dt>
  <dd>a token</dd>
  <dt><code>cat</code> or <code>c</code></dt>
  <dd>a constituent</dd>
  <dt><code>lemma</code> or <code>l</code></dt>
  <dd>a lemma-annotated node</dd>
  <dt><code>pos</code> or <code>p</code></dt>
  <dd>a part-of-speech-annotated node</dd>
  <dt><code>m</code></dt>
  <dd>a morphologically annotated node</dd>
</dl>

<blockquote class="warning">
  <p>KorAP does not yet support in-query metadata constraints in AQL, i.e. the <code>meta::</code> prefix.
    In KorAP, metadata constraints are kept separate from search queries and are given as corpus
    queries that define virtual corpora.</p>
</blockquote>
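<p>The separation described above can be sketched as follows. This is a minimal illustration and not
  KorAP's documented client API: the parameter names <code>q</code>, <code>ql</code> and <code>cq</code>,
  the metadata field <code>textClass</code> and the helper <code>build_search_params</code> are all
  assumptions made for the example.</p>

```python
from urllib.parse import urlencode


def build_search_params(aql_query, corpus_query):
    """Keep the AQL search query and the metadata constraint apart.

    Hypothetical helper: all parameter names are assumptions for illustration.
    """
    if "meta::" in aql_query:
        raise ValueError("meta:: constraints are not supported inside the AQL query")
    return {
        "q": aql_query,      # the search query itself
        "ql": "annis",       # query language identifier (assumed)
        "cq": corpus_query,  # corpus query defining the virtual corpus (assumed)
    }


# The metadata constraint travels next to the query, not inside it.
params = build_search_params('tok="liebe"', 'textClass = "politik"')
query_string = urlencode(params)
```

<p>The point of the sketch is only that the search query and the metadata constraint are passed as
  two separate values.</p>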

<section id="examples">
  <h3>Node elements</h3>

  <p>Simple tokens</p>
  %= doc_query annis => '"liebe"', cutoff => 1

  <p>Attribute-value pairs</p>
  %= doc_query annis => 'tok="liebe"', cutoff => 1

  <p>Namespaces in AQL are realized as foundry and layer combinations in KorAP. They can be used
    to query tokens having a specific layer annotated by a specific parser (foundry), for
    example coordinating conjunctions (part-of-speech layer) from the TreeTagger foundry.</p>
  %= doc_query annis => 'tt/p="KON"', cutoff => 1

  <h3>Regular expressions</h3>
  %= doc_query annis => 'tok =/m.*keit/', cutoff => 1
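  <p>As a rough illustration of what the query above selects, the following sketch applies the same
    expression to a few hand-picked tokens. It assumes that the expression has to match the entire
    token value and that matching is case-sensitive; Python's <code>re</code> module stands in for
    the actual regular expression engine, and the token list is invented for the example.</p>

```python
import re

# Same expression as in the query tok =/m.*keit/
pattern = re.compile(r"m.*keit")

# Hypothetical token values, not taken from any corpus
tokens = ["möglichkeit", "machbarkeit", "Möglichkeit", "möglichkeiten"]

# fullmatch models the assumption that the whole token must match
matches = [token for token in tokens if pattern.fullmatch(token)]
```

  <p>Under these assumptions only the two lowercase tokens ending in <code>keit</code> are kept; the
    capitalized form and the form with a trailing suffix are not.</p>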

  <h3>Sequence queries</h3>
  <p>Two consecutive tokens</p>
  %= doc_query annis => '"der"."Bär"', cutoff => 1

  <p>Finite verbs indirectly followed by an adverb, where any number of tokens may occur in
    between.</p>
  %= doc_query annis => 'pos="VVFIN" .* pos="ADV"', cutoff => 1
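  <p>The indirect precedence operator <code>.*</code> can be pictured with a small sketch over a
    hand-tagged token list. The function <code>indirect_precedence</code> and the example sentence
    are hypothetical and only model the relation itself, not how KorAP evaluates the query.</p>

```python
def indirect_precedence(tokens, first_pos, second_pos):
    """Return (i, j) index pairs where a token tagged first_pos occurs
    anywhere before a token tagged second_pos (zero or more tokens between)."""
    return [
        (i, j)
        for i, (_, pos_i) in enumerate(tokens)
        if pos_i == first_pos
        for j in range(i + 1, len(tokens))
        if tokens[j][1] == second_pos
    ]


# Hypothetical, hand-tagged sentence as (token, part-of-speech) pairs
sentence = [("Sie", "PPER"), ("kommt", "VVFIN"), ("morgen", "ADV"), ("wieder", "ADV")]
pairs = indirect_precedence(sentence, "VVFIN", "ADV")
```

  <p>Here the finite verb at index 1 pairs with both adverbs, the adjacent one at index 2 and the
    more distant one at index 3.</p>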

  <h3>Negation</h3>
  <p>KorAP supports negation, such as negated tokens, only within a sequence query.</p>
  %= doc_query annis => '"Katze" . pos != "VVFIN"', cutoff => 1

  <h3>Pointing relations</h3>
  <p>Pointing relations describe direct relationships between two node elements, for instance
    dependency relations.</p>

  <p>Querying all <code>"SUBJ"</code> dependency relations</p>
  %= doc_query annis => 'node ->malt/d[func="SUBJ"] node', cutoff => 1

  <p>Querying <code>"SUBJ"</code> dependency relations where the source node is token <code>"ich"</code></p>
  %= doc_query annis => '"ich" ->malt/d[func="SUBJ"] node', cutoff => 1

  <p>Querying <code>"SUBJ"</code> dependency relations where the source node is token
    <code>"ich"</code> and the target node is a perfect participle</p>
  %= doc_query annis => '"ich" ->malt/d[func="SUBJ"] pos="VVPP"', cutoff => 1

  <h3>Using references</h3>
  <p>Node elements can be referred to using <code>#</code> followed by the position number of the
    element in the query, for instance:</p>
  %= doc_query annis => '"ich" & pos="VVPP" & #1 ->malt/d[func="SUBJ"] #2', cutoff => 1
  %= doc_query annis => '"ich" & pos="VVPP" & #1 . #2', cutoff => 1
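  <p>How the position numbers line up with the node elements can be sketched as below. The helper
    <code>number_operands</code> is hypothetical and deliberately naive: it splits the query on the
    conjunction sign and would not handle that sign inside a quoted value. It is not the parser
    KorAP uses.</p>

```python
def number_operands(query):
    """Assign #1, #2, ... to node elements in order of appearance and
    collect the remaining operands, which refer back to them."""
    nodes = {}
    constraints = []
    for part in (p.strip() for p in query.split("&")):
        if part.startswith("#"):
            # Operand that relates previously declared nodes
            constraints.append(part)
        else:
            # Node element: gets the next position number
            nodes[f"#{len(nodes) + 1}"] = part
    return nodes, constraints


nodes, constraints = number_operands(
    '"ich" & pos="VVPP" & #1 ->malt/d[func="SUBJ"] #2'
)
```

  <p>In this sketch <code>#1</code> stands for the token <code>"ich"</code> and <code>#2</code> for
    the node <code>pos="VVPP"</code>, so the last operand states the dependency relation between
    them.</p>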

  %# Bug in Krill
  %# <p>"ich" & pos="VVFIN" & #1 ->malt/d[func="SUBJ"] #2 & #1 . #2</p>

  <blockquote class="warning">
    <p>Unary operators like <code>arity</code> or <code>tokenarity</code> are not yet implemented in KorAP.</p>
  </blockquote>


  <!-- Not implemented in Krill yet

  <h3>Unary operators</h3>
  <dl>
    <dt>Arity</dt>
    <dd>the number of children directly dominated by a node</dd>
  </dl>
  <p>Querying adverbial phrases having exactly 2 direct children</p>
  <p>cat="AVP" & #1:arity=2</p>

  <dl>
    <dt>Tokenarity</dt>
    <dd>the number of tokens within a node</dd>
  </dl>
  <p>Querying adverbial phrases consisting of exactly 2 tokens</p>
  <p>cat="AVP" & #1:tokenarity=2</p>

  <h3>Searching within a tree</h3>
  <h4>Dominance</h4>
  <p>AQL describes hierarchical relations between nodes in a tree as a concept of dominance.
    Node A dominates node B when A is located in a higher position than node B in a tree.
    Moreover, A <strong>directly dominates</strong> B when A is located exactly above B
    without any other nodes in between.</p>

  <p>Direct dominance</p>
  <p></p>

  <p>Indirect dominance</p>
  <p></p>

  -->
</section>