% layout 'main', title => 'KorAP: Poliqarp+';

%= page_title

<p>The following documentation introduces all features provided by our version of the Poliqarp Query Language and some KorAP-specific extensions.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +00006
Nils Diewaldfccfbcb2015-04-29 20:48:19 +00007<section id="segments">
Nils Diewalda31a5152015-04-17 21:05:23 +00008 <h3>Simple Segments</h3>
9
 <p>The atomic elements of Poliqarp queries are segments. Most of the time, segments represent words and can be queried simply by typing them:</p>
 %# footnote: In the Polish national corpus, Poliqarp can join multiple segments when identifying a single word.
12
Akronbee660d2018-02-14 15:57:02 +010013 %= doc_query poliqarp => loc('Q_poliqarp_simple', '** Tree')
Nils Diewalda31a5152015-04-17 21:05:23 +000014
15 <p>Sequences of simple segments are expressed using a space delimiter:</p>
16
Akronbee660d2018-02-14 15:57:02 +010017 %= doc_query poliqarp => loc('Q_poliqarp_simpleseq', '** the Tree')
Nils Diewalda31a5152015-04-17 21:05:23 +000018
19 <p>Simple segments always refer to the surface form of a word. To search for surface forms without case sensitivity, you can use the <code>/i</code> flag.</p>
20
Akronbee660d2018-02-14 15:57:02 +010021 %= doc_query poliqarp => loc('Q_poliqarp_simpleci', '** run/i')
Nils Diewalda31a5152015-04-17 21:05:23 +000022
Akrona7cfd902017-12-21 19:28:36 +010023 <p>The query above will find all occurrences of the term irrespective of the capitalization of letters.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000024
Nils Diewaldfccfbcb2015-04-29 20:48:19 +000025 <h4 id="regexp">Regular Expressions</h4>
Nils Diewalda31a5152015-04-17 21:05:23 +000026
Akron3cfa26d2019-10-24 15:17:34 +020027 <p>Segments can also be queried using <%= embedded_link_to 'doc', 'regular expressions', 'ql', 'regexp' %> - by surrounding the segment with double quotes.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000028
Akronbee660d2018-02-14 15:57:02 +010029 %= doc_query poliqarp => loc('Q_poliqarp_re', '** "r(u|a)n"'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +000030
 <p>Regular expression segments always match the whole segment, meaning the above query will only find words that start with the first letter of the regular expression and end with the last letter. To match parts of a segment instead, you can use the <code>/x</code> flag.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000032
Akronbee660d2018-02-14 15:57:02 +010033 %= doc_query poliqarp => loc('Q_poliqarp_rex', '** "r(u|a)n"/x'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +000034
 <p>The <code>/x</code> flag searches for all segments that contain a sequence of characters matched by the regular expression. That means the above query is equivalent to:</p>
36
Akronbee660d2018-02-14 15:57:02 +010037 %= doc_query poliqarp => loc('Q_poliqarp_recontext', '** ".*?r(u|a)n.*?"'), cutoff => 1
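 <p>The difference between whole-segment matching and the <code>/x</code> flag can be illustrated with a minimal sketch. It uses Python's <code>re</code> module as a stand-in, which is an assumption: KorAP's own regular expression engine may differ in details, and the word list is made up for illustration.</p>

```python
import re

PATTERN = r"r(u|a)n"

def matches_whole_segment(word: str) -> bool:
    # Default behaviour: the expression has to cover the entire segment.
    return re.fullmatch(PATTERN, word) is not None

def matches_with_x_flag(word: str) -> bool:
    # /x behaviour: the expression may match anywhere inside the segment.
    return re.search(PATTERN, word) is not None

# Hypothetical sample words, not taken from any corpus.
words = ["run", "ran", "running", "branch", "rain"]

whole = [w for w in words if matches_whole_segment(w)]
partial = [w for w in words if matches_with_x_flag(w)]

print(whole)    # segments matched without /x
print(partial)  # segments matched with /x
```

 <p>In this sketch, searching inside the segment gives the same result as wrapping the expression in <code>.*?</code> on both sides and matching the whole segment.</p>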
Nils Diewalda31a5152015-04-17 21:05:23 +000038
Akronae24e162018-02-13 18:48:44 +010039 <p>The <code>/x</code> flag can also be used in conjunction with strict expressions to search for substrings:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000040
Akronbee660d2018-02-14 15:57:02 +010041 %= doc_query poliqarp => loc('Q_poliqarp_simplex', '** part/xi'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +000042
Akronae24e162018-02-13 18:48:44 +010043 <p>The above query will find all occurrences of segments including the defined substring regardless of upper and lower case.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000044
45 <blockquote class="warning">
Akronae24e162018-02-13 18:48:44 +010046 <p>Beware: Queries with prepended <code>.*</code> expressions can become extremely slow!</p>
Akron81afd282018-07-24 11:39:55 +020047 <p>In the original Poliqarp specification, regular expressions can be marked both by double quotes and single quotes. In Poliqarp+ only double quotes are used for regular expressions, while single quotes are used to mark verbatim strings.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000048 </blockquote>
49
 <p>You can again apply the <code>/i</code> flag to regular expressions to search case-insensitively.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000051
Akronbee660d2018-02-14 15:57:02 +010052 %= doc_query poliqarp => loc('Q_poliqarp_rexi', '** "r(u|a)n"/xi'), cutoff => 1
Akron852fb8a2023-07-21 07:48:13 +020053
54 <h4 id="reserved">Reserved terms</h4>
55
56 <p>The following terms are <em>reserved words</em> in Poliqarp+ and can therefore not be used in short notation of simple segments.
57 Use the notation for <%= embedded_link_to 'doc', 'complex segments', 'ql', 'poliqarp-plus#complex'%> to query them (e.g. <code>[orth='contains']</code>):</p>
58
59 <ul>
60 <li><code>contains</code></li>
61 <li><code>dependency</code></li>
62 <li><code>dominates</code></li>
 <li><code>endswith</code> and <code>endsWith</code></li>
65 <li><code>focus</code></li>
66 <li><code>i</code> and <code>I</code></li>
67 <li><code>meta</code></li>
68 <li><code>matches</code></li>
69 <li><code>overlaps</code></li>
70 <li><code>relatesTo</code></li>
71 <li><code>split</code></li>
72 <li><code>startswith</code> and <code>startsWith</code></li>
73 <li><code>submatch</code></li>
74 <li><code>within</code></li>
75 <li><code>x</code> and <code>X</code></li>
76 </ul>
77
Nils Diewalda31a5152015-04-17 21:05:23 +000078</section>
79
Nils Diewaldfccfbcb2015-04-29 20:48:19 +000080<section id="complex">
Nils Diewalda31a5152015-04-17 21:05:23 +000081 <h3>Complex Segments</h3>
82
 <p>Complex segments are expressed in square brackets and contain additional information on the resource of the term under scrutiny by providing key/value pairs, separated by an equals sign.</p>
84
 <p>The KorAP implementation of Poliqarp provides three special segment keys: <code>orth</code> for surface forms, <code>base</code> for lemmata, and <code>pos</code> for part of speech. The following complex query finds all occurrences of the given surface form.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +000086 %# There are more special keys in Poliqarp, but KorAP doesn't provide them.
87
Akronbee660d2018-02-14 15:57:02 +010088 %= doc_query poliqarp => loc('Q_poliqarp_complexorth', '** [orth=Tree]')
Nils Diewalda31a5152015-04-17 21:05:23 +000089
90 <p>The query is thus equivalent to:</p>
91
Akronbee660d2018-02-14 15:57:02 +010092 %= doc_query poliqarp => loc('Q_poliqarp_simple', '** Tree')
Nils Diewalda31a5152015-04-17 21:05:23 +000093
94 <p>Complex segments expect simple expressions as values, meaning that the following expression is valid as well:</p>
95
Akronbee660d2018-02-14 15:57:02 +010096 %= doc_query poliqarp => loc('Q_poliqarp_complexre', '** [orth="r(u|a)n"/xi]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +000097
 <p>Another special key is <code>base</code>, referring to the lemma annotation of the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.
 The following query finds all occurrences of segments annotated with a specified lemma by the default foundry.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000100
Akronbee660d2018-02-14 15:57:02 +0100101 %= doc_query poliqarp => loc('Q_poliqarp_complexlemma', '** [base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000102
 <p>The third special key is <code>pos</code>, referring to the part-of-speech annotation of the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.
 The following query finds all attributive adjectives:</p>
105
Akronbee660d2018-02-14 15:57:02 +0100106 %= doc_query poliqarp => loc('Q_poliqarp_complexpos', '** [pos=ADJA]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000107
108 <p>Complex segments requesting further token annotations can have keys following the <code>foundry/layer</code> notation.
 For example, to find all occurrences of plural words in a supporting foundry, you can search using the following query:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000110
Akronbee660d2018-02-14 15:57:02 +0100111 %= doc_query poliqarp => loc('Q_poliqarp_complexplural', '** [mate/m=number:pl]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000112
Akron81afd282018-07-24 11:39:55 +0200113 <p>In case an annotation contains special non-alphabetic and non-numeric characters, the annotation part can be surrounded by single quotes to ensure a verbatim interpretation:</p>
114
115 %= doc_query poliqarp => loc('Q_poliqarp_complexverbatim', '** [mate/o=\'This is an annotation with space characters\']'), cutoff => 1
116
Nils Diewalda31a5152015-04-17 21:05:23 +0000117 <h4>Negation</h4>
 <p>Negation of terms in complex expressions can be expressed by placing an exclamation mark before the equals sign or before the whole expression.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000119
Akronbee660d2018-02-14 15:57:02 +0100120 %= doc_query poliqarp => loc('Q_poliqarp_neg1', '** [pos!=ADJA]'), cutoff => 1
121 %= doc_query poliqarp => loc('Q_poliqarp_neg2', '** [!pos=ADJA]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000122
123 <blockquote class="warning">
Akronae24e162018-02-13 18:48:44 +0100124 <p>Beware: Negated complex segments can't be searched as a single statement.
Akron3cfa26d2019-10-24 15:17:34 +0200125 However, they work in case they are part of a <%= embedded_link_to 'doc', 'sequence', 'ql', 'poliqarp-plus#syntagmatic-operators-sequence' %>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000126 </blockquote>
127
128 <h4 id="empty-segments">Empty Segments</h4>
129
 <p>A special segment is the empty segment, which matches every word in the index.</p>
131
Akronbee660d2018-02-14 15:57:02 +0100132 %= doc_query poliqarp => '[]', cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000133
Akron3cfa26d2019-10-24 15:17:34 +0200134 <p>Empty segments are useful to express distances of words by using <%= embedded_link_to 'doc', 'repetitions', 'ql', 'poliqarp-plus#syntagmatic-operators-repetitions' %>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000135
136 <blockquote class="warning">
Akronae24e162018-02-13 18:48:44 +0100137 <p>Beware: Empty segments can't be searched as a single statement.
Akron3cfa26d2019-10-24 15:17:34 +0200138 However, they work in case they are part of a <%= embedded_link_to 'doc', 'sequence', 'ql', 'poliqarp-plus#syntagmatic-operators-sequence' %>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000139 </blockquote>
140</section>
141
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000142<section id="spans">
Nils Diewalda31a5152015-04-17 21:05:23 +0000143 <h3>Span Segments</h3>
144
145 <p>Not all segments are bound to words - some are bound to concepts spanning multiple words, for example noun phrases, sentences, or paragraphs.
146Span segments can be searched for using angular brackets instead of square brackets.</p>
147
Akronbee660d2018-02-14 15:57:02 +0100148 %= doc_query poliqarp => loc('Q_poliqarp_span', '** <corenlp/c=NP>'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000149
150 <p>Otherwise they can be treated in exactly the same way as simple or complex segments.</p>
151</section>
152
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000153<section id="paradigmatic-operators">
Nils Diewalda31a5152015-04-17 21:05:23 +0000154 <h3>Paradigmatic Operators</h3>
155
 <p>A complex segment can state multiple properties that a token has to fulfil. For example, to search for all words with a certain surface form of a particular lemma (no matter if capitalized or not), you can search for:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000157
Akronbee660d2018-02-14 15:57:02 +0100158 %= doc_query poliqarp => loc('Q_poliqarp_and', '** [orth=run/i & base=Run]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000159
160 <p>The ampersand combines multiple properties with a logical AND.
161Terms of the complex segment can be negated as introduced before.</p>
162
Akronbee660d2018-02-14 15:57:02 +0100163 %= doc_query poliqarp => loc('Q_poliqarp_andneg1', '** [orth=run/i & base!=Run]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000164
165 <p>The following query is therefore equivalent:</p>
166
Akronbee660d2018-02-14 15:57:02 +0100167 %= doc_query poliqarp => loc('Q_poliqarp_andneg2', '** [orth=run/i & !base=Run]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000168
169 <p>Alternatives can be expressed by using the pipe symbol:</p>
170
Akronbee660d2018-02-14 15:57:02 +0100171 %= doc_query poliqarp => loc('Q_poliqarp_or', '** [base=run | base=go]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000172
 <p>All these subexpressions can be grouped using round brackets to form complex boolean expressions:</p>
174
Akronbee660d2018-02-14 15:57:02 +0100175 %= doc_query poliqarp => loc('Q_poliqarp_group', '** [(base=run | base=go) & tt/pos=VVFIN]'), cutoff => 1
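 <p>The boolean logic of a complex segment can be sketched as a predicate over the annotations of a single token. The following minimal sketch models a token as a plain Python dictionary; the token data and the helper name <code>matches_group</code> are hypothetical and do not reflect how KorAP stores or evaluates annotations.</p>

```python
def matches_group(token: dict) -> bool:
    # Mirrors [(base=run | base=go) & tt/pos=VVFIN]:
    # the pipe is a logical OR, the ampersand a logical AND.
    return (token["base"] == "run" or token["base"] == "go") and token["tt/pos"] == "VVFIN"

# Hypothetical annotated tokens, made up for illustration.
tokens = [
    {"orth": "runs", "base": "run", "tt/pos": "VVFIN"},
    {"orth": "going", "base": "go", "tt/pos": "VVPP"},
    {"orth": "went", "base": "go", "tt/pos": "VVFIN"},
    {"orth": "tree", "base": "tree", "tt/pos": "NN"},
]

hits = [t["orth"] for t in tokens if matches_group(t)]
print(hits)
```

 <p>Every condition is evaluated against the same token, which is what distinguishes paradigmatic operators from the syntagmatic operators that combine several tokens.</p>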
Nils Diewalda31a5152015-04-17 21:05:23 +0000176</section>
177
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000178<section id="syntagmatic-operators">
Nils Diewalda31a5152015-04-17 21:05:23 +0000179 <h3>Syntagmatic Operators</h3>
180
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000181 <h4 id="syntagmatic-operators-sequence">Sequences</h4>
Nils Diewalda31a5152015-04-17 21:05:23 +0000182
 <p>Sequences can be used to search for segments in order. For this, simple expressions are separated by whitespace.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000184
Akronbee660d2018-02-14 15:57:02 +0100185 %= doc_query poliqarp => loc('Q_poliqarp_seq', '** the old man'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000186
187 <p>However, you can obviously search using complex segments as well:</p>
188
Akronbee660d2018-02-14 15:57:02 +0100189 %= doc_query poliqarp => loc('Q_poliqarp_seqcomplex', '** [orth=the][orth=old][orth=man]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000190
191 <p>Now you may see the benefit of the empty segment to search for words you don't know:</p>
192
Akronbee660d2018-02-14 15:57:02 +0100193 %= doc_query poliqarp => loc('Q_poliqarp_seqcomplexempty', '** [orth=the][][orth=man]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000194
Akronae24e162018-02-13 18:48:44 +0100195 <p>You are also able to mix segments and spans in sequences, for example to search for a word at the beginning of a sentence (which can be interpreted as the first word after the end of a sentence).</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000196
Akronbee660d2018-02-14 15:57:02 +0100197 %= doc_query poliqarp => loc('Q_poliqarp_seqspan', '** <base/s=s>[orth=The]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000198
199 <h4>Groups</h4>
200
201 ...
202
203 <h4>Alternation</h4>
204
 <p>Alternations allow for searching alternative segments or sequences of segments, similar to the paradigmatic operator. You have already seen that you can search for a sequence with an alternative adjective in between by typing in:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000206
Akronbee660d2018-02-14 15:57:02 +0100207 %= doc_query poliqarp => loc('Q_poliqarp_seqor', '** the [orth=old | orth=young] man'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000208
Akronae24e162018-02-13 18:48:44 +0100209 <p>However, this formulation has problems in case you want to search for alternations of sequences rather than terms. In this case you can use syntagmatic alternations and groups:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000210
Akronbee660d2018-02-14 15:57:02 +0100211 %= doc_query poliqarp => loc('Q_poliqarp_seqorgroup1', '** the (young man | old woman)'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000212
Akronae24e162018-02-13 18:48:44 +0100213 <p>The pipe symbol works the same way as with the paradigmatic alternation, but supports sequences of different length as operands. The above query with an alternative adjective in a sequence can therefore be reformulated as:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000214
Akronbee660d2018-02-14 15:57:02 +0100215 %= doc_query poliqarp => loc('Q_poliqarp_seqorgroup2', '** the (old | young) man'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000216
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000217 <h4 id="syntagmatic-operators-repetitions">Repetition</h4>
Nils Diewalda31a5152015-04-17 21:05:23 +0000218
 <p>Repetitions in Poliqarp are realized as in <%= embedded_link_to 'doc', 'regular expressions', 'ql', 'regexp' %>, by giving quantifiers in curly brackets.</p>
Akronae24e162018-02-13 18:48:44 +0100220 <p>To search for a sequence of three occurrences of a defined string, you can formulate your query in any of the following ways - they will have the same results:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000221
Akronbee660d2018-02-14 15:57:02 +0100222 %= doc_query poliqarp => loc('Q_poliqarp_repmanual', '** the the the'), cutoff => 1
223 %= doc_query poliqarp => loc('Q_poliqarp_repsimple', '** the{3}'), cutoff => 1
224 %= doc_query poliqarp => loc('Q_poliqarp_repcomplex', '** [orth=the]{3}'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000225
 <p>In contrast to regular expressions, the repetition operation does not refer to the match but to the given pattern. So the following query will give you a sequence of three words with a defined substring, but the words don't have to be identical.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000227
Akronbee660d2018-02-14 15:57:02 +0100228 %= doc_query poliqarp => loc('Q_poliqarp_repre', '** "ru.*?"/i{3}'), cutoff => 1
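 <p>The distinction between repeating a pattern and repeating a match can be made concrete with a small sketch. It assumes a tokenized text represented as a Python list and uses Python's <code>re</code> module in place of KorAP's engine; the function name <code>find_runs</code> and the sample tokens are hypothetical.</p>

```python
import re

# Counterpart of the segment pattern "ru.*?"/i, applied to whole tokens.
TOKEN = re.compile(r"ru.*?", re.IGNORECASE)

def find_runs(tokens, n=3):
    """Return every window of n consecutive tokens that each match the pattern."""
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(TOKEN.fullmatch(t) for t in window):
            hits.append(window)
    return hits

# Hypothetical token sequence, made up for illustration.
tokens = ["Rudolf", "runs", "rules", "the", "rum", "rum", "rum"]

# Pattern repetition: the three words may differ.
pattern_hits = find_runs(tokens)

# Match repetition via a backreference: the three words must be identical.
backref = re.search(r"\b(ru\w*) \1 \1\b", " ".join(tokens), re.IGNORECASE)

print(pattern_hits)
print(backref.group(0))
```

 <p>The window of three different words is only found by the pattern repetition, which corresponds to the behaviour described above.</p>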
Nils Diewalda31a5152015-04-17 21:05:23 +0000229
Akronae24e162018-02-13 18:48:44 +0100230 <p>The same is true for annotations. The following query will find a sequence of 3 to 4 adjectives in a defined context. The adjectives do not have to be identical though.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000231
Akronbee660d2018-02-14 15:57:02 +0100232 %= doc_query poliqarp => loc('Q_poliqarp_repanno', '** [base=the][tt/p=ADJA]{3,4}[tt/p=NOUN]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000233
Akronae24e162018-02-13 18:48:44 +0100234 <p>In addition to numbered quantities, it is also possible to pass repetition information as Kleene operators <code>?</code>, <code>*</code>, and <code>+</code>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000235
Akronae24e162018-02-13 18:48:44 +0100236 <p>To search for a sequence with an optional segment, you can search for:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000237
Akronbee660d2018-02-14 15:57:02 +0100238 %= doc_query poliqarp => loc('Q_poliqarp_seqopt1', '** [base=the][tt/pos=ADJA]?[base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000239
240 <p>This query is identical to the numbered quantification of:</p>
241
Akronbee660d2018-02-14 15:57:02 +0100242 %= doc_query poliqarp => loc('Q_poliqarp_seqopt2', '** [base=the][tt/pos=ADJA]{,1}[base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000243
Akronae24e162018-02-13 18:48:44 +0100244 <p>To search for the same sequences but with unlimited adjectives in between, you can use the Kleene Star:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000245
Akronbee660d2018-02-14 15:57:02 +0100246 %= doc_query poliqarp => loc('Q_poliqarp_seqstar', '** [base=the][tt/pos=ADJA]*[base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000247
248 <p>And to search for this sequence but with at least one adjective in between, you can use the Kleene Plus (all queries are identical):</p>
249
Akronbee660d2018-02-14 15:57:02 +0100250 %= doc_query poliqarp => loc('Q_poliqarp_seqplus1', '** [base=the][tt/pos=ADJA]+[base=Tree]'), cutoff => 1
251 %= doc_query poliqarp => loc('Q_poliqarp_seqplus2', '** [base=the][tt/pos=ADJA]{1,}[base=Tree]'), cutoff => 1
252 %= doc_query poliqarp => loc('Q_poliqarp_seqplus3', '** [base=the][tt/pos=ADJA][tt/pos=ADJA]*[base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000253
254 <blockquote class="warning">
 <p>Repetition operators like <code>{,n}</code>, <code>?</code>, and <code>*</code> make segments or groups of segments optional. If such a query is used on its own and not as part of a sequence (so that the query contains no mandatory segments), the system will warn you that your query won't be treated as optional.</p>
 <p>Keep in mind that optionality may be <i>inherited</i>: for example, an entire query becomes optional as soon as one segment of an alternation is optional.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000257 </blockquote>
258
Akron3cfa26d2019-10-24 15:17:34 +0200259 <p>Repetition can also be used to express distances between segments by using <%= embedded_link_to 'doc', 'empty segments', 'ql', 'poliqarp-plus#empty-segments' %>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000260
Akronbee660d2018-02-14 15:57:02 +0100261 %= doc_query poliqarp => loc('Q_poliqarp_seqdistance1', '** [base=the][][base=Tree]'), cutoff => 1
262 %= doc_query poliqarp => loc('Q_poliqarp_seqdistance2', '** [base=the][]{2}[base=Tree]'), cutoff => 1
263 %= doc_query poliqarp => loc('Q_poliqarp_seqdistance3', '** [base=the][]{2,}[base=Tree]'), cutoff => 1
264 %= doc_query poliqarp => loc('Q_poliqarp_seqdistance4', '** [base=the][]{,3}[base=Tree]'), cutoff => 1
265
Nils Diewalda31a5152015-04-17 21:05:23 +0000266 <p>Of course, Kleene operators can be used with empty segments as well.</p>
267
Akronbee660d2018-02-14 15:57:02 +0100268 %= doc_query poliqarp => loc('Q_poliqarp_seqdistanceopt', '** [base=the][]?[base=Tree]'), cutoff => 1
269 %= doc_query poliqarp => loc('Q_poliqarp_seqdistancestar', '** [base=the][]*[base=Tree]'), cutoff => 1
270 %= doc_query poliqarp => loc('Q_poliqarp_seqdistanceplus', '** [base=the][]+[base=Tree]'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000271
272 <h4>Position</h4>
273
Akron3cfa26d2019-10-24 15:17:34 +0200274 <p>Sequences as shown above can all be nested in further complex queries and treated as subqueries (see <%= embedded_link_to 'doc', 'class operators', 'ql', 'poliqarp-plus#class-operators' %> on how to later access these subqueries directly).</p>
 <p>Positional operators compare two matches of subqueries and match if a certain condition regarding the position of both is true.</p>
 <p>The <code>contains()</code> operation matches when a second subquery matches inside the span of a first subquery.</p>
277
Akronbee660d2018-02-14 15:57:02 +0100278 %= doc_query poliqarp => loc('Q_poliqarp_poscontains', '** contains(<base/s=s>, [tt/p=KOUS])'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000279
 <p>The <code>startsWith()</code> operation matches when a second subquery matches at the beginning of the span of a first subquery.</p>
281
Akronbee660d2018-02-14 15:57:02 +0100282 %= doc_query poliqarp => loc('Q_poliqarp_posstartswith', '** startsWith(<base/s=s>, [tt/p=KOUS])'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000283
 <p>The <code>endsWith()</code> operation matches when a second subquery matches at the end of the span of a first subquery.</p>
285
Akronbee660d2018-02-14 15:57:02 +0100286 %= doc_query poliqarp => loc('Q_poliqarp_posendswith', '** endsWith(<base/s=s>, [opennlp/p=NN])'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000287
 <p>The <code>matches()</code> operation matches when a second subquery has the exact same span as a first subquery.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000289
Akronbee660d2018-02-14 15:57:02 +0100290 %= doc_query poliqarp => loc('Q_poliqarp_posmatches', '** matches(<base/s=s>,[tt/p=CARD][tt/p="N.*"])'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000291
 <p>The <code>overlaps()</code> operation matches when a second subquery has an overlapping span with the first subquery.</p>
293
Akronbee660d2018-02-14 15:57:02 +0100294 %= doc_query poliqarp => loc('Q_poliqarp_posoverlaps', '** overlaps([][tt/p=ADJA],{1:[tt/p=ADJA]}[])'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000295
296 <blockquote class="warning">
297 <p>Positional operators are still experimental and may change in certain aspects in the future (although the behaviour defined is intended to be stable). There is also known incorrect behaviour which will be corrected in future versions.</p>
 <p>Optional operands in position operators are currently treated as mandatory and will be reformulated to occur at least once.</p>
 <p>This behaviour may change in the future.</p>
300 </blockquote>
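 <p>The positional conditions described above can be sketched as comparisons of token offsets. The following is a minimal sketch, assuming that each match is a span with a start offset and an exclusive end offset; the <code>Span</code> type and the function names are hypothetical. In particular, the reading of <code>overlaps</code> as "shares at least one token" is an assumption, and KorAP's actual definition may be narrower.</p>

```python
from typing import NamedTuple

class Span(NamedTuple):
    start: int  # offset of the first token
    end: int    # offset after the last token (exclusive)

def contains(a: Span, b: Span) -> bool:
    return a.start <= b.start and b.end <= a.end

def starts_with(a: Span, b: Span) -> bool:
    return a.start == b.start and b.end <= a.end

def ends_with(a: Span, b: Span) -> bool:
    return a.end == b.end and a.start <= b.start

def matches(a: Span, b: Span) -> bool:
    return a == b

def overlaps(a: Span, b: Span) -> bool:
    # Assumption: two spans overlap if they share at least one token.
    return a.start < b.end and b.start < a.end

sentence = Span(0, 10)     # a hypothetical sentence of ten tokens
first_word = Span(0, 1)    # its first token
straddling = Span(8, 12)   # a span crossing the sentence boundary

print(contains(sentence, first_word), starts_with(sentence, first_word))
print(ends_with(sentence, first_word), matches(sentence, first_word))
print(overlaps(sentence, straddling), contains(sentence, straddling))
```

 <p>In all functions the first argument plays the role of the first subquery (the surrounding span) and the second argument that of the second subquery.</p>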
301
302 <!--
303 <blockquote>
304 <p>The KorAP implementation of Poliqarp also supports the postfix <code>within</code> operator, that works similar to the <code>contains()</code> operator, but is not nestable.</p>
305 </blockquote>
306 -->
307
308</section>
309
Nils Diewaldfccfbcb2015-04-29 20:48:19 +0000310<section id="class-operators">
Nils Diewalda31a5152015-04-17 21:05:23 +0000311 <h3>Class Operators</h3>
312
Akronae24e162018-02-13 18:48:44 +0100313 <p>Classes are used to group submatches by surrounding curly brackets and a class number <code>{1:...}</code>. Classes can be used to refer to submatches in a query, similar to captures in regular expressions. In Poliqarp+ classes have multiple purposes, with highlighting being the most intuitive one:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000314
Akronbee660d2018-02-14 15:57:02 +0100315 %= doc_query poliqarp => loc('Q_poliqarp_classes', '** the {1:{2:[]} man}'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000316
Akronae24e162018-02-13 18:48:44 +0100317 <p>In KorAP classes can be defined from 1 to 128. In case a class number is missing, the class defaults to the class number 1: <code>{...}</code> is equal to <code>{1:...}</code>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000318
319 <h4>Match Modification</h4>
320
321 <p>Based on classes, matches may be modified. The <code>focus()</code> operator restricts the span of a match to the boundary of a certain class.</p>
322
Akronbee660d2018-02-14 15:57:02 +0100323 %= doc_query poliqarp => loc('Q_poliqarp_focus', '** focus(the {Tree})'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000324
Akronae24e162018-02-13 18:48:44 +0100325 <p>The query above will search for a sequence but the match will be limited to the second segment. You can think of the first segment in this query as a <i>positive look-behind zero-length assertion</i> in regular expressions.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000326
 <p>But focus is even more useful if you are searching for matches without knowing the surface form. For example, to find all terms between defined words you can search:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000328
Akronbee660d2018-02-14 15:57:02 +0100329 %= doc_query poliqarp => loc('Q_poliqarp_focusempty', '** focus(the {[]} Tree)'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000330
Akronae24e162018-02-13 18:48:44 +0100331 <p>Or you may want to search for all words following a known sequence immediately:</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000332
Akronbee660d2018-02-14 15:57:02 +0100333 %= doc_query poliqarp => loc('Q_poliqarp_focusextension', '** focus(the old and {[]})'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000334
Akronbee660d2018-02-14 15:57:02 +0100335 <p><code>focus()</code> is especially useful if you are searching for matches in certain areas,
336 for example in quotes using positional operators.
337 While not being interested in the whole quote as a match, you can focus on what's really relevant to you.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000338
Akronbee660d2018-02-14 15:57:02 +0100339 %= doc_query poliqarp => loc('Q_poliqarp_focusrelevance', '** focus(contains(she []{,10} said, {Tree}))'), cutoff => 1
Nils Diewalda31a5152015-04-17 21:05:23 +0000340
Akronae24e162018-02-13 18:48:44 +0100341 <p>In case a class number is missing, the focus operator defaults to the class number 1: <code>focus(...)</code> is equal to <code>focus(1: ...)</code>.</p>
Nils Diewalda31a5152015-04-17 21:05:23 +0000342
343 <blockquote class="warning">
344 <p>As numbers in curly brackets can be ambiguous in certain circumstances, for example <code>[]{3}</code> can be read as either &quot;any word repeated three times&quot; or &quot;any word followed by the number 3 highlighted as class number 1&quot;, numbers should always be expressed as <code>[orth=3]</code> for the latter case.</p>
345 </blockquote>
346</section>