% content main => begin

<h2>KorAP-Tutorial: Poliqarp+</h2>

<p><%= korap_tut_link_to 'Back to Index', '/tutorial' %></p>

<p>The following tutorial introduces all features provided by our version of the Poliqarp Query Language, as well as some KorAP-specific extensions.</p>

<section id="tut-segments">
<h3>Simple Segments</h3>

<p>The atomic elements of Poliqarp queries are segments. Most of the time, segments represent words and can be queried directly:</p>
%# footnote: In the Polish National Corpus, Poliqarp can join multiple segments when identifying a single word.

%= korap_tut_query poliqarp => 'Baum'

<p>Sequences of simple segments are separated by spaces:</p>

%= korap_tut_query poliqarp => 'der Baum'

<p>Simple segments always refer to the surface form of a word. To search for surface forms case-insensitively, append the <code>/i</code> flag:</p>

%= korap_tut_query poliqarp => 'laufen/i'

<p>The query above finds all occurrences of <code>laufen</code> irrespective of capitalization, so it matches <code>wir laufen</code> as well as <code>das Laufen</code> and even <code>&quot;GEH LAUFEN!&quot;</code>.</p>
</section>

<section id="tut-regexp">
<h3>Regular Expressions</h3>

<p>Segments can also be queried using <%= korap_tut_link_to 'regular expressions', '/tutorial/regular-expressions' %> by surrounding the segment with double quotes:</p>

%= korap_tut_query poliqarp => '"l(au|ie)fen"'

<p>Regular expression segments always have to match the whole segment, so the above query finds exactly the words <code>laufen</code> and <code>liefen</code>, but not, for example, <code>weglaufen</code>. To match a regular expression against parts of a segment instead, use the <code>/x</code> flag:</p>

%= korap_tut_query poliqarp => '"l(au|ie)fen"/x', cutoff => 1

<p>With the <code>/x</code> flag, the query finds all segments that contain a sequence of characters matched by the regular expression. The above query is therefore equivalent to:</p>

%= korap_tut_query poliqarp => '".*?l(au|ie)fen.*?"', cutoff => 1

<blockquote class="exception">
  <p>There is currently a minor serialization bug: non-greedy quantifiers are not accepted, so this query may fail.</p>
</blockquote>
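<p>The difference between whole-segment matching and the <code>/x</code> flag can be illustrated with Python's <code>re</code> module. This is only a sketch: the helper <code>segment_matches</code> is hypothetical, and KorAP's own regular expression dialect may differ from Python's in details.</p>

```python
import re


def segment_matches(regex: str, segment: str, x: bool = False, i: bool = False) -> bool:
    """Hypothetical model of a quoted regex segment matched against one token.

    Without /x the expression has to match the whole segment (re.fullmatch).
    With /x it is wrapped in '.*?' on both sides, mirroring the equivalent
    query '".*?l(au|ie)fen.*?"' from the tutorial.
    """
    if x:
        # Group the expression so that a top-level alternation stays intact.
        regex = ".*?(?:" + regex + ").*?"
    flags = re.IGNORECASE if i else 0  # models the /i flag
    return re.fullmatch(regex, segment, flags) is not None


words = ["laufen", "liefen", "Laufen", "weglaufen", "laufend"]
print([w for w in words if segment_matches("l(au|ie)fen", w)])
print([w for w in words if segment_matches("l(au|ie)fen", w, x=True)])
print([w for w in words if segment_matches("l(au|ie)fen", w, x=True, i=True)])
```

<p>For the example words, the plain query accepts only <code>laufen</code> and <code>liefen</code>; with <code>/x</code>, <code>weglaufen</code> and <code>laufend</code> match as well, and <code>/xi</code> additionally accepts <code>Laufen</code>.</p>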

<p>The <code>/x</code> flag can also be used in conjunction with strict expressions (unquoted, non-regex segments) to search for substrings:</p>

%= korap_tut_query poliqarp => 'trenn/xi', cutoff => 1

<p>The above query finds all segments containing the string <code>trenn</code>, regardless of case, such as &quot;Trennung&quot;, &quot;unzertrennlich&quot;, or &quot;Wettrennen&quot;.</p>

<blockquote class="warning">
  <p>Beware: queries of this kind (with a leading <code>.*</code> expression) are extremely slow!</p>
</blockquote>
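<p>For strict expressions, both flags can be modelled without regular expressions at all. In the following Python sketch, the helper <code>strict_matches</code> is hypothetical and only illustrates the matching semantics, not how KorAP evaluates queries: <code>i=True</code> alone corresponds to a query like <code>laufen/i</code>, and <code>x=True, i=True</code> to <code>trenn/xi</code>.</p>

```python
def strict_matches(literal: str, segment: str, x: bool = False, i: bool = False) -> bool:
    """Hypothetical model of a strict (non-regex) segment such as 'trenn/xi'.

    /i compares case-insensitively; /x accepts any segment that contains
    the literal instead of requiring the whole segment to equal it.
    """
    if i:
        literal, segment = literal.casefold(), segment.casefold()
    return literal in segment if x else literal == segment


words = ["Trennung", "unzertrennlich", "Wettrennen", "trennen", "Trainer"]
print([w for w in words if strict_matches("trenn", w, x=True, i=True)])
print([w for w in words if strict_matches("trenn", w, x=True)])
print(strict_matches("laufen", "LAUFEN", i=True))
```

<p>With both flags, every example word except <code>Trainer</code> matches; without <code>/i</code>, <code>Trennung</code> is no longer found because its capital <code>T</code> differs from the query string.</p>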

<p>You can again add the <code>/i</code> flag to search case-insensitively:</p>

%= korap_tut_query poliqarp => '"l(au|ie)fen"/xi', cutoff => 1

</section>

<section id="tut-complex">
<h3>Complex Segments</h3>

<p>Complex segments are written in square brackets and specify additional information about the term under scrutiny as key/value pairs, separated by a <code>=</code> symbol.</p>

<p>The KorAP implementation of Poliqarp provides three special segment keys: <code>orth</code> for surface forms, <code>base</code> for lemmata, and <code>pos</code> for part-of-speech. The following complex query finds all occurrences of the surface form <code>Baum</code>:</p>

%# There are more special keys in Poliqarp, but KorAP doesn't provide them.

%= korap_tut_query poliqarp => '[orth=Baum]'

<p>The query is thus equivalent to:</p>

%= korap_tut_query poliqarp => 'Baum'

<p>The values of complex segments can be any expression that is valid as a simple segment, including regular expressions and flags, so the following query is valid as well:</p>

%= korap_tut_query poliqarp => '[orth="l(au|ie)fen"/xi]', cutoff => 1

<p>The second special key is <code>base</code>, referring to the lemma annotation of the <%= korap_tut_link_to 'default foundry', '/tutorial/foundries' %>. The following query finds all segments annotated with the lemma <code>Baum</code> by the default foundry:</p>

%= korap_tut_query poliqarp => '[base=baum]'

<p>The third special key is <code>pos</code>, referring to the part-of-speech annotation of the <%= korap_tut_link_to 'default foundry', '/tutorial/foundries' %>. The following query finds all attributive adjectives:</p>

%= korap_tut_query poliqarp => '[pos=ADJA]'

<p>To query further token annotations, complex segments can use keys in <code>foundry/layer</code> notation. For example, to find all words annotated as plural in the morphology layer (<code>m</code>) of the mate foundry, use the following query:</p>

%= korap_tut_query poliqarp => '[mate/m=number:pl]'

<blockquote class="warning">
  <p><strong>The following queries in the tutorial are not yet tested and may not work.</strong></p>
</blockquote>

</section>

<section id="tut-spans">
<h3>Span Segments</h3>

<p>Span segments are written in angle brackets and refer to annotated spans rather than single tokens. The following query finds all sentences:</p>

%= korap_tut_query poliqarp => '<s>'

</section>

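<section id="tut-paradigmatic-operators">
<h3>Paradigmatic Operators</h3>

<p>Paradigmatic operators combine several constraints on a single segment. Constraints joined by <code>&amp;</code> must all hold, so the following query finds segments with the surface form <code>laufe</code> (in any capitalization) that are annotated with the lemma <code>lauf</code>:</p>

%= korap_tut_query poliqarp => '[orth=laufe/i & base=lauf]'

<p>A constraint can be negated by using <code>!=</code> instead of <code>=</code>. The following query finds the same surface form, but only where it is not annotated with the lemma <code>lauf</code>:</p>

%= korap_tut_query poliqarp => '[orth=laufe/i & base!=lauf]'
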
<blockquote class="warning">
  <p>There is currently a bug in the Lucene backend regarding the negation of matches.</p>
</blockquote>

<p>Negation can also be expressed by prefixing a key/value pair with <code>!</code>; for example, <code>!base=bäumen</code> is equivalent to <code>base!=bäumen</code>:</p>

%= korap_tut_query poliqarp => '[orth=bäume & !base=bäumen]'

<p>Constraints joined by <code>|</code> are alternatives, and parentheses group constraints:</p>

%= korap_tut_query poliqarp => '[base=laufen | base=gehen]'

%= korap_tut_query poliqarp => '[(base=laufen | base=gehen) & tt/pos=VVFIN]'

<p>An empty complex segment matches any single token:</p>

%= korap_tut_query poliqarp => '[]'
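
<p>Paradigmatic operators behave like boolean operators over the annotations of a single token. The Python sketch below models a token as a dictionary of annotations; the helpers (<code>has</code>, <code>and_</code>, <code>or_</code>, <code>not_</code>, <code>any_token</code>) and the annotation values are hypothetical and only illustrate the semantics, not KorAP's query engine. The negated query uses <code>base!=laufen</code> rather than the tutorial's <code>base!=lauf</code> so that it fits the invented annotations.</p>

```python
from typing import Callable, Dict

# Hypothetical token model: one value per annotation key.
Token = Dict[str, str]
Pred = Callable[[Token], bool]


def has(key: str, value: str, i: bool = False) -> Pred:
    """[key=value], optionally case-insensitive like the /i flag."""
    def pred(tok: Token) -> bool:
        actual = tok.get(key)
        if actual is None:  # annotation missing on this token
            return False
        if i:
            return actual.casefold() == value.casefold()
        return actual == value
    return pred


def and_(*preds: Pred) -> Pred:  # &
    return lambda tok: all(p(tok) for p in preds)


def or_(*preds: Pred) -> Pred:  # |
    return lambda tok: any(p(tok) for p in preds)


def not_(pred: Pred) -> Pred:  # prefix ! or infix !=
    return lambda tok: not pred(tok)


def any_token(tok: Token) -> bool:  # the empty segment []
    return True


# Hypothetical annotations for four tokens.
tokens = [
    {"orth": "Laufe", "base": "Lauf", "tt/pos": "NN"},
    {"orth": "laufe", "base": "laufen", "tt/pos": "VVFIN"},
    {"orth": "ging", "base": "gehen", "tt/pos": "VVFIN"},
    {"orth": "gehen", "base": "gehen", "tt/pos": "VVINF"},
]

# [base=laufen | base=gehen]
q_alt = or_(has("base", "laufen"), has("base", "gehen"))
# [(base=laufen | base=gehen) & tt/pos=VVFIN]
q_fin = and_(q_alt, has("tt/pos", "VVFIN"))
# [orth=laufe/i & base!=laufen]
q_neg = and_(has("orth", "laufe", i=True), not_(has("base", "laufen")))

for name, query in [("alt", q_alt), ("fin", q_fin), ("neg", q_neg), ("any", any_token)]:
    print(name, [t["orth"] for t in tokens if query(t)])
```

<p>Here <code>[(base=laufen | base=gehen) &amp; tt/pos=VVFIN]</code> keeps <code>laufe</code> and <code>ging</code> but drops the infinitive <code>gehen</code>, and the negated query keeps only the noun <code>Laufe</code>.</p>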

</section>

<section id="tut-syntagmatic-operators">
<h3>Syntagmatic Operators</h3>
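
<h4>Sequences</h4>

<p>Segments written one after another match adjacent tokens. Because the empty segment <code>[]</code> matches any token, the following query finds <code>der</code> and <code>Baum</code> with exactly one arbitrary token in between:</p>

%= korap_tut_query poliqarp => '[base=der][][base=Baum]'

<h4>Repetition</h4>

<p>A segment can be repeated using quantifiers in curly braces: <code>{2}</code> for exactly two, <code>{2,3}</code> for two to three, <code>{2,}</code> for at least two, and <code>{,3}</code> for at most three repetitions:</p>

%= korap_tut_query poliqarp => '[base=der][]{2}[base=Baum]'
%= korap_tut_query poliqarp => '[base=der][]{2,3}[base=Baum]'
%= korap_tut_query poliqarp => '[base=der][]{2,}[base=Baum]'
%= korap_tut_query poliqarp => '[base=der][]{,3}[base=Baum]'

<p>As in regular expressions, <code>?</code> makes a segment optional, <code>*</code> allows zero or more, and <code>+</code> one or more repetitions:</p>

%= korap_tut_query poliqarp => '[base=der][tt/pos=ADJA]?[base=Baum]'
%= korap_tut_query poliqarp => '[base=der][tt/pos=ADJA]*[base=Baum]'
%= korap_tut_query poliqarp => '[base=der][tt/pos=ADJA]+[base=Baum]'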
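
<p>The quantifiers apply to whole segments in the same way regular expression quantifiers apply to characters. The Python sketch below is a naive backtracking matcher over a hypothetical annotated sentence; the helpers <code>eq</code>, <code>any_token</code>, <code>ends</code>, and <code>find</code> and the annotations are assumptions made for illustration. It reports only the first match found at each start position, which is not necessarily how KorAP enumerates matches.</p>

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical token model: one value per annotation key.
Token = Dict[str, str]
Pred = Callable[[Token], bool]
# One query element: (predicate, minimum, maximum or None for "unbounded").
Elem = Tuple[Pred, int, Optional[int]]


def eq(key: str, value: str) -> Pred:
    """[key=value]"""
    return lambda tok: tok.get(key) == value


def any_token(tok: Token) -> bool:
    """The empty segment []."""
    return True


def ends(pattern: List[Elem], tokens: List[Token], i: int) -> Iterator[int]:
    """Yield end positions of matches of `pattern` starting at i, in greedy order."""
    if not pattern:
        yield i
        return
    pred, lo, hi = pattern[0]
    stop = len(tokens) if hi is None else min(len(tokens), i + hi)
    n = 0  # consecutive tokens from i that satisfy pred, capped at hi
    for tok in tokens[i:stop]:
        if not pred(tok):
            break
        n += 1
    # Greedy backtracking: try the most repetitions first, down to the minimum.
    for k in range(n, lo - 1, -1):
        yield from ends(pattern[1:], tokens, i + k)


def find(pattern: List[Elem], tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) of the first greedy match at each start position."""
    spans = []
    for start in range(len(tokens)):
        end = next(ends(pattern, tokens, start), None)
        if end is not None:
            spans.append((start, end))
    return spans


# Hypothetical annotations for "der alte hohe Baum".
sentence = [
    {"orth": "der", "base": "der", "tt/pos": "ART"},
    {"orth": "alte", "base": "alt", "tt/pos": "ADJA"},
    {"orth": "hohe", "base": "hoch", "tt/pos": "ADJA"},
    {"orth": "Baum", "base": "Baum", "tt/pos": "NN"},
]

DER, BAUM, ADJA = eq("base", "der"), eq("base", "Baum"), eq("tt/pos", "ADJA")
queries = {
    "[base=der][][base=Baum]": [(DER, 1, 1), (any_token, 1, 1), (BAUM, 1, 1)],
    "[base=der][]{2}[base=Baum]": [(DER, 1, 1), (any_token, 2, 2), (BAUM, 1, 1)],
    "[base=der][]{2,3}[base=Baum]": [(DER, 1, 1), (any_token, 2, 3), (BAUM, 1, 1)],
    "[base=der][]{2,}[base=Baum]": [(DER, 1, 1), (any_token, 2, None), (BAUM, 1, 1)],
    "[base=der][]{,3}[base=Baum]": [(DER, 1, 1), (any_token, 0, 3), (BAUM, 1, 1)],
    "[base=der][tt/pos=ADJA]?[base=Baum]": [(DER, 1, 1), (ADJA, 0, 1), (BAUM, 1, 1)],
    "[base=der][tt/pos=ADJA]*[base=Baum]": [(DER, 1, 1), (ADJA, 0, None), (BAUM, 1, 1)],
    "[base=der][tt/pos=ADJA]+[base=Baum]": [(DER, 1, 1), (ADJA, 1, None), (BAUM, 1, 1)],
}

for query, pattern in queries.items():
    print(query, find(pattern, sentence))
```

<p>For <code>der alte hohe Baum</code>, every query spans all four tokens except <code>[base=der][][base=Baum]</code> and <code>[base=der][tt/pos=ADJA]?[base=Baum]</code>, which allow at most one token between the article and the noun and therefore find nothing.</p>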

<h4>Alternation</h4>

<h4>Position Operators</h4>

%#= korap_tut_query poliqarp => 'matches(<s>,[])'
%# matches(<s>,[cnx/p=INTERJ]{2})
<p>contains()</p>
<p>startsWith()</p>
<p>endsWith()</p>
<p>overlaps()</p>

<blockquote>
  <p>The KorAP implementation of Poliqarp also supports the postfix <code>within</code> operator, which works similarly to <code>contains()</code> but is not nestable.</p>
</blockquote>

<h4>Class Operators</h4>

<p>{}</p>
<p>focus()</p>
<p>...</p>

</section>

% end