% layout 'main', title => 'KorAP: COSMAS II';

%= page_title

<p>The following documentation introduces some features provided by our version of the COSMAS II Query Language. For more information, please visit the <%= ext_link_to 'online help of COSMAS II', "http://www.ids-mannheim.de/cosmas2/web-app/hilfe/suchanfrage/eingabe-zeile/syntax/allgemein.html" %>.</p>

<section id="queryterms">
  <h3>Query Terms</h3>

  <p>A query term in COSMAS II can be a word, a punctuation symbol, or a number.</p>

  %= doc_query cosmas2 => 'Baum'
  %= doc_query cosmas2 => '4000'

  <blockquote class="missing">
    <p>Currently, punctuation symbols are not supported by KorAP.</p>
  </blockquote>

  <h4>Placeholder Operators</h4>

  <p>In addition, query terms can contain one or more placeholders: <code>?</code> (for exactly one arbitrary symbol), <code>+</code> (for one or no symbol), or <code>*</code> (for any sequence of symbols, including none).</p>
  <%= doc_query cosmas2 => 'Bau?m' %>
  <%= doc_query cosmas2 => 'Bau+m' %>
  <%= doc_query cosmas2 => 'Bau*m' %>
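
  <p>As an illustration only, the placeholder semantics described above can be approximated by translating a term into a regular expression. The following Python sketch is not how KorAP processes queries: the names <code>PLACEHOLDERS</code>, <code>placeholder_to_regex</code>, and <code>matches</code> are hypothetical, and escaping of placeholder symbols is not handled.</p>

```python
import re

# Assumed mapping, derived from the description of the placeholders:
#   ?  exactly one arbitrary symbol
#   +  one or no symbol
#   *  any sequence of symbols, including the empty one
PLACEHOLDERS = {"?": ".", "+": ".?", "*": ".*"}


def placeholder_to_regex(term: str) -> re.Pattern:
    """Translate a term with placeholders into a compiled regular expression."""
    # Every non-placeholder symbol is escaped so it is matched literally.
    parts = [PLACEHOLDERS.get(symbol, re.escape(symbol)) for symbol in term]
    return re.compile("".join(parts))


def matches(term: str, word: str) -> bool:
    """Return True if the whole word is covered by the placeholder term."""
    return placeholder_to_regex(term).fullmatch(word) is not None
```

  <p>Under these assumptions, <code>Bau+m</code> covers both <code>Baum</code> and <code>Bauam</code>, while <code>Bau?m</code> requires exactly one symbol between <code>Bau</code> and <code>m</code>.</p>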

%# TODO:
%# <p>To escape placeholder symbols (i.e. to prevent these symbols from being interpreted as placeholders), they need to be prepended by a <code>\</code> symbol.</p>
%# <%= doc_query cosmas2 => 'Student\*in' %>
%# <p>To escape the backslash symbol, another backslash is required (<code>\\</code>).</p>

  <h4>Lemma Operator</h4>

  <p>Instead of searching for the surface form of a word, a lemma (as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>) can be requested by prepending the term with the <code>&amp;</code> operator. The form of the lemma depends on the annotation.</p>
  <%= doc_query cosmas2 => '&laufen' %>

  <h4>Case Insensitivity Operator</h4>

  <p>Prepending a term with the <code>$</code> symbol makes the search case insensitive.</p>
  <%= doc_query cosmas2 => '$Lauf' %>

  <h4>Regular Expression Operator</h4>

  <p>By using the <code>#REG(...)</code> operator, query terms can be formulated using <%= embedded_link_to 'doc', 'regular expressions', 'ql', 'regexp' %>.</p>

  <blockquote class="bug">
    <p>Regular expressions in COSMAS II are not yet properly implemented in KorAP. If you want to use regular expressions, please refer to <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus#regexp' %>.</p>
  </blockquote>

</section>

<section id="logical-operators">
  <h3>Logical Operators</h3>

  <p>Query terms can be combined in logical operations, using the operators <code>and</code>, <code>or</code>, and <code>not</code>. The German forms are supported as well: <code>und</code>, <code>oder</code>, and <code>nicht</code>.</p>
  <p>These operators work on the text level, so the following query returns matches for all occurrences where both terms occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'anscheinend und scheinbar' %>

  <p>The following query returns matches for all occurrences where at least one of the terms occurs anywhere in the text.</p>
  <%= doc_query cosmas2 => 'anscheinend oder scheinbar' %>

  <p>The following query returns matches for all occurrences of the first term, where the term following the <code>nicht</code> operator does not occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'Kegel nicht Kind' %>
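
  <p>The text-level behaviour of the three operators can be pictured with a small Python sketch. It is a simplified model rather than KorAP's implementation: a text is reduced to a set of word forms, and the function name <code>text_matches</code> is hypothetical.</p>

```python
def text_matches(tokens: set, first: str, operator: str, second: str) -> bool:
    """Decide whether a text (given as a set of word forms) satisfies
    a binary logical query such as 'Kegel nicht Kind'."""
    has_first = first in tokens
    has_second = second in tokens
    if operator in ("and", "und"):
        # Both terms must occur anywhere in the same text.
        return has_first and has_second
    if operator in ("or", "oder"):
        # At least one of the terms must occur in the text.
        return has_first or has_second
    if operator in ("not", "nicht"):
        # The first term must occur, the second must not occur at all.
        return has_first and not has_second
    raise ValueError(f"unknown operator: {operator!r}")
```

  <p>In this model, a text containing both <code>Kegel</code> and <code>Kind</code> satisfies <code>Kegel und Kind</code> but not <code>Kegel nicht Kind</code>.</p>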

  <p>To escape terms for logical operators (i.e. to prevent these terms from being interpreted as logical operators), they need to be surrounded by quotation marks.</p>
  <%= doc_query cosmas2 => 'Mann "und" Maus' %>

</section>

<section id="distance-operators">
  <h3>Distance Operators</h3>

  <p>Distance operators allow you to search for two operands (search terms or complex search operations) that occur or don't occur at a certain distance from each other in a text. When the two operands should occur together (the operator is prepended by a <code>/</code> symbol), both operands are in the result set. When they shouldn't occur together (the operator is prepended by a <code>%</code> symbol), only the first operand is in the result set.</p>

  <p>Distance operators accept a prefixing direction parameter.
  By prepending the operator with a <code>+</code> symbol (e.g. in <code>/+s0</code>), the second operand is required to occur or not occur after the first operand.
  By prepending the operator with a <code>-</code> symbol (e.g. in <code>/-s0</code>), the second operand is required to occur or not occur before the first operand.
  If the direction parameter is omitted, the order of the two operands is arbitrary.</p>

  <p>Distance operators accept the definition of a distance interval by appending numerical values. If only a single numerical value is given (e.g. in <code>/+s4</code>), it is treated as the maximum distance, so the operands must (or must not) occur at a distance equal to or lower than the given value. If two numerical values are given, separated by the <code>:</code> symbol (e.g. in <code>/+s4:2</code>), they define the interval in which the distance is valid.</p>
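
  <p>To make the parts of a distance operator explicit, the following Python sketch splits an operator such as <code>/+s4:2</code> into its prefix, direction, unit, and interval. It is an illustration under stated assumptions, not KorAP's parser: the names <code>Distance</code> and <code>parse_distance</code> are hypothetical, a single value is assumed to imply a lower bound of 0, and for two values the smaller one is taken as the minimum (as in the <code>/+w4:3</code> example).</p>

```python
import re
from typing import NamedTuple


class Distance(NamedTuple):
    exclude: bool    # True for the '%' prefix, False for '/'
    direction: str   # '+', '-', or '' if the order is arbitrary
    unit: str        # 'w' (word), 's' (sentence), or 'p' (paragraph)
    minimum: int
    maximum: int


# prefix, optional direction, unit, first value, optional second value
_DISTANCE = re.compile(r"([/%])([+-]?)([wsp])(\d+)(?::(\d+))?")


def parse_distance(operator: str) -> Distance:
    """Split a single distance operator into its components."""
    found = _DISTANCE.fullmatch(operator)
    if found is None:
        raise ValueError(f"not a distance operator: {operator!r}")
    prefix, direction, unit, first, second = found.groups()
    if second is None:
        # Assumption: a single value is a maximum with a lower bound of 0.
        low, high = 0, int(first)
    else:
        # Assumption: the smaller value is the minimum, whatever the order.
        low, high = sorted((int(first), int(second)))
    return Distance(prefix == "%", direction, unit, low, high)
```

  <p>With these assumptions, <code>parse_distance("/+s4:2")</code> describes a forward sentence distance between 2 and 4, and <code>parse_distance("%w1")</code> describes an excluding word distance of at most 1 in either direction.</p>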

%# <blockquote class="warning">
%# <p>Currently, intervals are interpreted as MIN:MAX only, while COSMAS 2 defines intervals as being MAX:MIN, while taking the smaller number as being the minimum value of the interval and the greater number as being the maximum value of the interval. <%= ext_link_to 'KorAP will adopt the behaviour of COSMAS II in the near future', "https://github.com/KorAP/Koral/issues/67" %>.</p>
%# </blockquote>

  <p>Distance operators rely on the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>If a query contains several distance operators, they need to be grouped with parentheses:</p>
  %= doc_query cosmas2 => '(Tag /+w2 offenen) /+w1 Tür'

  <h4>Word Distance Operator</h4>

  <p>The word distance operator <code>w</code> defines how many words two search operands may (or may not) be apart. A distance of 1 means that the operands are adjacent.</p>

  <p>Search for two operands at most 4 words apart, in arbitrary order:</p>
  %= doc_query cosmas2 => 'Gegenwart /w4 Zukunft'

  <p>Search for two operands 3 to 4 words apart, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w4:3 Zukunft'

  <p>Search for two consecutive operands in the given order:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w1:1 Zukunft'

  <p>Search for a first operand that is neither directly preceded nor directly followed by a second operand:</p>
  %= doc_query cosmas2 => 'Gegenwart %w1 die'
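
  <p>The word distance check can be sketched in Python on a tokenized text. This is a simplified model, not KorAP's matching algorithm: the function name <code>within_word_distance</code> is hypothetical, the distance is assumed to be the difference of token positions (so 1 means adjacent, as in the <code>/+w1:1</code> example), and operands are single tokens.</p>

```python
def within_word_distance(tokens, first, second, minimum, maximum, direction=""):
    """Return True if `first` and `second` occur within the given word
    distance of each other. `direction` is '+', '-', or '' (arbitrary)."""
    first_positions = [i for i, token in enumerate(tokens) if token == first]
    second_positions = [i for i, token in enumerate(tokens) if token == second]
    for a in first_positions:
        for b in second_positions:
            delta = b - a
            if direction == "+" and delta <= 0:
                continue  # second operand has to follow the first one
            if direction == "-" and delta >= 0:
                continue  # second operand has to precede the first one
            if minimum <= abs(delta) <= maximum:
                return True
    return False
```

  <p>For the excluding form (<code>%w</code>), the same check would be negated: the first operand is kept only if no occurrence of the second operand lies within the distance.</p>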

  <h4>Sentence Distance Operator</h4>

  <p>The sentence distance operator <code>s</code> defines how many sentences two search operands may (or may not) be apart. A distance of 0 means that both operands occur in the same sentence.</p>
  <p>The sentence distance relies on the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or an adjacent sentence, in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /s1 Geschäft'

  <p>Search for two operands occurring in the same sentence, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+s0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same sentence:</p>
  %= doc_query cosmas2 => 'Gegenwart %s0 Zukunft'

  <h4>Paragraph Distance Operator</h4>

  <p>The paragraph distance operator <code>p</code> defines how many paragraphs two search operands may (or may not) be apart. A distance of 0 means that both operands occur in the same paragraph.</p>
  <p>The paragraph distance relies on the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or an adjacent paragraph, in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /p1 Geschäft'

  <p>Search for two operands occurring in the same paragraph, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+p0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same paragraph:</p>
  %= doc_query cosmas2 => 'Gegenwart %p0 Zukunft'

  <blockquote class="warning">
    <p>The KWIC result of queries that include paragraph distance operators will likely exceed the maximum match length supported by KorAP and will therefore be cut.</p>
  </blockquote>

  <h4>Multi-Distance Operators</h4>

  <p>Distance operators can be combined to further limit the result set. The distance conditions are separated by commas (without spaces).</p>
  <p>Search for &quot;ein Fest&quot; within a single sentence:</p>
  %= doc_query cosmas2 => 'ein /+w1,s0 Fest'

  <h4>Omitted Distance Operator</h4>
  <p>If the distance operator between two operands is omitted, KorAP searches for a <code>/+w1</code> distance:</p>
  %= doc_query cosmas2 => 'runder Tisch'

</section>

<section id="annotation-operators">
  <h3>Annotation Operators</h3>
  %= under_construction
  %# MORPH and ELEM
</section>

<section id="combination-operators">
  <h3>Combination Operators</h3>
  %= under_construction
  %# IN and OV
</section>

<section id="area-operators">
  <h3>Area Operators</h3>
  %= under_construction
  %# LINKS, RECHTS, INKLUSIVE, EXKLUSIVE, BED
</section>