% layout 'main', title => 'KorAP: COSMAS II';

<h2 id="tutorial-top">COSMAS II</h2>

<p>The following documentation introduces some features provided by our version of the COSMAS II Query Language. For more information, please visit the <%= doc_ext_link_to 'online help of COSMAS II', "http://www.ids-mannheim.de/cosmas2/web-app/hilfe/suchanfrage/eingabe-zeile/syntax/allgemein.html" %>.</p>

<section id="queryterms">
  <h3>Query Terms</h3>

  <p>A query term in COSMAS II can be a word, a punctuation symbol, or a number.</p>

  %= doc_query cosmas2 => 'Baum'
  %= doc_query cosmas2 => '4000'

  <blockquote class="missing">
    <p>Currently, punctuation symbols are not supported by KorAP.</p>
  </blockquote>

  <h4>Placeholder Operators</h4>

  <p>In addition, query terms can contain one or more placeholders such as <code>?</code> (for any single symbol), <code>+</code> (for one or no symbol), or <code>*</code> (for any sequence of symbols, including the empty sequence).</p>
  <%= doc_query cosmas2 => 'Bau?m' %>
  <%= doc_query cosmas2 => 'Bau+m' %>
  <%= doc_query cosmas2 => 'Bau*m' %>

%# TODO:
%#  <p>To escape placeholder symbols (i.e. to prevent these symbols from being interpreted as placeholders), they need to be preceded by a <code>\</code> symbol.</p>
%#  <%= doc_query cosmas2 => 'Student\*in' %>
%#  <p>To escape the backslash symbol, another backslash is required (<code>\\</code>).</p>

  <h4>Lemma Operator</h4>

  <p>Instead of searching for the surface form of a word, a lemma (as annotated by the <%= doc_link_to 'default foundry', 'data', 'annotation' %>) can be requested by prepending the <code>&amp;</code> operator to the term. The form of the lemma depends on the annotation.</p>
  <%= doc_query cosmas2 => '&laufen' %>

  <h4>Case Insensitivity Operator</h4>

  <p>Prepending a <code>$</code> symbol to the term makes the search case insensitive.</p>
  <%= doc_query cosmas2 => '$Lauf' %>

  <h4>Regular Expression Operator</h4>

  <p>By using the <code>#REG(...)</code> operator, query terms can be formulated using <%= doc_link_to 'regular expressions', 'ql', 'regexp' %>.</p>

  <blockquote class="bug">
    <p>Regular expressions in COSMAS II are not yet properly implemented in KorAP. If you want to use regular expressions, please refer to <%= doc_link_to 'Poliqarp', 'ql', 'poliqarp-plus#regexp' %>.</p>
  </blockquote>

</section>

<section id="logical-operators">
  <h3>Logical Operators</h3>

  <p>Query terms can be combined in logical operations, using the operators <code>and</code>, <code>or</code>, and <code>not</code>. The German forms are supported as well: <code>und</code>, <code>oder</code>, and <code>nicht</code>.</p>
  <p>These operators work on the text level, so the following query returns matches for all occurrences where both terms occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'anscheinend und scheinbar' %>

  <p>The following query returns matches for all occurrences where at least one of the terms occurs anywhere in the text.</p>
  <%= doc_query cosmas2 => 'anscheinend oder scheinbar' %>

  <p>The following query returns matches for all occurrences of the first term, where the term following the <code>nicht</code> operator does not occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'Kegel nicht Kind' %>

  <p>To escape terms that would otherwise be read as logical operators (i.e. to prevent these terms from being interpreted as logical operators), they need to be enclosed in quotation marks.</p>
  <%= doc_query cosmas2 => 'Mann "und" Maus' %>

</section>


<section id="distance-operators">
  <h3>Distance Operators</h3>

  <p>Distance operators allow you to search for two operands (search terms or complex search operations) that occur or do not occur at a certain distance from each other in a text. When the two operands should occur together (the operator is prefixed with a <code>/</code> symbol), both operands are part of the result set. When they should not occur together (the operator is prefixed with a <code>%</code> symbol), only the first operand is part of the result set.</p>

  <p>Distance operators accept a direction parameter as a prefix.
    By prepending a <code>+</code> symbol to the operator (e.g. in <code>/+s0</code>), the second operand is required to occur or not occur after the first operand.
    By prepending a <code>-</code> symbol to the operator (e.g. in <code>/-s0</code>), the second operand is required to occur or not occur in front of the first operand.
    If the direction parameter is omitted, the operands may occur in arbitrary order.</p>

  <p>Distance operators accept the definition of a distance interval by appending numerical values. If only a single numerical value is given (e.g. in <code>/+s4</code>), the value is treated as a maximum distance, so both operands must occur (or must not occur) at a distance less than or equal to the given value. If two numerical values are given, separated by the <code>:</code> symbol (e.g. in <code>/+s4:2</code>), they define the interval within which the distance must lie.</p>

%# <blockquote class="warning">
%#   <p>Currently, intervals are interpreted as MIN:MAX only, while COSMAS 2 defines intervals as being MAX:MIN, while taking the smaller number as being the minimum value of the interval and the greater number as being the maximum value of the interval. <%= doc_ext_link_to 'KorAP will adopt the behaviour of COSMAS II in the near future', "https://github.com/KorAP/Koral/issues/67" %>.</p>
%# </blockquote>

  <p>Distance operators rely on the <%= doc_link_to 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <h4>Word Distance Operator</h4>

  <p>The word distance operator <code>w</code> defines how many words are allowed or are not allowed in-between two search operands.</p>

  <p>Search for two operands with up to 4 words in-between in arbitrary order:</p>
  %= doc_query cosmas2 => 'Gegenwart /w4 Zukunft'

  <p>Search for two operands with 3 to 4 words in-between with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w4:3 Zukunft'
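
  <p>Assuming the <code>-</code> direction parameter combines with the word distance operator in the same way as <code>+</code>, a search for two operands with up to 4 words in-between, with the second operand preceding the first one, could be sketched as follows (the query is an illustration only):</p>
  %= doc_query cosmas2 => 'Gegenwart /-w4 Zukunft'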

  <p>Search for two consecutive operands in the given order:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w1:1 Zukunft'

  <p>Search for a first operand that is neither preceded nor succeeded by a second operand:</p>
  %= doc_query cosmas2 => 'Gegenwart %w1 die'
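
  <p>Assuming the direction parameter can also be applied to the excluding form of the operator, as described above, a search for a first operand that is not directly succeeded by a second operand could be sketched as follows (the query is an illustration only):</p>
  %= doc_query cosmas2 => 'Gegenwart %+w1 die'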

  <h4>Sentence Distance Operator</h4>

  <p>The sentence distance operator <code>s</code> defines how many sentences are allowed or are not allowed in-between two search operands.</p>
  <p>The sentence distance relies on the <%= doc_link_to 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or a following sentence in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /s1 Geschäft'

  <p>Search for two operands occurring in the same sentence with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+s0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same sentence:</p>
  %= doc_query cosmas2 => 'Gegenwart %s0 Zukunft'

  <h4>Paragraph Distance Operator</h4>

  <p>The paragraph distance operator <code>p</code> defines how many paragraphs are allowed or are not allowed in-between two search operands.</p>
  <p>The paragraph distance relies on the <%= doc_link_to 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or a following paragraph in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /p1 Geschäft'

  <p>Search for two operands occurring in the same paragraph with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+p0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same paragraph:</p>
  %= doc_query cosmas2 => 'Gegenwart %p0 Zukunft'

  <blockquote class="warning">
    <p>The KWIC result of queries that include paragraph distances will likely exceed the supported maximum length of matches in KorAP and will therefore be truncated.</p>
  </blockquote>

  <h4>Multiple Distance Operators</h4>

  %= doc_uc

  <h4>Nesting of Multiple Distance Operations</h4>

  <p>In case a query contains multiple distance operators, they need to be nested in parentheses.</p>
  <%= doc_query cosmas2 => '(Tag /+w2 offenen) /+w1 Tür' %>
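
  <p>Since operands can be complex search operations, a nested word distance operation may presumably also serve as an operand of a different distance operator. As a sketch under that assumption (the query terms are chosen for illustration only), the following query searches for the nested operation and a further term in the same sentence:</p>
  <%= doc_query cosmas2 => '(offen /+w2 Geschäft) /s0 Sonntag' %>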

</section>

<section id="annotation-operators">
  <h3>Annotation Operators</h3>
  %= doc_uc
  %# MORPH and ELEM
</section>

<section id="combination-operators">
  <h3>Combination Operators</h3>
  %= doc_uc
  %# IN and OV
</section>

<section id="area-operators">
  <h3>Area Operators</h3>
  %= doc_uc
  %# LINKS, RECHTS, INKLUSIVE, EXKLUSIVE, BED
</section>