% layout 'main', title => 'KorAP: COSMAS II';

%= page_title

<p>The following documentation introduces some features provided by our version of the COSMAS II Query Language. For more information, please visit the <%= ext_link_to 'online help of COSMAS II', "http://www.ids-mannheim.de/cosmas2/web-app/hilfe/suchanfrage/eingabe-zeile/syntax/allgemein.html" %>.</p>

<section id="queryterms">
  <h3>Query Terms</h3>

  <p>A query term in COSMAS II can be a word, a punctuation symbol, or a number.</p>

  %= doc_query cosmas2 => 'Baum'
  %= doc_query cosmas2 => '4000'

  <blockquote class="missing">
    <p>Currently, punctuation symbols are not supported by KorAP.</p>
  </blockquote>

  <h4>Placeholder Operators</h4>

  <p>In addition, query terms can contain one or more placeholders: <code>?</code> (for any single symbol), <code>+</code> (for any single symbol or none), or <code>*</code> (for any sequence of symbols, including the empty one).</p>
  <%= doc_query cosmas2 => 'Bau?m' %>
  <%= doc_query cosmas2 => 'Bau+m' %>
  <%= doc_query cosmas2 => 'Bau*m' %>
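
  <p>To make the placeholder semantics concrete, the following Python sketch translates a term with placeholders into an ordinary regular expression. It only illustrates the behaviour described above and is not how KorAP processes queries; the names <code>PLACEHOLDERS</code>, <code>placeholder_to_regex</code> and <code>matches</code> are hypothetical.</p>

```python
import re

# Mapping taken from the description above; an illustration only,
# not KorAP's actual implementation.
PLACEHOLDERS = {"?": ".", "+": ".?", "*": ".*"}


def placeholder_to_regex(term: str) -> str:
    """Translate a term with COSMAS II placeholders into a Python regex."""
    return "".join(PLACEHOLDERS.get(ch) or re.escape(ch) for ch in term)


def matches(term: str, word: str) -> bool:
    """Check whether the whole word is covered by the placeholder term."""
    return re.fullmatch(placeholder_to_regex(term), word) is not None
```

  <p>Under these assumptions, <code>matches('Bau+m', 'Baum')</code> holds because <code>+</code> may stand for no symbol, whereas <code>matches('Bau?m', 'Baum')</code> does not, since <code>?</code> requires a symbol in that position.</p>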

%# TODO:
%# <p>To escape placeholder symbols (i.e. to prevent these symbols from being interpreted as placeholders), they need to be prepended by a <code>\</code> symbol.</p>
%# <%= doc_query cosmas2 => 'Student\*in' %>
%# <p>To escape the backslash symbol, another backslash is required (<code>\\</code>).</p>

  <h4>Lemma Operator</h4>

  <p>Instead of searching for the surface form of a word, a lemma (as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>) can be requested by prefixing the term with the <code>&amp;</code> operator. The form of the lemma depends on the annotation.</p>
  <%= doc_query cosmas2 => '&laufen' %>

  <h4>Case Insensitivity Operator</h4>

  <p>Prefixing the term with a <code>$</code> symbol makes the search case insensitive.</p>
  <%= doc_query cosmas2 => '$Lauf' %>

  <h4>Regular Expression Operator</h4>

  <p>By using the <code>#REG(...)</code> operator, query terms can be formulated using <%= embedded_link_to 'doc', 'regular expressions', 'ql', 'regexp' %>.</p>

  <%= doc_query cosmas2 => '#REG(Archi.*ung)' %>

  <blockquote class="missing">
    <p>Regular expressions in COSMAS II are not yet properly implemented in KorAP. If you want to use regular expressions, please refer to <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus#regexp' %>.</p>
  </blockquote>

</section>

<section id="logical-operators">
  <h3>Logical Operators</h3>

  <p>Query terms can be combined in logical operations, using the operators <code>and</code>, <code>or</code>, and <code>not</code>. The German forms are supported as well: <code>und</code>, <code>oder</code>, and <code>nicht</code>.</p>
  <p>These operators work on the text level, so the following query returns matches for all occurrences where both terms occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'anscheinend und scheinbar' %>

  <p>The following query returns matches for all occurrences where at least one of the terms occurs anywhere in the text.</p>
  <%= doc_query cosmas2 => 'anscheinend oder scheinbar' %>

  <p>The following query returns matches for all occurrences of the first term, where the term following the <code>nicht</code> operator does not occur anywhere in the same text.</p>
  <%= doc_query cosmas2 => 'Kegel nicht Kind' %>

  <p>To escape terms for logical operators (i.e. to prevent these terms from being interpreted as logical operators), they need to be surrounded by quotation marks.</p>
  <%= doc_query cosmas2 => 'Mann "und" Maus' %>

</section>


<section id="distance-operators">
  <h3>Distance Operators</h3>

  <p>Distance operators allow you to search for two operands (search terms or complex search operations) that occur or don&apos;t occur at a certain distance from each other in a text. When the two operands should occur together (the operator is prefixed with a <code>/</code> symbol), both operands are in the result set. When they shouldn&apos;t occur together (the operator is prefixed with a <code>%</code> symbol), only the first operand is in the result set.</p>

  <p>Distance operators accept an additional direction parameter.
    By prefixing the operator with a <code>+</code> symbol (e.g. in <code>/+s0</code>), the second operand is required to occur or not occur after the first operand.
    By prefixing the operator with a <code>-</code> symbol (e.g. in <code>/-s0</code>), the second operand is required to occur or not occur in front of the first operand.
    If the direction parameter is omitted, the order of the two operands is arbitrary.</p>

  <p>Distance operators accept the definition of a distance interval by appending numerical values. If only a single numerical value is given (e.g. in <code>/+s4</code>), it is interpreted as the maximum distance, so the operands must (or must not) occur at a distance less than or equal to the given value. If two numerical values are given, separated by the <code>:</code> symbol (e.g. in <code>/+s4:2</code>), they define the interval in which the distance is valid.</p>
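
  <p>The parts of a single distance operator can be summarised in a small Python sketch. It is derived only from the description above, is not KorAP's parser, and does not handle combined conditions; the names <code>OPERATOR</code>, <code>Distance</code> and <code>parse_distance</code> are hypothetical, and treating the two interval values as order-independent is an assumption.</p>

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern derived from the description above: "/" or "%", an optional
# direction, a unit (w, s or p) and one or two numbers separated by ":".
OPERATOR = re.compile(r"([/%])([+-]?)([wsp])(\d+)(?::(\d+))?")


@dataclass
class Distance:
    exclude: bool          # True for "%" (operands must not occur together)
    direction: str         # "+", "-" or "" (arbitrary order)
    unit: str              # "w", "s" or "p"
    lower: Optional[int]   # None if only a maximum distance was given
    upper: int


def parse_distance(op: str) -> Distance:
    """Split a single distance operator such as '/+w4:3' into its parts."""
    m = OPERATOR.fullmatch(op)
    if m is None:
        raise ValueError(f"not a single distance operator: {op!r}")
    polarity, direction, unit, first, second = m.groups()
    if second is None:
        lower, upper = None, int(first)
    else:
        # Assumption: the order of the two interval values does not matter.
        lower, upper = sorted((int(first), int(second)))
    return Distance(polarity == "%", direction, unit, lower, upper)
```

  <p>With this sketch, <code>parse_distance('/+s4:2')</code> describes an including search in the given order with a sentence distance between 2 and 4, while <code>parse_distance('%s0')</code> describes an excluding search with a maximum sentence distance of 0.</p>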

  <p>Distance operators rely on the tokenization and on the annotation of document structures provided by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>

  <p>If a query contains multiple distance operators, they need to be grouped with parentheses:</p>
  %= doc_query cosmas2 => '(Tag /+w2 offenen) /+w1 Tür'

  <h4>Word Distance Operator</h4>

  <p>The word distance operator <code>w</code> defines the distance in words at which two search operands are allowed or not allowed to occur. A distance of 1 means that the operands are directly adjacent.</p>

  <p>Search for two operands at most 4 words apart, in arbitrary order:</p>
  %= doc_query cosmas2 => 'Gegenwart /w4 Zukunft'

  <p>Search for two operands 3 to 4 words apart, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w4:3 Zukunft'

  <p>Search for two consecutive operands in the given order:</p>
  %= doc_query cosmas2 => 'Gegenwart /+w1:1 Zukunft'

  <p>Search for a first operand that is neither directly preceded nor directly succeeded by a second operand:</p>
  %= doc_query cosmas2 => 'Gegenwart %w1 die'

  <h4>Sentence Distance Operator</h4>

  <p>The sentence distance operator <code>s</code> defines the distance in sentences at which two search operands are allowed or not allowed to occur. A distance of 0 means that the operands occur in the same sentence.</p>
  <p>The sentence distance relies on the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or a neighboring sentence, in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /s1 Geschäft'

  <p>Search for two operands occurring in the same sentence, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+s0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same sentence:</p>
  %= doc_query cosmas2 => 'Gegenwart %s0 Zukunft'

  <h4>Paragraph Distance Operator</h4>

  <p>The paragraph distance operator <code>p</code> defines the distance in paragraphs at which two search operands are allowed or not allowed to occur. A distance of 0 means that the operands occur in the same paragraph.</p>
  <p>The paragraph distance relies on the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %> annotation for document structures.</p>

  <p>Search for two operands occurring in the same or a neighboring paragraph, in arbitrary order:</p>
  %= doc_query cosmas2 => 'offen /p1 Geschäft'

  <p>Search for two operands occurring in the same paragraph, with the first operand preceding the second one:</p>
  %= doc_query cosmas2 => 'offen /+p0 Geschäft'

  <p>Search for a first operand that does not occur with a second operand in the same paragraph:</p>
  %= doc_query cosmas2 => 'Gegenwart %p0 Zukunft'

  <blockquote class="warning">
    <p>The KWIC results of queries that include paragraph distances will likely exceed the maximum match length supported by KorAP and will therefore be cut.</p>
  </blockquote>

  <h4>Multi-Distance Operators</h4>

  <p>Distance operators can be combined to further limit the result set. The distance conditions are separated by commas (without spaces).</p>
  <p>Search for a two-word phrase within a single sentence:</p>
  %= doc_query cosmas2 => 'ein /+w1,s0 Fest'

  <h4>Omitted Distance Operator</h4>
  <p>If the distance operator is omitted between two operands, KorAP searches for a <code>/+w1</code> distance:</p>
  %= doc_query cosmas2 => 'runder Tisch'

</section>

<section id="annotation-operators">
  <h3>Annotation Operators</h3>
  %= under_construction
  %# MORPH and ELEM
</section>

<section id="combination-operators">
  <h3>Combination Operators</h3>
  %= under_construction
  %# IN and OV
</section>

<section id="area-operators">
  <h3>Area Operators</h3>
  %= under_construction
  %# LINKS, RECHTS, INKLUSIVE, EXKLUSIVE, BED
</section>