KorAP: COSMAS II
The following documentation introduces some features provided by our version of the COSMAS II Query Language. For more information, please visit the online help of COSMAS II.
Query Terms
A query term in COSMAS II can be a word, a punctuation symbol, or a number.
Baum
4000
Currently, punctuation is not supported by KorAP.
Placeholder Operators
In addition, query terms can contain one or more placeholders: ? (for any single symbol), + (for any or no symbol), or * (for any sequence of any or no symbols).
Bau?m
Bau+m
Bau*m
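The placeholder semantics described above can be sketched as a translation to regular expressions. This is only an illustration, not KorAP's implementation: the function names and the mapping (? to one character, + to zero or one character, * to any sequence) are assumptions derived from the descriptions in this section.

```python
import re

# Assumed mapping from COSMAS II placeholders to regex fragments,
# based solely on the descriptions in this section.
_PLACEHOLDERS = {"?": ".", "+": ".?", "*": ".*"}


def placeholder_to_regex(term: str) -> re.Pattern:
    """Translate a term with placeholders into a compiled regex (hypothetical helper)."""
    # Every non-placeholder character is escaped so it matches literally.
    parts = [_PLACEHOLDERS.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts))


def matches(term: str, word: str) -> bool:
    """Return True if the whole word matches the placeholder term."""
    return placeholder_to_regex(term).fullmatch(word) is not None
```

Under these assumptions, `matches("Bau+m", "Baum")` and `matches("Bau*m", "Baustamm")` hold, while `matches("Bau?m", "Baum")` does not, because ? requires exactly one symbol in that position.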
Lemma Operator
Instead of searching for the surface form of a word, a lemma (as annotated by the default foundry) can be requested by prefixing the term with the & operator. The form of the lemma depends on the annotation.
&laufen
Case Insensitivity Operator
Prefixing the term with a $ symbol makes the search case insensitive.
$Lauf
Regular Expression Operator
By using the #REG(...) operator, query terms can be formulated using regular expressions.
#REG(Archi.*ung)
Regular expressions in COSMAS II are not yet properly implemented in KorAP. If you want to use regular expressions, please refer to Poliqarp.
Logical Operators
Query terms can be combined in logical operations using the operators "and", "or", and "not". The German forms are supported as well: "und", "oder", and "nicht".
These operators work on the text level, so the following query returns matches for all occurrences where both terms occur anywhere in the same text.
anscheinend und scheinbar
The following query returns matches for all occurrences where at least one of the terms occurs anywhere in the same text.
anscheinend oder scheinbar
The following query returns matches for all occurrences of the first term, where the term following the nicht
operator does not occur anywhere in the same text.
Kegel nicht Kind
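The text-level behavior of the three logical operators can be sketched over a toy corpus. The function name, the corpus layout (text identifiers mapped to token sets), and the simplification of returning text identifiers instead of individual occurrences are all assumptions made for illustration; KorAP's actual evaluation may differ.

```python
def texts_matching(texts, left, op, right):
    """Return the ids of texts that satisfy `left op right` at text level.

    `texts` maps a text id to the set of tokens in that text (hypothetical layout).
    """
    result = []
    for text_id, tokens in texts.items():
        has_left = left in tokens
        has_right = right in tokens
        if op in ("and", "und"):
            keep = has_left and has_right
        elif op in ("or", "oder"):
            keep = has_left or has_right
        elif op in ("not", "nicht"):
            # Matches of the first term, only if the second term is absent.
            keep = has_left and not has_right
        else:
            raise ValueError(f"unknown logical operator: {op!r}")
        if keep:
            result.append(text_id)
    return result


# Toy corpus used only for this example.
corpus = {
    "t1": {"anscheinend", "scheinbar"},
    "t2": {"anscheinend"},
    "t3": {"Kegel"},
}
```

With this corpus, "und" selects only `t1`, "oder" selects `t1` and `t2`, and `Kegel nicht Kind` selects `t3`.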
To escape terms that would otherwise be read as logical operators (i.e. to prevent these terms from being interpreted as logical operators), they need to be surrounded by quotation marks.
Mann "und" Maus
Distance Operators
Distance operators allow you to search for two operands (search terms or complex search operations) that occur or don't occur at a certain distance from each other in a text. When the two operands should occur together (the operator is preceded by a / symbol), both operands are in the result set. When they shouldn't occur together (the operator is preceded by a % symbol), only the first operand is in the result set.
Distance operators accept an additional direction parameter.
When the operator is prefixed with a + symbol (e.g. in /+s0), the second operand is required to occur or not occur after the first operand.
When the operator is prefixed with a - symbol (e.g. in /-s0), the second operand is required to occur or not occur in front of the first operand.
If the direction parameter is omitted, the order of the two operands is arbitrary.
Distance operators accept a distance interval, defined by appending numerical values. If only a single numerical value is given (e.g. in /+s4), it is treated as a maximum distance: the operands must (or must not) occur at a distance equal to or lower than the given value. If two numerical values are given, separated by the : symbol (e.g. in /+s4:2), they define the interval within which the distance is valid.
Distance operators rely on the tokenization and default foundry annotation for document structures.
If a query contains multiple distance operators, they need to be nested in parentheses:
(Tag /+w2 offenen) /+w1 Tür
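The parts of a single distance operator described above (inclusion or exclusion, direction, unit, and interval) can be sketched as a small parser. The class, field, and function names are hypothetical, and the grammar is an assumption based only on the examples in this document; in particular, the lower bound is left unset when only one value is given, since the text only states that the value is a maximum.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: "/" or "%", optional "+" or "-", unit w/s/p, value, optional ":value".
_DISTANCE = re.compile(r"^([/%])([+-]?)([wsp])(\d+)(?::(\d+))?$")


@dataclass(frozen=True)
class DistanceOp:
    exclude: bool        # True for "%", False for "/"
    direction: str       # "+", "-", or "" when the order is arbitrary
    unit: str            # "w" (word), "s" (sentence), or "p" (paragraph)
    low: Optional[int]   # lower bound, None when only a maximum is given
    high: int            # upper bound


def parse_distance(op: str) -> DistanceOp:
    """Parse one distance operator such as "/+w4:3" (hypothetical helper)."""
    m = _DISTANCE.match(op)
    if m is None:
        raise ValueError(f"not a distance operator: {op!r}")
    kind, direction, unit, first, second = m.groups()
    if second is None:
        low, high = None, int(first)
    else:
        # The examples write the larger value first; sort to be safe.
        low, high = sorted((int(first), int(second)))
    return DistanceOp(kind == "%", direction, unit, low, high)
```

For instance, `parse_distance("/+w4:3")` yields an inclusive, forward-directed word distance with bounds 3 and 4, while `parse_distance("%s0")` yields an exclusion within the same sentence with no direction.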
Word Distance Operator
The word distance operator w defines how many words are allowed or not allowed in between two search operands.
Search for two operands with up to 4 words in between in arbitrary order:
Gegenwart /w4 Zukunft
Search for two operands with 3 to 4 words in between, with the first operand preceding the second one:
Gegenwart /+w4:3 Zukunft
Search for two consecutive operands in the given order:
Gegenwart /+w1:1 Zukunft
Search for a first operand that is neither preceded nor succeeded by a second operand:
Gegenwart %w1 die
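A minimal sketch of how the inclusive word distance check could work on a tokenized text is given below. It assumes that the distance between two tokens is the difference of their positions, so that a distance of 1 means adjacent tokens, as the /+w1:1 example suggests. The function name and parameters are hypothetical, and only single-token operands are handled.

```python
def word_distance_hits(tokens, first, second, max_dist, min_dist=0, direction=""):
    """Return (i, j) position pairs where `first` and `second` satisfy the distance.

    Distance is assumed to be the absolute difference of token positions.
    `direction` is "+" (second after first), "-" (second before first),
    or "" (arbitrary order).
    """
    hits = []
    for i, tok_a in enumerate(tokens):
        if tok_a != first:
            continue
        for j, tok_b in enumerate(tokens):
            if tok_b != second or i == j:
                continue
            offset = j - i
            if direction == "+" and offset < 0:
                continue
            if direction == "-" and offset > 0:
                continue
            if min_dist <= abs(offset) <= max_dist:
                hits.append((i, j))
    return hits


# Toy sentence used only for this example.
sample = "Gegenwart und Zukunft liegen nah".split()
```

With this sample, `word_distance_hits(sample, "Gegenwart", "Zukunft", 4)` finds the pair at positions 0 and 2, whereas requiring direct adjacency (`max_dist=1, min_dist=1, direction="+"`) finds nothing.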
Sentence Distance Operator
The sentence distance operator s defines how many sentences are allowed or not allowed in between two search operands.
The sentence distance relies on the default foundry annotation for document structures.
Search for two operands occurring in the same or a following sentence in arbitrary order:
offen /s1 Geschäft
Search for two operands occurring in the same sentence, with the first operand preceding the second one:
offen /+s0 Geschäft
Search for a first operand that does not occur with a second operand in the same sentence:
Gegenwart %s0 Zukunft
Paragraph Distance Operator
The paragraph distance operator p defines how many paragraphs are allowed or not allowed in between two search operands.
The paragraph distance relies on the default foundry annotation for document structures.
Search for two operands occurring in the same or a following paragraph in arbitrary order:
offen /p1 Geschäft
Search for two operands occurring in the same paragraph, with the first operand preceding the second one:
offen /+p0 Geschäft
Search for a first operand that does not occur with a second operand in the same paragraph:
Gegenwart %p0 Zukunft
The KWIC result of queries that include paragraph distances will likely exceed the maximum match length supported by KorAP and will therefore be cut.
Multi-Distance Operators
Distance operators can be combined to further limit the result set. The distance conditions are separated by commas (without spaces).
Search for a defined two-word phrase in a sentence:
ein /+w1,s0 Fest
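The combined condition in the example above (adjacent, in the given order, and within one sentence) can be sketched as a conjunction of both checks. The token layout, a list of (word, sentence index) pairs, and the function name are assumptions for illustration; KorAP derives sentence boundaries from the default foundry annotation instead.

```python
def phrase_in_sentence(tokens, first, second):
    """Return start positions where `first` is directly followed by `second`
    within the same sentence.

    `tokens` is a list of (word, sentence_index) pairs (hypothetical layout).
    """
    hits = []
    for i in range(len(tokens) - 1):
        (word_a, sent_a), (word_b, sent_b) = tokens[i], tokens[i + 1]
        adjacent_in_order = word_a == first and word_b == second  # /+w1
        same_sentence = sent_a == sent_b                          # s0
        if adjacent_in_order and same_sentence:
            hits.append(i)
    return hits


# Toy data: the second "ein Fest" pair spans a sentence boundary.
annotated = [("ein", 0), ("Fest", 0), ("ein", 0), ("Fest", 1)]
```

Here `phrase_in_sentence(annotated, "ein", "Fest")` reports only position 0, because the second pair satisfies the word distance but not the sentence distance.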
Omitted Distance Operator
If the distance operator is omitted between two operands, KorAP searches for a /+w1 distance:
runder Tisch
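The default described above can be sketched as a rewrite step that makes the implicit operator explicit. The function names are hypothetical, and the sketch deliberately handles only two whitespace-separated plain terms or an already explicit distance operator; it ignores logical operators, quoting, and parentheses.

```python
def _is_distance_operator(part: str) -> bool:
    # Distance operators start with "/" (inclusion) or "%" (exclusion).
    return part.startswith(("/", "%"))


def expand_omitted_distance(query: str) -> str:
    """Insert the default /+w1 between adjacent plain terms (hypothetical helper)."""
    out = []
    for part in query.split():
        previous_is_term = bool(out) and not _is_distance_operator(out[-1])
        if previous_is_term and not _is_distance_operator(part):
            out.append("/+w1")
        out.append(part)
    return " ".join(out)
```

Under these assumptions, `expand_omitted_distance("runder Tisch")` returns `runder /+w1 Tisch`, and a query that already contains a distance operator is returned unchanged.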