KorAP: COSMAS II

The following documentation introduces some features provided by our version of the COSMAS II Query Language. For more information, please visit the online help of COSMAS II.

Query Terms

A query term in COSMAS II can be a word, a punctuation symbol, or a number.

Baum
4000

Currently, punctuation symbols are not supported by KorAP.

Placeholder Operators

In addition, query terms can contain one or more placeholders: ? (for exactly one arbitrary symbol), + (for one or no symbol), or * (for any sequence of symbols, including the empty one).

Bau?m
Bau+m
Bau*m
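
The placeholder semantics described above can be illustrated by translating a term into a regular expression. This is only a sketch of the matching rules as stated in this section, not the way KorAP evaluates queries; the function names placeholder_to_regex and matches are hypothetical.

```python
import re

# Assumed mapping, taken from the description above:
# ? = exactly one symbol, + = one or no symbol, * = any sequence (possibly empty).
_PLACEHOLDERS = {"?": ".", "+": ".?", "*": ".*"}


def placeholder_to_regex(term: str) -> str:
    """Translate a term with ?, + and * placeholders into a regex pattern."""
    return "".join(_PLACEHOLDERS.get(ch, re.escape(ch)) for ch in term)


def matches(term: str, word: str) -> bool:
    """Check whether the whole word is covered by the placeholder term."""
    return re.fullmatch(placeholder_to_regex(term), word) is not None


print(matches("Bau+m", "Baum"))      # + may stand for no symbol at all
print(matches("Bau*m", "Baustamm"))  # * may stand for several symbols
```

Under these rules, Bau?m does not match "Baum", because ? always consumes exactly one symbol.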

Lemma Operator

Instead of searching for the surface form of a word, a lemma (as annotated by the default foundry) can be requested by prepending the & operator to the term. The form of the lemma depends on the annotation.

&laufen

Case Insensitivity Operator

Prepending a $ symbol to the term makes the search case insensitive.

$Lauf

Regular Expression Operator

By using the #REG(...) operator, query terms can be formulated using regular expressions.

#REG(Archi.*ung)

Regular expressions in COSMAS II are not yet properly implemented in KorAP. If you want to use regular expressions, please refer to Poliqarp.

Logical Operators

Query terms can be combined in logical operations, using the operators and, or, and not. The German forms are supported as well: und, oder, and nicht.

These operators work on the text level, so the following query returns matches for all occurrences where both terms occur anywhere in the same text.

anscheinend und scheinbar

The following query returns matches for all occurrences where at least one of the terms occurs anywhere in the text.

anscheinend oder scheinbar

The following query returns matches for all occurrences of the first term, where the term following the nicht operator does not occur anywhere in the same text.

Kegel nicht Kind
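
Because these operators work on the text level, their effect on which texts qualify can be pictured as set operations over text identifiers. The following is a minimal sketch under that assumption; the toy corpus and the helper texts_containing are hypothetical, and the sketch only models text selection, not the matches KorAP returns within those texts.

```python
def texts_containing(corpus: dict, term: str) -> set:
    """Return the ids of all texts in which the term occurs at least once."""
    return {text_id for text_id, tokens in corpus.items() if term in tokens}


# Hypothetical toy corpus: text id -> list of tokens.
corpus = {
    "t1": ["anscheinend", "ist", "es", "scheinbar", "so"],
    "t2": ["anscheinend", "nicht"],
    "t3": ["scheinbar", "doch"],
}

a = texts_containing(corpus, "anscheinend")
b = texts_containing(corpus, "scheinbar")

texts_und = a & b    # both terms occur somewhere in the same text
texts_oder = a | b   # at least one of the terms occurs in the text
texts_nicht = a - b  # the first term occurs, the second one does not
```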

To search for these words as ordinary terms (i.e. to prevent them from being interpreted as logical operators), surround them with quotation marks.

Mann "und" Maus

Distance Operators

Distance operators allow you to search for two operands (search terms or complex search operations) that occur or don't occur at a certain distance from each other in a text. When the two operands should occur together (the operator is prefixed with a / symbol), both operands are in the result set. When they shouldn't occur together (the operator is prefixed with a % symbol), only the first operand is in the result set.

Distance operators accept an additional direction parameter. If the operator starts with a + symbol (e.g. in /+s0), the second operand is required to occur or not occur after the first operand. If the operator starts with a - symbol (e.g. in /-s0), the second operand is required to occur or not occur before the first operand. If the direction parameter is omitted, the order of the two operands is arbitrary.

Distance operators accept a distance interval, defined by appending numerical values. If only a single numerical value is given (e.g. in /+s4), it is treated as the maximum distance: the operands must (or must not) occur at a distance less than or equal to that value. If two numerical values are given, separated by the : symbol (e.g. in /+s4:2), they define the interval within which the distance is valid.

Distance operators rely on the tokenization and default foundry annotation for document structures.
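
To make the parts of a distance operator explicit, the following sketch splits a single-condition operator such as /+w4:3 into its prefix, direction, unit, and interval. It is an illustration only, not KorAP's parser: the names Distance and parse_distance are hypothetical, and the lower bound of 0 for the single-value form is an assumption, since the text above only specifies the maximum.

```python
import re
from dataclasses import dataclass

# Prefix (/ or %), optional direction (+ or -), unit (w, s or p),
# one number, and optionally a second number after a colon.
_OPERATOR = re.compile(r"([/%])([+-]?)([wsp])(\d+)(?::(\d+))?")


@dataclass(frozen=True)
class Distance:
    exclude: bool   # True for %, False for /
    direction: str  # "+", "-" or "" (arbitrary order)
    unit: str       # "w" (words), "s" (sentences) or "p" (paragraphs)
    low: int        # smallest valid distance
    high: int       # largest valid distance


def parse_distance(op: str) -> Distance:
    """Parse one distance operator with a single distance condition."""
    m = _OPERATOR.fullmatch(op)
    if m is None:
        raise ValueError(f"not a single-condition distance operator: {op!r}")
    prefix, direction, unit, first, second = m.groups()
    if second is None:
        # Single value: treated as a maximum; lower bound 0 is an assumption.
        low, high = 0, int(first)
    else:
        # Two values define an interval; the examples write the larger one first.
        low, high = sorted((int(first), int(second)))
    return Distance(prefix == "%", direction, unit, low, high)


print(parse_distance("/+w4:3"))
```

Combined conditions such as /+w1,s0 are deliberately not handled by this sketch.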

If a query contains multiple distance operators, they need to be nested in parentheses:

(Tag /+w2 offenen) /+w1 Tür

Word Distance Operator

The word distance operator w defines the distance, counted in words, at which two search operands may or may not occur. Two directly consecutive words have a distance of 1.

Search for two operands at a distance of up to 4 words, in arbitrary order:

Gegenwart /w4 Zukunft

Search for two operands at a distance of 3 to 4 words, with the first operand preceding the second one:

Gegenwart /+w4:3 Zukunft

Search for two consecutive operands in the given order:

Gegenwart /+w1:1 Zukunft

Search for a first operand that is neither directly preceded nor directly followed by a second operand:

Gegenwart %w1 die
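
The word distance examples above can be traced on a tokenized sentence. The sketch below assumes that the distance between two tokens is the difference of their positions, so that consecutive tokens have distance 1 (consistent with /+w1:1 meaning "consecutive"). The function names are hypothetical, and this is not how KorAP evaluates the queries internally.

```python
def word_distance_matches(tokens, first, second, low, high, direction=""):
    """Return (i, j) position pairs of first/second within the given distance."""
    pairs = []
    for i, a in enumerate(tokens):
        if a != first:
            continue
        for j, b in enumerate(tokens):
            if b != second or i == j:
                continue
            offset = j - i
            if direction == "+" and offset < 0:
                continue  # second operand must follow the first one
            if direction == "-" and offset > 0:
                continue  # second operand must precede the first one
            if low <= abs(offset) <= high:
                pairs.append((i, j))
    return pairs


def word_distance_excludes(tokens, first, second, low, high, direction=""):
    """Return positions of first that have no second within the distance (%)."""
    hits = {i for i, _ in word_distance_matches(tokens, first, second, low, high, direction)}
    return [i for i, t in enumerate(tokens) if t == first and i not in hits]


tokens = "die Gegenwart und die nahe Zukunft".split()
print(word_distance_matches(tokens, "Gegenwart", "Zukunft", 0, 4))        # like /w4
print(word_distance_matches(tokens, "Gegenwart", "Zukunft", 1, 1, "+"))   # like /+w1:1
print(word_distance_excludes(tokens, "Gegenwart", "die", 0, 1))           # like %w1
```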

Sentence Distance Operator

The sentence distance operator s defines how many sentences are allowed or are not allowed in-between two search operands.

The sentence distance relies on the default foundry annotation for document structures.

Search for two operands occurring in the same or a following sentence in arbitrary order:

offen /s1 Geschäft

Search for two operands occurring in the same sentence with the first operand preceding the second one:

offen /+s0 Geschäft

Search for a first operand that does not occur with a second operand in the same sentence:

Gegenwart %s0 Zukunft

Paragraph Distance Operator

The paragraph distance operator p defines how many paragraphs are allowed or are not allowed in-between two search operands.

The paragraph distance relies on the default foundry annotation for document structures.

Search for two operands occurring in the same or a following paragraph in arbitrary order:

offen /p1 Geschäft

Search for two operands occurring in the same paragraph with the first operand preceding the second one:

offen /+p0 Geschäft

Search for a first operand that does not occur with a second operand in the same paragraph:

Gegenwart %p0 Zukunft

KWIC results for queries that include paragraph distances will likely exceed the maximum match length supported by KorAP and will therefore be truncated.

Multi-Distance Operators

Distance operators can be combined to further limit the result set. The distance conditions are separated by commas (without spaces).

Search for a defined two-word phrase in a sentence:

ein /+w1,s0 Fest
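
A combined condition only matches where every single condition holds. The following sketch illustrates this for /+w1,s0, assuming a hypothetical token representation in which each token carries the index of its sentence; the function name phrase_in_sentence is likewise an assumption made for this example.

```python
def phrase_in_sentence(tokens, first, second):
    """Return start positions where second directly follows first (+w1)
    and both tokens belong to the same sentence (s0).

    tokens is a list of (word, sentence_index) pairs.
    """
    hits = []
    for i in range(len(tokens) - 1):
        (word_a, sent_a), (word_b, sent_b) = tokens[i], tokens[i + 1]
        if word_a == first and word_b == second and sent_a == sent_b:
            hits.append(i)
    return hits


# Hypothetical annotated tokens: the second "ein" ends sentence 1,
# while the following "Fest" already belongs to sentence 2.
tokens = [
    ("Das", 0), ("war", 0), ("ein", 0), ("Fest", 0),
    ("Noch", 1), ("ein", 1), ("Fest", 2),
]
print(phrase_in_sentence(tokens, "ein", "Fest"))
```

Only the first occurrence qualifies here, because the second pair satisfies the word distance but not the sentence distance.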

Omitted Distance Operator

If the distance operator is omitted between two operands, KorAP searches for a /+w1 distance, i.e. for the two operands as consecutive words in the given order:

runder Tisch