% layout 'main', title => 'KorAP: Regular Expressions';

<h2 id="tutorial-top">Regular Expressions</h2>

<p>Regular expressions are patterns describing a set of strings.</p>
<p>In the KorAP backend a wide range of operators is supported, but only the following are guaranteed to be stable throughout the system:</p>

<section id="operators">
<h3>Operators</h3>
<dl>
<dt><code>.</code> - Any</dt>
<dd>Matches any single character</dd>
<dt><code>()</code> - Group</dt>
<dd>Groups operands so that they can be treated as a single operand</dd>
<dt><code>|</code> - Alternation</dt>
<dd>Matches either of the alternative operands</dd>
<dt><code>[]</code> - Character Class</dt>
<dd>Matches any one of the listed characters</dd>
<dt><code>\</code> - Escape symbol</dt>
<dd>Marks the following character to be interpreted verbatim when it would otherwise be special (i.e. an operator or quantifier)</dd>
</dl>

%= doc_query poliqarp => '".eine" Frau', cutoff => 1
%= doc_query poliqarp => '"Fr..de"', cutoff => 1
%= doc_query poliqarp => '"Fr(ie|eu)de" []{,3} Eierkuchen', cutoff => 1
%= doc_query poliqarp => '"Fre[um]de"', cutoff => 1
%= doc_query poliqarp => '"b.w\."', cutoff => 1
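<p>The operators above can be tried out offline. The following is a minimal sketch using Python's <code>re</code> module, which is not the KorAP engine; it assumes that a pattern has to match a whole token, which is approximated here with <code>re.fullmatch</code>. The helper name <code>matches_token</code> and the sample tokens are hypothetical and only serve as illustration.</p>

```python
import re


def matches_token(pattern: str, token: str) -> bool:
    """Return True if the pattern matches the whole token.

    Assumption: whole-token matching, approximated with re.fullmatch.
    This uses Python's regex engine, not the KorAP backend.
    """
    return re.fullmatch(pattern, token) is not None


# Patterns are taken from the query examples of this section;
# the tokens and expected results are illustrative only.
examples = [
    (r".eine", "meine", True),         # '.' matches exactly one character
    (r".eine", "eine", False),         # '.' cannot match nothing
    (r"Fr..de", "Freude", True),       # two arbitrary characters
    (r"Fr(ie|eu)de", "Friede", True),  # group with alternation
    (r"Fr(ie|eu)de", "Fremde", False), # 'em' is not one of the alternatives
    (r"Fre[um]de", "Fremde", True),    # character class: 'u' or 'm'
    (r"b.w\.", "bzw.", True),          # escaped '.' is a literal full stop
    (r"b.w\.", "bzwx", False),         # 'x' is not a literal full stop
]

for pattern, token, expected in examples:
    assert matches_token(pattern, token) == expected
```

<p>Results from this sketch may differ from the backend for operators that are not listed above.</p>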
</section>

<section id="quantifiers">
<h3>Quantifiers</h3>

<p>Operands in regular expressions can be quantified,
meaning they are allowed to occur consecutively a specified number of times.
The following quantifiers are supported:</p>

<dl>
<dt><code>?</code></dt>
<dd>Match 0 or 1 times</dd>
<dt><code>*</code></dt>
<dd>Match 0 or more times</dd>
<dt><code>+</code></dt>
<dd>Match 1 or more times</dd>
<dt><code>{n}</code></dt>
<dd>Match exactly <code>n</code> times</dd>
<dt><code>{n,}</code></dt>
<dd>Match at least <code>n</code> times</dd>
<dt><code>{n,m}</code></dt>
<dd>Match at least <code>n</code> times but no more than <code>m</code> times</dd>
</dl>
%= doc_query poliqarp => '"Schif+ahrt"', cutoff => 1
%= doc_query poliqarp => '"kl?eine" Kinder', cutoff => 1
%= doc_query poliqarp => '"Schlos{2,3}traße"', cutoff => 1
%= doc_query poliqarp => '"Rha(bar){2}"', cutoff => 1
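<p>The quantifiers can be explored in the same way. This is again only a sketch in Python's <code>re</code> module under the assumption of whole-token matching; the helper <code>matches_token</code> and the sample tokens are hypothetical, and the KorAP backend is not involved.</p>

```python
import re


def matches_token(pattern: str, token: str) -> bool:
    """Return True if the pattern matches the whole token.

    Assumption: whole-token matching, approximated with re.fullmatch.
    This uses Python's regex engine, not the KorAP backend.
    """
    return re.fullmatch(pattern, token) is not None


# Patterns mostly follow the query examples of this section;
# the tokens and expected results are illustrative only.
examples = [
    (r"Schif+ahrt", "Schiffahrt", True),           # '+': one or more 'f'
    (r"Schif+ahrt", "Schiahrt", False),            # '+' needs at least one 'f'
    (r"Schif*ahrt", "Schiahrt", True),             # '*': zero or more 'f'
    (r"kl?eine", "kleine", True),                  # '?': the 'l' is present
    (r"kl?eine", "keine", True),                   # '?': the 'l' is absent
    (r"Schlos{2,3}traße", "Schlosstraße", True),   # two 's'
    (r"Schlos{2,3}traße", "Schlossstraße", True),  # three 's'
    (r"Schlos{2,3}traße", "Schlostraße", False),   # one 's' is too few
    (r"Rha(bar){2}", "Rhabarbar", True),           # the group occurs twice
    (r"Rha(bar){2}", "Rhabar", False),             # the group occurs once
    (r"a{2,}", "aaa", True),                       # at least two 'a'
    (r"a{2,}", "a", False),                        # one 'a' is too few
]

for pattern, token, expected in examples:
    assert matches_token(pattern, token) == expected
```

<p>Note that a quantifier applies to the operand directly before it: a single character, a character class, or a group.</p>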
</section>