% layout 'main', title => 'KorAP: Query Languages';

%# Store the id of an active section in the session, so the system is able to directly scroll to the relevant section
%# This should be stored when clicking on a specific query
%# but the remembered section contains the id - not the query

%= page_title

<p>
  KorAP supports multiple query languages, of which <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %>
  is currently supported best. Poliqarp is very similar to the query languages of the
  <%= ext_link_to 'IMS Open Corpus Workbench (CWB)', "http://cwb.sourceforge.net/" %> and
  <%= ext_link_to 'Sketch Engine', "https://www.sketchengine.eu/documentation/corpus-querying/" %>.
  It was originally developed to query the
  <%= ext_link_to 'Polish National Corpus', "http://nkjp.pl/poliqarp/" %>.
</p>

<section id="examples">
  <h3>Example Queries</h3>

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %></strong>: Find all occurrences of the lemma "Baum" as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query poliqarp => '[base=Baum]'

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %></strong>: Find all sequences of three to five consecutive adjectives as annotated by TreeTagger.</p>
  %= doc_query poliqarp => '[tt/p=ADJA]{3,5}'

  <p><strong><%= embedded_link_to 'doc', 'Cosmas-II', 'ql', 'cosmas-2' %></strong>: Find all occurrences of the words "der" and "Baum" that are at most 5 tokens apart. The order of the two words is not relevant.</p>
  %= doc_query cosmas2 => 'der /w5 Baum'

  <p><strong><%= embedded_link_to 'doc', 'Cosmas-II', 'ql', 'cosmas-2' %></strong>: Find all sequences of a word starting with "d" (using a wildcard), followed by an adjective as annotated in the mate foundry, followed by the word "Baum" (ignoring case), that occur within a sentence element annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  <p><em>Be aware</em>: Minor incompatibilities with the implemented query languages may be reported as warnings.</p>
  %= doc_query cosmas2 => 'd* MORPH(mate/p=ADJA) $Baum #IN #ELEM(s)'

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all noun phrases as annotated by CoreNLP that contain a token annotated as an adverb by OpenNLP and, at the same time, annotated by TreeTagger with a part-of-speech tag starting with "A" (matched using a regular expression).</p>
  %= doc_query poliqarp => 'contains(<corenlp/c=NP>,{[opennlp/p=ADV & tt/p="A.*"]})', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all sentences as annotated by the base foundry that start with a token in present tense as annotated by MarMoT, followed by the lemma "die" as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>. Highlight both terms of the sequence.</p>
  %= doc_query poliqarp => 'startswith(<base/s=s>, {1:[marmot/m=tense:pres]}{2:[base=die]})', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all sequences of an article followed by three to four adjectives and a noun, as annotated by the TreeTagger foundry, that end a sentence. Highlight all parts of the sequence.</p>
  %= doc_query poliqarp => 'focus(3:endswith(<base/s=s>,{3:[tt/p=ART]{1:{2:[tt/p=ADJA]{3,4}}[tt/p=NN]}}))', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Annis', 'ql', 'annis' %></strong>: Find all sequences of two adjacent tokens that are both annotated as adverbs by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query annis => 'pos="ADV" & pos="ADV" & #1 . #2'

  <p><strong><%= embedded_link_to 'doc', 'Annis', 'ql', 'annis' %></strong>: Find all determiner relations with the label <code>DET</code> as annotated by MALT, where the relation sources are attributive possessive pronouns as annotated by TreeTagger.</p>
  %= doc_query annis => 'tt/p="PPOSAT" ->malt/d[func="DET"] node'

  <p><strong><%= embedded_link_to 'doc', 'CQL', 'ql', 'cql' %></strong>: Find all occurrences of the sequence "der alte Mann".</p>
  %= doc_query cql => '"der alte Mann"'
</section>