% layout 'main', title => 'KorAP: Query Languages';

%# Store the id of an active section in the session, so the system is able to directly scroll to the relevant section
%# This should be stored when clicking on a specific query
%# but the remembered section contains the id - not the query

%= page_title

<p>
  KorAP supports multiple query languages, of which <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %>
  is currently supported best. Poliqarp is very similar to the query languages of the
  <%= ext_link_to 'IMS Open Corpus Workbench (CWB)', "http://cwb.sourceforge.net/" %> and the
  <%= ext_link_to 'SketchEngine', "https://www.sketchengine.eu/documentation/corpus-querying/" %>.
  It was originally developed to query the
  <%= ext_link_to 'Polish National Corpus', "http://nkjp.pl/poliqarp/" %>.
</p>

<section id="examples">
  <h3>Example Queries</h3>

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %></strong>: Find all occurrences of the lemma &quot;Baum&quot; as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query poliqarp => '[base=Baum]'

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %></strong>: Find all sequences of 3 to 5 consecutive adjectives as annotated by TreeTagger.</p>
  %= doc_query poliqarp => '[tt/p=ADJA]{3,5}'

  <p><strong><%= embedded_link_to 'doc', 'Cosmas-II', 'ql', 'cosmas-2' %></strong>: Find all occurrences of the words &quot;der&quot; and &quot;Baum&quot; that are at most 5 tokens apart. The order is not relevant.</p>
  %= doc_query cosmas2 => 'der /w5 Baum'

  <p><strong><%= embedded_link_to 'doc', 'Cosmas-II', 'ql', 'cosmas-2' %></strong>: Find all sequences of a word starting with &quot;d&quot; (using a wildcard), followed by an adjective as annotated in the Marmot foundry, followed by the word &quot;Baum&quot; (ignoring case), that occur within a sentence element annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  <p><em>Be aware</em>: Minor incompatibilities with the implemented languages may be reported as warnings.</p>
  %= doc_query cosmas2 => 'd* MORPH(marmot/p=ADJA) $Baum #IN #ELEM(s)'

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all nominal phrases as annotated by CoreNLP that contain an adverb as annotated by OpenNLP which TreeTagger also tags with a part-of-speech value starting with &quot;A&quot; (matched by a regular expression).</p>
  %= doc_query poliqarp => 'contains(<corenlp/c=NP>,{[opennlp/p=ADV & tt/p="A.*"]})', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all sentences as annotated by the base foundry that start with a sequence of one token in present tense, as annotated by Marmot, and the lemma &quot;die&quot;, as annotated by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>. Highlight both terms of the sequence.</p>
  %= doc_query poliqarp => 'startswith(<base/s=s>, {1:[marmot/m=tense:pres]}{2:[base=die]})', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Poliqarp+', 'ql', 'poliqarp-plus' %></strong>: Find all sequences of an article, followed by three to four adjectives and a noun, as annotated by the TreeTagger foundry, that end a sentence. Highlight all parts of the sequence.</p>
  %= doc_query poliqarp => 'focus(3:endswith(<base/s=s>,{3:[tt/p=ART]{1:{2:[tt/p=ADJA]{3,4}}[tt/p=NN]}}))', cutoff => 1

  <p><strong><%= embedded_link_to 'doc', 'Annis', 'ql', 'annis' %></strong>: Find all occurrences of a sequence of two tokens annotated as adverbs by the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
  %= doc_query annis => 'pos="ADV" & pos="ADV" & #1 . #2'

  <p><strong><%= embedded_link_to 'doc', 'Annis', 'ql', 'annis' %></strong>: Find all determiner relations with the label <code>DET</code> by MALT where the relation sources are attributive possessive pronouns annotated by TreeTagger.</p>
  %= doc_query annis => 'tt/p="PPOSAT" ->malt/d[func="DET"] node'

  <p><strong><%= embedded_link_to 'doc', 'CQL', 'ql', 'cql' %></strong>: Find all occurrences of the sequence &quot;der alte Mann&quot;.</p>
  %= doc_query cql => '"der alte Mann"'
</section>