
 % layout 'main', title => 'KorAP: Annotations';

 %= page_title

 <p>KorAP provides access to multiple levels of annotations originating from multiple resources, so-called <em>foundries</em>.</p>

 <section id="base">
   <h3>Base Foundry</h3>
   <p>The base foundry is available for all corpora and acts as a common ground for document structure annotation in the layer <code>s</code>.</p>
   <dl>
     <dt><abbr data-type="token" title="Structure">s</abbr></dt>
     <dd>Document structure supporting the spans: <code>&lt;base/s=s&gt;</code> for sentences, <code>&lt;base/s=p&gt;</code> for paragraphs, and <code>&lt;base/s=t&gt;</code> for the text span.</dd>
   </dl>

   %= doc_query poliqarp => '<base/s=s>', cutoff => 1
 </section>


 <!--
 <section id="cnx">
   <h3>Connexor (<code>cnx</code>)</h3>
   <p>Connexor annotations provide the following layers for the <code>cnx</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Lemma">l</abbr></dt>
     <dd>All lemmas are written in lower case. Compounds are split, e.g. the token &quot;Leitfähigkeit&quot; is matched by the lemmas &quot;leit&quot; and &quot;fähigkeit&quot;, but not by the lemma &quot;leitfähigkeit&quot;.</dd>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>Part-of-speech information is written in capital letters and is based on STTS</dd>
     <dt><abbr data-type="token" title="Syntactical information">syn</abbr></dt>
     <dd>Includes token-based information like <code>@PREMOD</code>, <code>@NH</code>, <code>@MAIN</code> ...</dd>
     <dt><abbr data-type="token" title="Morphosyntactical information">m</abbr></dt>
     <dd>Includes information about tense (<code>PRES</code> ...), mood (<code>IND</code>), number (<code>PL</code> ...) etc.</dd>
     <dt><abbr data-type="span" title="Phrases">c</abbr></dt>
     <dd>Only nominal phrases are available and all nominal phrases are written in lower case (<code>np</code>)</dd>
   </dl>
   %= doc_query poliqarp => '[cnx/p=CC]', cutoff => 1
 </section>
 -->

 <section id="dereko">
   <h3>DeReKo (<code>dereko</code>)</h3>
   <p>DeReKo annotations provide the following layer for the <code>dereko</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Structure">s</abbr></dt>
     <dd>Document structure as encoded in the <%= ext_link_to 'I5 text document', 'http://www1.ids-mannheim.de/kl/projekte/korpora/textmodell.html' %>.</dd>
   </dl>
   %= doc_query poliqarp => 'startsWith(<dereko/s=s>, Fragestunde)', cutoff => 1
 </section>


 <section id="corenlp">
   <h3>CoreNLP (<code>corenlp</code>)</h3>
   <p>CoreNLP annotations provide the following layers for the <code>corenlp</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>Part-of-speech information is written in capital letters and is based on STTS</dd>
     <dt><abbr data-type="token" title="Constituency">c</abbr></dt>
     <dd>Constituency information follows the annotations of the <%= ext_link_to 'negr@ corpus', 'http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/negra-corpus.html' %>.</dd>
     <dt><abbr data-type="token" title="Named Entity">ne</abbr></dt>
     <dd>Contains named entities like <code>I-PER</code>, <code>I-ORG</code> etc.</dd>
     <dt><abbr data-type="token" title="Named Entity">ne_hgc_175m_600</abbr></dt>
     <dd>Named entities, as described for <code>ne</code> above</dd>
     <dt><abbr data-type="token" title="Named Entity">ne_dewac_175m_600</abbr></dt>
     <dd>Named entities, as described for <code>ne</code> above</dd>
   </dl>
   %= doc_query poliqarp => '[corenlp/ne_dewac_175m_600=I-ORG]', cutoff => 1
 </section>


 <section id="tt">
   <h3>TreeTagger (<code>tt</code>)</h3>
   <p>TreeTagger annotations provide the following layers for the <code>tt</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Lemma">l</abbr></dt>
     <dd>All non-noun lemmas are written in lower case; nouns are capitalized. Compounds stay intact (e.g. <code>Normalbedingung</code>)</dd>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>All part-of-speech information is written in capital letters and is based on STTS</dd>
   </dl>
   %= doc_query poliqarp => '[tt/p=ADV]', cutoff => 1
 </section>


 <!--
 <section id="mate">
   <h3>Mate (<code>mate</code>)</h3>
   <dl>
     <dt><abbr data-type="token" title="Lemma">l</abbr></dt>
     <dd>All lemmas are written in lower case. Compounds stay intact (e.g. <code>buchstabenbezeichnung</code>)</dd>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>All part-of-speech information is written in capital letters and is based on STTS</dd>
     <dt><abbr data-type="token" title="Morphosyntactical information">m</abbr></dt>
     <dd>Includes information about tense (<code>tense:pres</code> ...), mood (<code>mood:ind</code>), number (<code>number:pl</code> ...), gender (<code>gender:masc</code> ...) etc.</dd>
   </dl>
   %= doc_query poliqarp => '[mate/m=gender:fem]', cutoff => 1
 </section>
 -->

 <section id="malt">
   <h3>Malt (<code>malt</code>)</h3>
   <p>Malt annotations provide the following layer for the <code>malt</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Dependency">d</abbr></dt>
     <dd>Dependency information</dd>
   </dl>
   %= doc_query annis => 'tt/p="PPOSAT" ->malt/d[func="DET"] node', cutoff => 1
 </section>

 <section id="opennlp">
   <h3>OpenNLP (<code>opennlp</code>)</h3>
   <p>OpenNLP annotations provide the following layer for the <code>opennlp</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>All part-of-speech information is written in capital letters and is based on STTS</dd>
   </dl>
   %= doc_query poliqarp => '[opennlp/p=PDAT]', cutoff => 1
 </section>


 <section id="marmot">
   <h3>Marmot (<code>marmot</code>)</h3>
   <p>Marmot annotations provide the following layers for the <code>marmot</code> prefix:</p>
   <dl>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>Part-of-speech information is written in capital letters and is based on STTS</dd>
     <dt><abbr data-type="token" title="Morphosyntactical information">m</abbr></dt>
     <dd>Includes information about case (<code>acc</code> ...), degree (<code>pos</code>), gender (<code>fem</code> ...) etc.</dd>
   </dl>
   %= doc_query poliqarp => '[marmot/m=degree:sup & marmot/p=ADJA]', cutoff => 1
 </section>


 <!--
 <section id="xip">
   <h3>Xerox Incremental Parser (<code>xip</code>)</h3>
   <dl>
     <dt><abbr data-type="token" title="Lemma">l</abbr></dt>
     <dd>All non-noun lemmas are written in lower case; nouns are capitalized. Compounds are split, e.g. the token <code>Leitfähigkeit</code> is matched by the lemmas <code>leiten</code> and <code>Fähigkeit</code>, and also by a merged, rarely useful <code>leitenfähigkeit</code> (this is going to change)</dd>
     <dt><abbr data-type="token" title="Part-of-Speech">p</abbr></dt>
     <dd>All part-of-speech information is written in capital letters and is based on STTS</dd>
     <dt><abbr data-type="span" title="Phrases">c</abbr></dt>
     <dd>Some phrases to create sentences, all upper case (<code>NP</code>, <code>NPA</code>, <code>NOUN</code>, <code>VERB</code>, <code>PREP</code>, <code>AP</code> ...)</dd>
   </dl>
   %= doc_query poliqarp => '[xip/p=ADJ]', cutoff => 1
   %= doc_query poliqarp => '<xip/c=VERB>', cutoff => 1
 </section>
 -->

 <section id="default-foundries">
   <h3>Default Foundries</h3>
   <p>For queries that address a layer without specifying a foundry, KorAP applies default foundries<!--, that can be overwritten by user configurations-->. The default foundries apply to the following layers:</p>

   <ul>
     <li><strong>orth</strong>: <code>opennlp</code></li>
     <li><strong>lemma</strong>: <code>tt</code></li>
     <li><strong>pos</strong>: <code>tt</code></li>
   </ul>

   <p>For example, with these defaults a query such as <code>[pos=ADV]</code> is expected to be evaluated like <code>[tt/p=ADV]</code>.</p>

   <blockquote>
     <p>In the Lucene backend, the <code>orth</code> layer is bound to a single, specific foundry, because only one tokenization is supported.</p>
   </blockquote>
 </section>