%# KorAP/Kalamar: templates/tutorial/foundries.html.ep (Nils Diewald, 4e9fbcb, 2014-07-15)
% content main => begin

<h2>KorAP-Tutorial: Foundries and Layers</h2>

<p><%= korap_tut_link_to 'Back to Index', '/tutorial' %></p>

<p>KorAP provides access to multiple levels of annotation originating from multiple resources, so-called <i>foundries</i>.</p>

<section name="cheatsheet">
<ul>
<li><strong>base</strong>
<ul>
<li>Supports two types of spans: <strong>&lt;s&gt;</strong> for sentences and <strong>&lt;p&gt;</strong> for paragraphs. This will likely change in the next index version. These spans lack prefix information!</li>
</ul>
</li>
<li><strong>cnx</strong>
<ul>
<li><strong>l</strong> (Token:Lemma): All lemmas are written in lower case. Compounds are split, e.g. the token "Leitfähigkeit" is matched by the lemmas "leit" and "fähigkeit", but not by the lemma "leitfähigkeit".</li>
<li><strong>p</strong> (Token:Part of Speech): All POS tags are written in capital letters and are based on STTS (Stuttgart-Tübingen Tagset).</li>
<li><strong>syn</strong> (Token:Syntactic information): Includes token-based information like @PREMOD, @NH, @MAIN ...</li>
<li><strong>m</strong> (Token:Morphosyntactic information): Includes information about tense ("PRES" ...), mood ("IND" ...), number ("PL" ...) etc.</li>
<li><strong>c</strong> (Span:Phrases): Only nominal phrases are available, all written in lower case ("np").</li>
</ul>
</li>
<li><strong>corenlp</strong>
<ul>
<li><strong>ne_hgc_175m_600</strong> (Token:Named Entity): Contains named entities like "I-PER", "I-ORG" etc.</li>
<li><strong>ne_dewac_175_175m_600</strong> (Token:Named Entity): see above</li>
</ul>
</li>
<li><strong>tt</strong>
<ul>
<li><strong>l</strong> (Token:Lemma): Nouns are capitalized, all other lemmas are written in lower case. Compounds stay intact (e.g. "Normalbedingung").</li>
<li><strong>p</strong> (Token:Part of Speech): All POS tags are written in capital letters and are based on STTS.</li>
</ul>
</li>
<li><strong>mate</strong>
<ul>
<li><strong>l</strong> (Token:Lemma): All lemmas are written in lower case. Compounds stay intact (e.g. "buchstabenbezeichnung").</li>
<li><strong>p</strong> (Token:Part of Speech): All POS tags are written in capital letters and are based on STTS.</li>
<li><strong>m</strong> (Token:Morphosyntactic information): Includes information about tense ("tense:pres" ...), mood ("mood:ind" ...), number ("number:pl" ...), gender ("gender:masc" ...) etc.</li>
</ul>
</li>
<li><strong>opennlp</strong>
<ul>
<li><strong>p</strong> (Token:Part of Speech): All POS tags are written in capital letters and are based on STTS.</li>
</ul>
</li>
<li><strong>xip</strong>
<ul>
<li><strong>l</strong> (Token:Lemma): Nouns are capitalized, all other lemmas are written in lower case. Compounds are split, e.g. the token "Leitfähigkeit" is matched by the lemmas "leiten" and "Fähigkeit", and also by a merged and rather useless lemma "leitenfähigkeit" (this is going to change).</li>
<li><strong>p</strong> (Token:Part of Speech): All POS tags are written in capital letters and are based on STTS.</li>
<li><strong>c</strong> (Span:Phrases): Some phrase types that build up sentences, all written in upper case ("NP", "NPA", "NOUN", "VERB", "PREP", "AP" ...).</li>
</ul>
</li>
</ul>
</section>
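<p>Because the foundries differ in how they spell lemmas and tags, the same query value may have to be written differently for each foundry. The following Python sketch collects the case conventions from the cheatsheet in one table. The helper <code>term</code>, its <code>is_noun</code> flag and the table itself are hypothetical illustrations, not part of KorAP. The <code>foundry/layer=value</code> notation is assumed to match the Poliqarp-style term syntax, and compound splitting is not modeled.</p>

```python
# Hypothetical helper, not part of KorAP: spells a query value according to
# the case conventions from the cheatsheet above and builds a term in the
# assumed Poliqarp-style notation "foundry/layer=value".

LOWER = "lower"        # the whole value in lower case
UPPER = "upper"        # the whole value in capital letters
NOUN_CAP = "noun_cap"  # nouns capitalized, all other values in lower case

# (foundry, layer) -> case convention, as listed in the cheatsheet
CONVENTIONS = {
    ("cnx", "l"): LOWER,
    ("cnx", "p"): UPPER,
    ("cnx", "c"): LOWER,
    ("tt", "l"): NOUN_CAP,
    ("tt", "p"): UPPER,
    ("mate", "l"): LOWER,
    ("mate", "p"): UPPER,
    ("opennlp", "p"): UPPER,
    ("xip", "l"): NOUN_CAP,
    ("xip", "p"): UPPER,
    ("xip", "c"): UPPER,
}


def term(foundry, layer, value, is_noun=False):
    """Return 'foundry/layer=value' with the value spelled as the foundry expects."""
    convention = CONVENTIONS.get((foundry, layer))
    if convention is None:
        raise ValueError("no case convention known for " + foundry + "/" + layer)
    if convention == UPPER:
        value = value.upper()
    elif convention == NOUN_CAP and is_noun:
        value = value.capitalize()  # first letter upper case, rest lower case
    else:
        value = value.lower()  # LOWER, or a non-noun under NOUN_CAP
    return foundry + "/" + layer + "=" + value


print(term("cnx", "l", "Fähigkeit"))                     # cnx/l=fähigkeit
print(term("tt", "l", "normalbedingung", is_noun=True))  # tt/l=Normalbedingung
print(term("mate", "p", "nn"))                           # mate/p=NN
```

<p>For example, a noun lemma has to be queried in lower case in <strong>cnx</strong> and <strong>mate</strong>, but capitalized in <strong>tt</strong> and <strong>xip</strong>.</p>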

<h3>Default Foundries</h3>

<p>For queries on a layer without an explicitly given foundry, KorAP uses a default foundry, which can be overridden by the user configuration. Default foundries apply to the following layers:</p>

<ul>
<li><strong>orth</strong>: opennlp</li>
<li><strong>lemma</strong>: opennlp</li>
<li><strong>pos</strong>: mate</li>
</ul>
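<p>The fallback to default foundries can be sketched in Python as follows. The function <code>resolve</code>, the <code>foundry/layer</code> reference notation and the shape of the user configuration (a plain mapping from layer to foundry) are assumptions for illustration, not KorAP's actual implementation.</p>

```python
# Hypothetical sketch, not KorAP code: resolve the foundry for a layer
# reference, falling back to the default foundries unless the user
# configuration overrides them.

DEFAULT_FOUNDRIES = {
    "orth": "opennlp",
    "lemma": "opennlp",
    "pos": "mate",
}


def resolve(layer_ref, user_config=None):
    """Return (foundry, layer) for a reference like 'pos' or 'tt/pos'."""
    if "/" in layer_ref:
        # An explicitly given foundry always takes precedence.
        foundry, layer = layer_ref.split("/", 1)
        return foundry, layer
    layer = layer_ref
    # Copy the defaults so user overrides never modify the global table.
    defaults = dict(DEFAULT_FOUNDRIES)
    defaults.update(user_config or {})
    if layer not in defaults:
        raise KeyError("no default foundry for layer " + repr(layer))
    return defaults[layer], layer


print(resolve("pos"))                 # ('mate', 'pos')
print(resolve("pos", {"pos": "tt"}))  # ('tt', 'pos')
print(resolve("cnx/pos"))             # ('cnx', 'pos')
```

<p>In this sketch an explicitly given foundry wins over the user configuration, which in turn wins over the built-in defaults. Backend-specific restrictions, such as the one for the <strong>orth</strong> layer noted below, are not modeled.</p>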

<blockquote>
<p>In the Lucene backend, the <strong>orth</strong> layer can't be bound to a specific foundry, as only one tokenization is supported.</p>
</blockquote>

% end