Blame - templates/doc/ql/fcsql.html.ep - KorAP/Kalamar

blob: 735ce8bee78ff5807753e8341284663b54ae60b1

Akron	ff7811f	2017-12-19 12:40:41 +0100	[diff] [blame]	1	% layout 'main', title => 'KorAP: FCSQL';
				2
Akron	9490e3b	2019-10-17 12:26:29 +0200	[diff] [blame]	3	%= page_title
Akron	ff7811f	2017-12-19 12:40:41 +0100	[diff] [blame]	4
margaretha	14ce4d6	2019-07-17 18:38:45 +0200	[diff] [blame]	5	<p>FCS-QL is a query language specifically developed to accommodate advanced search in
Akron	9490e3b	2019-10-17 12:26:29 +0200	[diff] [blame]	6	<%= ext_link_to 'Clarin Federated Content Search (FCS)', "https://www.clarin.eu/content/federated-content-search-clarin-fcs" %>,
margaretha	14ce4d6	2019-07-17 18:38:45 +0200	[diff] [blame]	7	that allows searching through annotated data.
				8	Accordingly, FCS-QL is primarily intended to represent queries involving annotation layers
Akron	3cfa26d	2019-10-24 15:17:34 +0200	[diff] [blame]	9	such as part-of-speech and lemma. The FCS-QL grammar is similar to <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %> because it is
margaretha	14ce4d6	2019-07-17 18:38:45 +0200	[diff] [blame]	10	largely based on Poliqarp/CQP.</p>
				11
				12	<p>In FCS-QL, foundries are called qualifiers. A qualifier (foundry) and a layer are
				13	joined with a colon; for example, the lemma layer of TreeTagger is written as
				14	<code>tt:lemma</code>. KorAP supports the following annotation layers for FCS-QL:</p>
				15
				16	<dl>
				17	<dt>text</dt>
				18	<dd>surface text</dd>
				19	<dt>lemma</dt>
				20	<dd>lemmatisation</dd>
				21	<dt>pos</dt>
				22	<dd>part-of-speech</dd>
				23	</dl>
				24
				25	<section id="simple-queries">
				26	<h3>Simple queries</h3>
				27	<p>Querying simple terms</p>
				28	%= doc_query fcsql => '"Semmel"', cutoff => 1
				29
				30	<p>Querying regular expressions</p>
				31	%= doc_query fcsql => '"gie(ss|ß)en"', cutoff => 1
				32
				33	<p>Querying case-insensitive terms</p>
				34	%= doc_query fcsql => '"essen"/c', cutoff => 1
				35	</section>
				36
				37	<section id="complex-queries">
				38	<h3>Complex queries</h3>
				39
				40	<h4>Querying using layers</h4>
				41
				42	<p>Querying a simple term using the layer for surface text</p>
				43	%= doc_query fcsql => '[text = "Semmel"]', cutoff => 1
				44	%= doc_query fcsql => '[text = "essen"/c]', cutoff => 1
				45
Akron	3cfa26d	2019-10-24 15:17:34 +0200	[diff] [blame]	46	<p>Querying adverbs from the <%= embedded_link_to 'doc', 'default foundry', 'data', 'annotation' %>.</p>
margaretha	14ce4d6	2019-07-17 18:38:45 +0200	[diff] [blame]	47	%= doc_query fcsql => '[pos="ADV"]', cutoff => 1
				48
				49
				50	<h4>Querying using qualifiers (foundries)</h4>
				51
				52	<p>Querying adverbs annotated by OpenNLP</p>
				53	%= doc_query fcsql => '[opennlp:pos="ADV"]', cutoff => 1
				54
				55	<p>Querying tokens with a lemma from TreeTagger</p>
				56	%= doc_query fcsql => '[tt:lemma = "leben"]', cutoff => 1
				57
				58
				59	<h4>Querying using boolean operators</h4>
				60
				61	<p>All tokens with lemma <code>"leben"</code> which are also finite verbs</p>
				62	%= doc_query fcsql => '[tt:lemma ="leben" & pos="VVFIN"]', cutoff => 1
				63
				64	<p>All tokens with lemma <code>"leben"</code> which are also finite verbs or perfect participles</p>
				65	%= doc_query fcsql => '[tt:lemma ="leben" & (pos="VVFIN" | pos="VVPP")]', cutoff => 1
				66
				67
				68	<h4>Sequence queries</h4>
				69
				70	<p>Combining two terms in a sequence query</p>
				71	%= doc_query fcsql => '[opennlp:pos="ADJA"] "leben"', cutoff => 1
				72
				73
				74	<h4>Empty token</h4>
Akron	3cfa26d	2019-10-24 15:17:34 +0200	[diff] [blame]	75	<p>As in <%= embedded_link_to 'doc', 'Poliqarp', 'ql', 'poliqarp-plus' %>, an empty token is written as <code>[]</code>,
margaretha	14ce4d6	2019-07-17 18:38:45 +0200	[diff] [blame]	76	which matches any token. Because it would return an
				77	excessive number of results, an empty token cannot be used on its own, only in
				78	combination with other tokens, for instance in a sequence query.</p>
				79	%= doc_query fcsql => '[] "Wolke"', cutoff => 1
				80
				81
				82	<h4>Negation</h4>
				83	<p>Like the empty token, negation cannot be used on its own because of the
				84	excessive number of results. However, it can be used in a sequence query.</p>
				85	%= doc_query fcsql => '[pos != "ADJA"] "Buch"', cutoff => 1
				86
				87
				88	<h4>Querying using quantifiers</h4>
				89	<p>A quantifier indicates repetition of a term; for instance, it can be used to search for
				90	exactly two consecutive occurrences of <code>"die"</code>.</p>
				91	%= doc_query fcsql => '"die" {2}', cutoff => 1
				92
				93	<p>Quantifiers are also useful for matching arbitrary tokens near other
				94	specific tokens, for instance two to three occurrences of any token between <code>"wir"</code> and
				95	<code>"leben"</code>.</p>
				96	%= doc_query fcsql => '"wir" []{2,3} "leben"', cutoff => 1
				97
				98
				99	<h4>Querying a term within a sentence</h4>
				100	%= doc_query fcsql => '"Boot" within s', cutoff => 1
				101
				102	</section>
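
The qualifier and layer syntax described above (an optional qualifier, a colon, a layer name, an operator, and a quoted value inside square brackets) can be assembled programmatically. The following is a minimal Python sketch, not part of KorAP or Kalamar: the helper names `fcsql_token` and `fcsql_sequence` are hypothetical, the layer check only mirrors the three layers listed in this document, and values containing a double quote are rejected rather than escaped, which is a simplification of this sketch.

```python
# Hypothetical helpers for composing FCS-QL query strings (illustration only).
SUPPORTED_LAYERS = {"text", "lemma", "pos"}  # layers listed in this document


def fcsql_token(layer, value, qualifier=None, negate=False):
    """Return one bracketed FCS-QL token, e.g. [tt:lemma = "leben"]."""
    if layer not in SUPPORTED_LAYERS:
        raise ValueError(f"unsupported layer: {layer!r}")
    if '"' in value:
        # Simplification of this sketch: quote escaping is not handled.
        raise ValueError("value must not contain a double quote")
    # A qualifier (foundry) is joined to the layer with a colon.
    name = f"{qualifier}:{layer}" if qualifier else layer
    operator = "!=" if negate else "="
    return f'[{name} {operator} "{value}"]'


def fcsql_sequence(*parts):
    """Join token expressions with spaces to form a sequence query."""
    return " ".join(parts)


print(fcsql_token("lemma", "leben", qualifier="tt"))
print(fcsql_sequence(fcsql_token("pos", "ADJA", negate=True), '"Buch"'))
```

The two `print` calls produce `[tt:lemma = "leben"]` and `[pos != "ADJA"] "Buch"`, matching the qualifier and negation examples in this document. Spaces around the operator are included for readability; the examples above use both spaced and unspaced forms.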