=pod

=encoding utf8

=head1 NAME

korapxml2krill - Merge KorAP-XML data and create Krill documents


=head1 SYNOPSIS

  korapxml2krill [archive|extract] --input <directory|archive> [options]


=head1 DESCRIPTION

L<KorAP::XML::Krill> is a library to convert KorAP-XML documents to files
compatible with the L<Krill|https://github.com/KorAP/Krill> indexer.
The C<korapxml2krill> command line tool is a simple wrapper around this library.


=head1 INSTALLATION

The preferred way to install L<KorAP::XML::Krill> is to use L<cpanm|App::cpanminus>.

  $ cpanm https://github.com/KorAP/KorAP-XML-Krill.git

In case everything went well, the C<korapxml2krill> tool will
be available on your command line immediately.
The minimum requirement for L<KorAP::XML::Krill> is Perl 5.16.
In addition, to work with zip archives, the C<unzip> tool needs to be present.

=head1 ARGUMENTS

  $ korapxml2krill -z --input <directory> --output <filename>

Without arguments, C<korapxml2krill> converts a directory of a single KorAP-XML document.
It expects the input to point to the text level folder.

=over 2

=item B<archive>

  $ korapxml2krill archive -z --input <directory|archive> --output <directory|tar>

Converts an archive of KorAP-XML documents. It expects a directory
(pointing to the corpus level folder) or one or more zip files as input.

=item B<extract>

  $ korapxml2krill extract --input <archive> --output <directory> --sigle <SIGLE>

Extracts KorAP-XML documents from a zip file.

=item B<serial>

  $ korapxml2krill serial -i <archive1> -i <archive2> -o <directory> -cfg <config-file>

Converts archives sequentially. The inputs are not merged but treated
as they are (so they may be premerged or globs).
The C<--output> directory is treated as the base directory where subdirectories
are created based on the archive name. In case the C<--to-tar> flag is given,
the output will be a tar file.


=back


=head1 OPTIONS

=over 2

=item B<--input|-i> <directory|zip file>

Directory or zip file(s) of documents to convert.

Without arguments, C<korapxml2krill> expects a folder of a single KorAP-XML
document, while C<archive> expects a KorAP-XML corpus folder or a zip
file to batch process multiple files.
C<extract> expects zip files only.

C<archive> supports multiple input zip files with the constraint
that the first archive listed contains all primary data files
and all metadata files.

  -i file/news.zip -i file/news.malt.zip -i "#file/news.tt.zip"

Input may also be defined using BSD glob wildcards.

  -i 'file/news*.zip'

The expanded input array will be sorted in length order, so the shortest
path needs to contain all primary data files and all metadata files.
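
The effect of length ordering can be previewed with standard tools.
The following sketch is only an approximation: it takes the archive names
from the example above and sorts them by string length using C<awk> and C<sort>.
How C<korapxml2krill> orders paths of equal length is not covered here.

```shell
# Candidate archives, as a glob such as 'file/news*.zip' might expand them.
inputs='file/news.malt.zip
file/news.tt.zip
file/news.zip'

# Prefix each path with its length, sort numerically, drop the prefix again.
sorted=$(printf '%s\n' "$inputs" | awk '{ print length($0), $0 }' | sort -n | cut -d' ' -f2-)

# The first entry is the one expected to hold primary data and metadata.
primary=$(printf '%s\n' "$sorted" | head -n 1)
echo "$primary"
```

If the archive printed first is not the one containing primary data and
metadata, listing the inputs explicitly with several C<-i> options avoids the
dependency on file name length.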

(The directory structure follows the base directory format,
which may include a C<.> root folder.
In this case further archives lacking a C<.> root folder
need to be passed with a hash sign in front of the archive's name.
This may require quoting the parameter.)

To support zip files, a version of C<unzip> needs to be installed that is
compatible with the archive file.

B<The root folder switch using the hash sign is experimental and
may vanish in future versions.>


=item B<--input-base|-ib> <directory>

The base directory for inputs.


=item B<--output|-o> <directory|file>

Output folder for archive processing or
document name for single output (optional).
Writes to C<STDOUT> by default
(in case C<output> is not mandatory due to further options).

=item B<--overwrite|-w>

Overwrite files that already exist.


=item B<--token|-t> <foundry>#<file>

Define the default tokenization by specifying
the name of the foundry and optionally the name
of the layer-file. Defaults to C<OpenNLP#tokens>.
This will directly take the file instead of running
the layer implementation!


=item B<--base-sentences|-bs> <foundry>#<layer>

Define the layer for base sentences.
If given, this will be used instead of using C<Base#Sentences>.
Currently C<DeReKo#Structure> and C<DGD#Structure> are the only additional
layers supported.

  Defaults to unset.


=item B<--base-paragraphs|-bp> <foundry>#<layer>

Define the layer for base paragraphs.
If given, this will be used instead of using C<Base#Paragraphs>.
Currently C<DeReKo#Structure> is the only additional layer supported.

  Defaults to unset.


=item B<--base-pagebreaks|-bpb> <foundry>#<layer>

Define the layer for base pagebreaks.
Currently C<DeReKo#Structure> is the only layer supported.

  Defaults to unset.


=item B<--skip|-s> <foundry>[#<layer>]

Skip specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>. Alternatively you can skip C<#ALL>.
Can be set multiple times.


=item B<--anno|-a> <foundry>#<layer>

Convert specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>.
Can be set multiple times.


=item B<--primary|-p>

Output primary data or not. Defaults to C<true>.
Can be flagged using C<--no-primary> as well.
This is I<deprecated>.


=item B<--non-word-tokens|-nwt>

Tokenize non-word tokens like word tokens (defined as matching
C</[\d\w]/>). Useful to treat punctuation as tokens.

  Defaults to unset.


=item B<--non-verbal-tokens|-nvt>

Tokenize non-verbal tokens that are marked in the primary data with
the Unicode symbol 'Black Vertical Rectangle' (U+25AE).

  Defaults to unset.


=item B<--jobs|-j>

Define the number of concurrent jobs in separate forks
for archive processing.
Defaults to C<0> (everything runs in a single process).

If C<sequential-extraction> is not set to false, this will
also apply to extraction.

Pass C<-1>, and the value will be set automatically to 5
times the number of available cores.
This is I<experimental>.
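
As a rough sketch of what C<-1> resolves to, the arithmetic can be reproduced
in the shell. Detecting the cores via C<nproc> or C<getconf> is an assumption
made for illustration; the tool may determine the number of cores differently.

```shell
# Detect the number of available cores (fall back to 1 if unknown).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# --jobs -1 is documented to resolve to five times the number of cores.
jobs=$((cores * 5))
echo "--jobs -1 would correspond to --jobs $jobs on this machine"
```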


=item B<--koral|-k>

Version of the output format. Supported versions are:
C<0> for legacy serialization, C<0.03> for serialization
with metadata fields as key-values on the root object,
C<0.4> for serialization with metadata fields as a list
of C<"@type":"koral:field"> objects.

Currently defaults to C<0.03>.


=item B<--sequential-extraction|-se>

Flag to indicate whether the C<jobs> value also applies to extraction.
Some systems may have problems with extracting multiple archives
to the same folder at the same time.
Can be flagged using C<--no-sequential-extraction> as well.
Defaults to C<false>.


=item B<--meta|-m>

Define the metadata parser to use. Defaults to C<I5>.
Metadata parsers can be defined in the C<KorAP::XML::Meta> namespace.
This is I<experimental>.


=item B<--pretty|-y>

Pretty print JSON output. Defaults to C<false>.
This is I<deprecated>.


=item B<--gzip|-z>

Compress the output.
Expects a defined C<output> file in single processing.


=item B<--cache|-c>

File to mmap a cache (using L<Cache::FastMmap>).
Defaults to C<korapxml2krill.cache> in the calling directory.


=item B<--cache-size|-cs>

Size of the cache. Defaults to C<50m>.


=item B<--cache-init|-ci>

Initialize cache file.
Can be flagged using C<--no-cache-init> as well.
Defaults to C<true>.


=item B<--cache-delete|-cd>

Delete cache file after processing.
Can be flagged using C<--no-cache-delete> as well.
Defaults to C<true>.


=item B<--config|-cfg>

Configure the parameters of your call in a file
of key-value pairs separated by whitespace:

  overwrite 1
  token     DeReKo#Structure
  ...

Supported parameters are:
C<overwrite>, C<gzip>, C<jobs>, C<input-base>,
C<token>, C<log>, C<cache>, C<cache-size>, C<cache-delete>, C<meta>,
C<output>, C<koral>,
C<temporary-extract>, C<sequential-extraction>,
C<base-sentences>, C<base-paragraphs>,
C<base-pagebreaks>,
C<skip> (semicolon separated), C<sigle>
(semicolon separated), C<anno> (semicolon separated).

Configuration parameters will always be overwritten by
passed parameters.
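
A configuration file is plain text and can be written with a here-document.
The sketch below only uses parameters from the list above; the temporary
file, the chosen values, and the exact spelling of the semicolon-separated
C<skip> value are illustrative assumptions.

```shell
# Write an example configuration: one key-value pair per line.
conf=$(mktemp)
cat > "$conf" <<'EOF'
overwrite 1
gzip      1
jobs      4
token     DeReKo#Structure
skip      Mate#Morpho;XIP
EOF

# Count the non-empty entries that were written.
entries=$(grep -c . "$conf")

# Show a command line that would use the file (printed, not executed here).
echo "korapxml2krill archive --config $conf --input <archive> --output <directory>"
```

Since passed parameters take precedence, a single value can still be
overridden for one run, for example by adding C<--jobs> to the call.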


=item B<--temporary-extract|-te>

Only valid for the C<archive> command.

This will first extract all files into a
directory and then will archive.
If the directory is given as C<:temp:>,
a temporary directory is used.
This is especially useful to avoid
massive unzipping and potential
network latency.


=item B<--to-tar>

Only valid for the C<archive> command.

Writes the output into a tar archive.


=item B<--sigle|-sg>

Extract the given texts.
Can be set multiple times.
I<Currently only supported on C<extract>.>
Sigles have the structure C<Corpus>/C<Document>/C<Text>.
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.


=item B<--log|-l>

The L<Log4perl> log level, defaults to C<ERROR>.


=item B<--help|-h>

Print this document.


=item B<--version|-v>

Print version information.

=back


=head1 ANNOTATION SUPPORT

L<KorAP::XML::Krill> has built-in importers for some annotation foundries and layers
developed in the KorAP project that are part of the KorAP preprocessing pipeline.
The base foundry with paragraphs, sentences, and the text element is mandatory for
L<Krill|https://github.com/KorAP/Krill>.

  Base
    #Paragraphs
    #Sentences

  Connexor
    #Morpho
    #Phrase
    #Sentences
    #Syntax

  CoreNLP
    #Constituency
    #Morpho
    #NamedEntities
    #Sentences

  CMC
    #Morpho

  DeReKo
    #Structure

  DGD
    #Morpho
    #Structure

  DRuKoLa
    #Morpho

  Glemm
    #Morpho

  HNC
    #Morpho

  LWC
    #Dependency

  Malt
    #Dependency

  MarMoT
    #Morpho

  Mate
    #Dependency
    #Morpho

  MDParser
    #Dependency

  OpenNLP
    #Morpho
    #Sentences

  Sgbr
    #Lemma
    #Morpho

  Talismane
    #Dependency
    #Morpho

  TreeTagger
    #Morpho
    #Sentences

  XIP
    #Constituency
    #Morpho
    #Sentences


More importers are in preparation.
New annotation importers can be defined in the C<KorAP::XML::Annotation> namespace.
See the built-in annotation importers as examples.


=head1 About KorAP-XML

KorAP-XML (Bański et al. 2012) is an implementation of the KorAP
data model (Bański et al. 2013), where text data are stored physically
separated from their interpretations (i.e. annotations).
A text document in KorAP-XML therefore consists of several files
containing primary data, metadata and annotations.

The structure of a single KorAP-XML document can be as follows:

  - data.xml
  - header.xml
  + base
    - tokens.xml
    - ...
  + struct
    - structure.xml
    - ...
  + corenlp
    - morpho.xml
    - constituency.xml
    - ...
  + tree_tagger
    - morpho.xml
    - ...
  - ...

The C<data.xml> contains the primary data, the C<header.xml> contains
the metadata, and the annotation layers are stored in subfolders
like C<base>, C<struct> or C<corenlp>
(so-called "foundries"; Bański et al. 2013).

Metadata is available in the TEI-P5 variant I5
(Lüngen and Sperberg-McQueen 2012). See the documentation in
L<KorAP::XML::Meta::I5> for translatable fields.

Annotations correspond to a variant of the TEI-P5 feature structures
(TEI Consortium; Lee et al. 2004).
Annotation feature structures refer to character sequences of the primary text
inside the C<text> element of the C<data.xml>.
A single annotation containing the lemma of a token can have the following structure:

  <span from="0" to="3">
    <fs type="lex" xmlns="http://www.tei-c.org/ns/1.0">
      <f name="lex">
        <fs>
          <f name="lemma">zum</f>
        </fs>
      </f>
    </fs>
  </span>

The C<from> and C<to> attributes refer to the character span
in the primary text.
Depending on the kind of annotation (e.g. token-based, span-based, relation-based),
the structure may vary. See L<KorAP::XML::Annotation::*> for various
annotation preprocessors.
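
To illustrate how C<from> and C<to> address the primary text, the following
sketch cuts a span out of a sample string. The sample sentence and the reading
of C<to> as an exclusive end offset are assumptions; they are merely
consistent with the three-character lemma in the example above.

```shell
# Hypothetical primary text; offsets count characters starting at 0.
text='Zum letzten kulturellen Anlass'
from=0
to=3

# awk's substr() is 1-based, so shift the start and pass the span length.
token=$(awk -v s="$text" -v f="$from" -v t="$to" 'BEGIN { print substr(s, f + 1, t - f) }')
echo "$token"
```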

Multiple KorAP-XML documents are organized on three levels following
the "IDS Textmodell" (Lüngen and Sperberg-McQueen 2012):
corpus E<gt> document E<gt> text. Metadata can be stored on each level,
and C<korapxml2krill> merges it into a single metadata
object per text. A corpus is therefore structured as follows:

  + <corpus>
    - header.xml
    + <document>
      - header.xml
      + <text>
        - data.xml
        - header.xml
        - ...
    - ...

A single text can be identified by the concatenation of
the corpus identifier, the document identifier and the text identifier.
This identifier is called the text sigle
(e.g. a text with the identifier C<18486> in the document C<060> in the
corpus C<WPD17> has the text sigle C<WPD17/060/18486>, see C<--sigle>).
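
The relation between the folder layout and text sigles can be sketched in the
shell. The snippet below builds a throwaway skeleton using the identifiers
from the example above and derives the sigle from the location of each
C<data.xml>. Treating every folder that holds a C<data.xml> as a text is an
assumption made for illustration.

```shell
# Build a minimal corpus > document > text skeleton in a temporary folder.
root=$(mktemp -d)
mkdir -p "$root/WPD17/060/18486/base"
touch "$root/WPD17/header.xml" \
      "$root/WPD17/060/header.xml" \
      "$root/WPD17/060/18486/header.xml" \
      "$root/WPD17/060/18486/data.xml"

# The path of each data.xml relative to the root, minus the file name,
# is the text sigle.
sigles=$(cd "$root" && find . -name data.xml | sed -e 's|^\./||' -e 's|/data\.xml$||' | sort)
echo "$sigles"

# Clean up the temporary skeleton.
rm -rf "$root"
```

A sigle obtained this way has the C<Corpus>/C<Document>/C<Text> form that
C<--sigle> expects.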

These corpora are often stored in zip files, which C<korapxml2krill>
can process directly. Corpora may also be split into multiple zip archives
(e.g. one zip file per foundry), which is also supported (see C<--input>).

Examples of KorAP-XML files are included in L<KorAP::XML::Krill>
in the form of a test suite.
The resulting JSON format merges all annotation layers
based on a single token stream.

=head2 References

Piotr Bański, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
KorAP data model: first approximation, December.

Piotr Bański, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
"The New IDS Corpus Analysis Platform: Challenges and Prospects",
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012).
L<PDF|http://www.lrec-conf.org/proceedings/lrec2012/pdf/789_Paper.pdf>

Piotr Bański, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
"Robust corpus architecture: a new look at virtual collections and data access",
Corpus Linguistics 2013. Abstract Book. Lancaster: UCREL, pp. 23-25.
L<PDF|https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4485/file/Ba%c5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf>

Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Thierry Declerck,
Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly and Claude Roux (2004):
"Towards an international standard on feature structure representation",
Proceedings of the fourth International Conference on Language Resources and Evaluation (LREC 2004),
pp. 373-376.
L<PDF|http://www.lrec-conf.org/proceedings/lrec2004/pdf/687.pdf>

Harald Lüngen and C. M. Sperberg-McQueen (2012):
"A TEI P5 Document Grammar for the IDS Text Model",
Journal of the Text Encoding Initiative, Issue 3 | November 2012.
L<PDF|https://journals.openedition.org/jtei/pdf/508>

TEI Consortium, eds:
"Feature Structures",
Guidelines for Electronic Text Encoding and Interchange.
L<html|https://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html>

=head1 AVAILABILITY

  https://github.com/KorAP/KorAP-XML-Krill


=head1 COPYRIGHT AND LICENSE

Copyright (C) 2015-2020, L<IDS Mannheim|https://www.ids-mannheim.de/>

Author: L<Nils Diewald|https://nils-diewald.de/>

Contributor: Eliza Margaretha

L<KorAP::XML::Krill> is developed as part of the L<KorAP|http://korap.ids-mannheim.de/>
Corpus Analysis Platform at the
L<Leibniz Institute for the German Language (IDS)|http://ids-mannheim.de/>,
member of the
L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/>.

This program is free software published under the
L<BSD-2 License|https://raw.githubusercontent.com/KorAP/KorAP-XML-Krill/master/LICENSE>.

=cut