=pod

=encoding utf8

=head1 NAME

korapxml2krill - Merge KorAP-XML data and create Krill documents


=head1 SYNOPSIS

  korapxml2krill [archive|extract] --input <directory|archive> [options]


=head1 DESCRIPTION

L<KorAP::XML::Krill> is a library to convert KorAP-XML documents to files
compatible with the L<Krill|https://github.com/KorAP/Krill> indexer.
The C<korapxml2krill> command line tool is a simple wrapper of this library.


=head1 INSTALLATION

The preferred way to install L<KorAP::XML::Krill> is to use L<cpanm|App::cpanminus>.

  $ cpanm https://github.com/KorAP/KorAP-XML-Krill.git

In case everything went well, the C<korapxml2krill> tool will
be available on your command line immediately.
The minimum requirement for L<KorAP::XML::Krill> is Perl 5.16.
L<Sys::Info> is optionally supported to calculate the number of available cores.
In addition, to work with zip archives, the C<unzip> tool needs to be present.
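
The requirements listed above can be checked before installing. The following is a minimal sketch, not part of the distribution; the helper name C<have_tool> is an assumption made for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given program is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Perl 5.16 or newer is required; $] is Perl's numeric version (e.g. 5.016000).
if have_tool perl && perl -e 'exit($] >= 5.016 ? 0 : 1)'; then
  echo "perl: ok"
else
  echo "perl: missing or older than 5.16"
fi

# unzip is only needed when working with zip archives.
if have_tool unzip; then
  echo "unzip: ok"
else
  echo "unzip: missing (zip archives cannot be processed)"
fi
```

The check only reports what it finds and does not install anything.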

=head1 ARGUMENTS

  $ korapxml2krill -z --input <directory> --output <filename>

Without a command, C<korapxml2krill> converts a directory of a single KorAP-XML document.
It expects the input to point to the text level folder.

=over 2

=item B<archive>

  $ korapxml2krill archive -z --input <directory|archive> --output <directory|tar>

Converts an archive of KorAP-XML documents. It expects a directory
(pointing to the corpus level folder) or one or more zip files as input.

=item B<extract>

  $ korapxml2krill extract --input <archive> --output <directory> --sigle <SIGLE>

Extracts KorAP-XML documents from a zip file.

=item B<serial>

  $ korapxml2krill serial -i <archive1> -i <archive2> -o <directory> -cfg <config-file>

Converts archives sequentially. The inputs are not merged but treated
as they are (so they may be premerged or globs).
The C<--output> directory is treated as the base directory where subdirectories
are created based on the archive name. In case the C<--to-tar> flag is given,
the output will be a tar file.


=back


=head1 OPTIONS

=over 2

=item B<--input|-i> <directory|zip file>

Directory or zip file(s) of documents to convert.

Without a command, C<korapxml2krill> expects a folder of a single KorAP-XML
document, while C<archive> expects a KorAP-XML corpus folder or a zip
file to batch process multiple files.
C<extract> expects zip files only.

C<archive> supports multiple input zip files with the constraint
that the first archive listed contains all primary data files
and all metadata files.

  -i file/news.zip -i file/news.malt.zip -i "#file/news.tt.zip"

Input may also be defined using BSD glob wildcards.

  -i 'file/news*.zip'

The expanded input array will be sorted by length, so the shortest
path needs to contain all primary data files and all metadata files.

(The directory structure follows the base directory format,
which may include a C<.> root folder.
In this case, further archives lacking a C<.> root folder
need to be passed with a hash sign in front of the archive's name.
This may require quoting the parameter.)

To support zip files, a version of C<unzip> needs to be installed that is
compatible with the archive file.

B<The root folder switch using the hash sign is experimental and
may vanish in future versions.>
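
The length-based ordering of glob-expanded inputs described above can be previewed in the shell. This is a minimal sketch with hypothetical archive names; it only mimics the documented ordering and does not call C<korapxml2krill>. How the tool orders paths of equal length is not stated here, so the example avoids ties.

```shell
# Hypothetical input list, as a glob such as 'file/news*.zip' might expand.
inputs='file/news.malt.zip
file/news.zip
file/news.tt.zip'

# Prefix each path with its length, sort numerically, and drop the prefix again.
sorted=$(printf '%s\n' "$inputs" \
  | awk '{ print length($0) "\t" $0 }' \
  | sort -n \
  | cut -f2-)

printf '%s\n' "$sorted"

# The shortest path comes first; that archive has to contain
# all primary data files and all metadata files.
first=$(printf '%s\n' "$sorted" | head -n 1)
echo "primary archive: $first"
```

If the shortest name does not belong to the archive with the primary data, listing the inputs explicitly with repeated C<-i> options avoids relying on the sort order.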


=item B<--input-base|-ib> <directory>

The base directory for inputs.


=item B<--output|-o> <directory|file>

Output folder for archive processing or
document name for single output (optional).
Writes to C<STDOUT> by default
(in case C<output> is not mandatory due to further options).

=item B<--overwrite|-w>

Overwrite files that already exist.


=item B<--token|-t> <foundry>#<file>

Define the default tokenization by specifying
the name of the foundry and optionally the name
of the layer-file. Defaults to C<OpenNLP#tokens>.
This will directly take the file instead of running
the layer implementation!


=item B<--base-sentences|-bs> <foundry>#<layer>

Define the layer for base sentences.
If given, this will be used instead of C<Base#Sentences>.
Currently C<DeReKo#Structure> and C<DGD#Structure> are the only additional
layers supported.

  Defaults to unset.


=item B<--base-paragraphs|-bp> <foundry>#<layer>

Define the layer for base paragraphs.
If given, this will be used instead of C<Base#Paragraphs>.
Currently C<DeReKo#Structure> is the only additional layer supported.

  Defaults to unset.


=item B<--base-pagebreaks|-bpb> <foundry>#<layer>

Define the layer for base pagebreaks.
Currently C<DeReKo#Structure> is the only layer supported.

  Defaults to unset.


=item B<--skip|-s> <foundry>[#<layer>]

Skip specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>. Alternatively you can skip C<#ALL>.
Can be set multiple times.


=item B<--anno|-a> <foundry>#<layer>

Convert specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>.
Can be set multiple times.


=item B<--non-word-tokens|-nwt>

Tokenize non-word tokens like word tokens (defined as matching
C</[\d\w]/>). Useful to treat punctuation as tokens.

  Defaults to unset.


=item B<--non-verbal-tokens|-nvt>

Tokenize non-verbal tokens marked in the primary data with
the Unicode symbol 'Black Vertical Rectangle' (U+25AE).

  Defaults to unset.


=item B<--jobs|-j>

Define the number of concurrent jobs in separate forks
for archive processing.
Defaults to C<0> (everything runs in a single process).

If C<sequential-extraction> is not set to true, this will
also apply to extraction.

Pass C<-1>, and the value will be set automatically to 5
times the number of available cores, in case L<Sys::Info>
is available.
This is I<experimental>.
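
To get a feeling for the value that C<-j -1> would select, the arithmetic can be reproduced in the shell. This is a rough sketch: C<getconf> is used here as a stand-in for the core count and is an assumption of this example; the tool itself relies on L<Sys::Info>, which may report a different number.

```shell
# Count online cores; fall back to 1 if getconf is unavailable.
# Assumption: this approximates what Sys::Info would report.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# The documentation describes 5 jobs per available core for -j -1.
jobs=$((cores * 5))
echo "cores=$cores jobs=$jobs"
```

Passing an explicit number to C<--jobs> keeps the degree of parallelism under direct control.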


=item B<--koral|-k>

Version of the output format. Supported versions are:
C<0> for legacy serialization, C<0.03> for serialization
with metadata fields as key-values on the root object,
C<0.4> for serialization with metadata fields as a list
of C<"@type":"koral:field"> objects.

Currently defaults to C<0.03>.


=item B<--sequential-extraction|-se>

Flag to indicate whether archives are extracted sequentially
instead of applying the C<jobs> value to extraction as well.
Some systems may have problems with extracting multiple archives
to the same folder at the same time.
Can be flagged using C<--no-sequential-extraction> as well.
Defaults to C<false>.


=item B<--meta|-m>

Define the metadata parser to use. Defaults to C<I5>.
Metadata parsers can be defined in the C<KorAP::XML::Meta> namespace.
This is I<experimental>.


=item B<--gzip|-z>

Compress the output.
Expects a defined C<output> file in single processing.


=item B<--cache|-c>

File to mmap a cache (using L<Cache::FastMmap>).
Defaults to C<korapxml2krill.cache> in the calling directory.


=item B<--cache-size|-cs>

Size of the cache. Defaults to C<50m>.


=item B<--cache-init|-ci>

Initialize cache file.
Can be flagged using C<--no-cache-init> as well.
Defaults to C<true>.


=item B<--cache-delete|-cd>

Delete cache file after processing.
Can be flagged using C<--no-cache-delete> as well.
Defaults to C<true>.


=item B<--config|-cfg>

Configure the parameters of your call in a file
of key-value pairs separated by whitespace:

  overwrite 1
  token     DeReKo#Structure
  ...

Supported parameters are:
C<overwrite>, C<gzip>, C<jobs>, C<input-base>,
C<token>, C<log>, C<cache>, C<cache-size>, C<cache-delete>, C<meta>,
C<output>, C<koral>,
C<temporary-extract>, C<sequential-extraction>,
C<base-sentences>, C<base-paragraphs>,
C<base-pagebreaks>,
C<skip> (semicolon separated), C<sigle>
(semicolon separated), C<anno> (semicolon separated).

Configuration parameters will always be overwritten by
passed parameters.
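
A configuration file of this kind can be written and used as follows. This is a minimal sketch under stated assumptions: the file name comes from C<mktemp>, the paths C<corpus.zip> and C<out/> are hypothetical, and the values for C<jobs> and C<skip> are only illustrative. The conversion is skipped when the tool or the input is not present.

```shell
# Write a hypothetical configuration file; the keys are taken from the
# list of supported parameters, the values are illustrative only.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
overwrite 1
jobs 4
token OpenNLP#tokens
skip Mate#Morpho;XIP
EOF

# Hypothetical paths; run only if the tool and the input exist.
if command -v korapxml2krill >/dev/null 2>&1 && [ -e corpus.zip ]; then
  # --jobs on the command line takes precedence over "jobs 4" in the file.
  korapxml2krill archive --config "$cfg" --jobs 2 \
    --input corpus.zip --output out/
else
  echo "would run: korapxml2krill archive --config $cfg --jobs 2 --input corpus.zip --output out/"
fi
```

Keeping stable settings in the file and passing only the varying ones on the command line follows the precedence rule stated above.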


=item B<--temporary-extract|-te>

Only valid for the C<archive> command.

This will first extract all files into a
directory and then run the archive conversion on it.
If the directory is given as C<:temp:>,
a temporary directory is used.
This is especially useful to avoid
massive unzipping and potential
network latency.


=item B<--to-tar>

Only valid for the C<archive> command.

Writes the output into a tar archive.


=item B<--sigle|-sg>

Extract the given texts.
Can be set multiple times.
I<Currently only supported on C<extract>.>
Sigles have the structure C<Corpus>/C<Document>/C<Text>.
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.


=item B<--log|-l>

The L<Log4perl> log level, defaults to C<ERROR>.


=item B<--help|-h>

Print help information.


=item B<--version|-v>

Print version information.

=back


=head1 ANNOTATION SUPPORT

L<KorAP::XML::Krill> has built-in importers for some annotation foundries and layers
developed in the KorAP project that are part of the KorAP preprocessing pipeline.
The base foundry with paragraphs, sentences, and the text element is mandatory for
L<Krill|https://github.com/KorAP/Krill>.

  Base
    #Paragraphs
    #Sentences

  Connexor
    #Morpho
    #Phrase
    #Sentences
    #Syntax

  CoreNLP
    #Constituency
    #Morpho
    #NamedEntities
    #Sentences

  CMC
    #Morpho

  DeReKo
    #Structure

  DGD
    #Morpho
    #Structure

  DRuKoLa
    #Morpho

  Glemm
    #Morpho

  HNC
    #Morpho

  LWC
    #Dependency

  Malt
    #Dependency

  MarMoT
    #Morpho

  Mate
    #Dependency
    #Morpho

  MDParser
    #Dependency

  OpenNLP
    #Morpho
    #Sentences

  RWK
    #Morpho
    #Structure

  Sgbr
    #Lemma
    #Morpho

  Talismane
    #Dependency
    #Morpho

  TreeTagger
    #Morpho
    #Sentences

  XIP
    #Constituency
    #Morpho
    #Sentences


More importers are in preparation.
New annotation importers can be defined in the C<KorAP::XML::Annotation> namespace.
See the built-in annotation importers as examples.


=head1 About KorAP-XML

KorAP-XML (Bański et al. 2012) is an implementation of the KorAP
data model (Bański et al. 2013), where text data are stored physically
separated from their interpretations (i.e. annotations).
A text document in KorAP-XML therefore consists of several files
containing primary data, metadata and annotations.

The structure of a single KorAP-XML document can be as follows:

  - data.xml
  - header.xml
    + base
      - tokens.xml
      - ...
    + struct
      - structure.xml
      - ...
    + corenlp
      - morpho.xml
      - constituency.xml
      - ...
    + tree_tagger
      - morpho.xml
      - ...
    - ...

The C<data.xml> contains the primary data, the C<header.xml> contains
the metadata, and the annotation layers are stored in subfolders
like C<base>, C<struct> or C<corenlp>
(so-called "foundries"; Bański et al. 2013).

Metadata is available in the TEI-P5 variant I5
(Lüngen and Sperberg-McQueen 2012). See the documentation in
L<KorAP::XML::Meta::I5> for translatable fields.

Annotations correspond to a variant of the TEI-P5 feature structures
(TEI Consortium; Lee et al. 2004).
Annotation feature structures refer to character sequences of the primary text
inside the C<text> element of the C<data.xml>.
A single annotation containing the lemma of a token can have the following structure:

  <span from="0" to="3">
    <fs type="lex" xmlns="http://www.tei-c.org/ns/1.0">
      <f name="lex">
        <fs>
          <f name="lemma">zum</f>
        </fs>
      </f>
    </fs>
  </span>

The C<from> and C<to> attributes refer to the character span
in the primary text.
Depending on the kind of annotation (e.g. token-based, span-based, relation-based),
the structure may vary. See L<KorAP::XML::Annotation::*> for various
annotation preprocessors.
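
How C<from> and C<to> select a character span can be illustrated with a small shell sketch. The primary text below is hypothetical, and the interpretation of the offsets (zero-based, with C<to> as an exclusive end) is an assumption derived from the example above, where a three-letter token spans C<0> to C<3>.

```shell
# Hypothetical primary text; in KorAP-XML it would come from data.xml.
text='Zum Beispiel'
from=0
to=3

# Assumption: offsets are zero-based and "to" is exclusive, so the span
# covers (to - from) characters. cut counts from 1, hence from + 1.
# Note: this sketch is only reliable for ASCII text, as some cut
# implementations count bytes rather than characters.
span=$(printf '%s' "$text" | cut -c "$((from + 1))-$to")
echo "span [$from,$to): $span"
```

Under these assumptions the annotation with the lemma C<zum> would refer to the first three characters of the text.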

Multiple KorAP-XML documents are organized on three levels following
the "IDS Textmodell" (Lüngen and Sperberg-McQueen 2012):
corpus E<gt> document E<gt> text. On each level, metadata
can be stored, which C<korapxml2krill> will merge into a single metadata
object per text. A corpus is therefore structured as follows:

  + <corpus>
    - header.xml
    + <document>
      - header.xml
      + <text>
        - data.xml
        - header.xml
        - ...
    - ...

A single text can be identified by the concatenation of
the corpus identifier, the document identifier and the text identifier.
This identifier is called the text sigle
(e.g. a text with the identifier C<18486> in the document C<060> in the
corpus C<WPD17> has the text sigle C<WPD17/060/18486>, see C<--sigle>).
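
The composition of a text sigle from its three identifiers can be sketched in the shell. The variable names are chosen for this example only; the identifiers are the ones used above.

```shell
# Identifiers from the example above.
corpus=WPD17
document=060
text=18486

# Join the three levels into a text sigle.
sigle="$corpus/$document/$text"
echo "$sigle"

# Split a sigle back into its parts, using "/" as the field separator.
IFS=/ read -r c d t <<EOF
$sigle
EOF
echo "corpus=$c document=$d text=$t"
```

A sigle built this way has the form expected by the C<--sigle> option of the C<extract> command.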

These corpora are often stored in zip files, which C<korapxml2krill>
can process. Corpora may also be split into multiple zip archives
(e.g. one zip file per foundry), which is also supported (see C<--input>).
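
A corpus split into one zip file per foundry could be converted along the following lines. This is a sketch with hypothetical archive names and a hypothetical output folder C<out/>; only options described in this document are used, and the conversion is skipped if the tool or the first archive is missing.

```shell
# Hypothetical archives: the first one has to contain all primary data
# and metadata files, the others add one foundry each.
args='archive -z -i file/news.zip -i file/news.malt.zip -i file/news.tt.zip -o out/'

if command -v korapxml2krill >/dev/null 2>&1 && [ -e file/news.zip ]; then
  # Unquoted on purpose: the string is split into separate arguments
  # (safe here because none of the paths contains whitespace).
  korapxml2krill $args
else
  echo "would run: korapxml2krill $args"
fi
```

Listing the archives explicitly keeps the archive with the primary data in first position, independent of any glob expansion.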

Examples of KorAP-XML files are included in L<KorAP::XML::Krill>
in the form of a test suite.
The resulting JSON format merges all annotation layers
based on a single token stream.

=head2 References

Piotr Bański, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
KorAP data model: first approximation, December.

Piotr Bański, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
"The New IDS Corpus Analysis Platform: Challenges and Prospects",
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012).
L<PDF|http://www.lrec-conf.org/proceedings/lrec2012/pdf/789_Paper.pdf>

Piotr Bański, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
"Robust corpus architecture: a new look at virtual collections and data access",
Corpus Linguistics 2013. Abstract Book. Lancaster: UCREL, pp. 23-25.
L<PDF|https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4485/file/Ba%c5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf>

Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Thierry Declerck,
Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly and Claude Roux (2004):
"Towards an international standard on feature structure representation",
Proceedings of the fourth International Conference on Language Resources and Evaluation (LREC 2004),
pp. 373-376.
L<PDF|http://www.lrec-conf.org/proceedings/lrec2004/pdf/687.pdf>

Harald Lüngen and C. M. Sperberg-McQueen (2012):
"A TEI P5 Document Grammar for the IDS Text Model",
Journal of the Text Encoding Initiative, Issue 3 | November 2012.
L<PDF|https://journals.openedition.org/jtei/pdf/508>

TEI Consortium, eds:
"Feature Structures",
Guidelines for Electronic Text Encoding and Interchange.
L<html|https://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html>

=head1 AVAILABILITY

  https://github.com/KorAP/KorAP-XML-Krill


=head1 COPYRIGHT AND LICENSE

Copyright (C) 2015-2020, L<IDS Mannheim|https://www.ids-mannheim.de/>

Author: L<Nils Diewald|https://nils-diewald.de/>

Contributor: Eliza Margaretha

L<KorAP::XML::Krill> is developed as part of the L<KorAP|http://korap.ids-mannheim.de/>
Corpus Analysis Platform at the
L<Leibniz Institute for the German Language (IDS)|http://ids-mannheim.de/>,
member of the
L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/>.

This program is free software published under the
L<BSD-2 License|https://raw.githubusercontent.com/KorAP/KorAP-XML-Krill/master/LICENSE>.

=cut