=pod

=encoding utf8

=head1 NAME

korapxml2krill - Merge KorAP-XML data and create Krill documents


=head1 SYNOPSIS

  korapxml2krill [archive|extract|serial] --input <directory|archive> [options]


=head1 DESCRIPTION

L<KorAP::XML::Krill> is a library to convert KorAP-XML documents to files
compatible with the L<Krill|https://github.com/KorAP/Krill> indexer.
The C<korapxml2krill> command line tool is a simple wrapper around this library.


=head1 INSTALLATION

The preferred way to install L<KorAP::XML::Krill> is to use L<cpanm|App::cpanminus>.

  $ cpanm https://github.com/KorAP/KorAP-XML-Krill.git

If everything went well, the C<korapxml2krill> tool will
be available on your command line immediately.
The minimum requirement for L<KorAP::XML::Krill> is Perl 5.16.
L<Sys::Info> is optionally supported to calculate the number of available cores.
In addition, the C<unzip> tool needs to be present to work with zip archives.

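As a quick check of the prerequisites named above, the following sketch
verifies the Perl version and the presence of C<unzip> before installing.
It only uses standard shell and Perl features and is not part of the
distribution.

  # Fails with an error message if Perl is older than 5.16
  $ perl -e 'use 5.016; print "Perl $^V is recent enough\n"'

  # Prints the path to unzip, or a hint if it is missing
  $ command -v unzip || echo "unzip not found"
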
=head1 ARGUMENTS

  $ korapxml2krill -z --input <directory> --output <filename>

Without a command argument, C<korapxml2krill> converts a directory of a single KorAP-XML document.
It expects the input to point to the text level folder.

=over 2

=item B<archive>

  $ korapxml2krill archive -z --input <directory|archive> --output <directory|tar>

Converts an archive of KorAP-XML documents. It expects a directory
(pointing to the corpus level folder) or one or more zip files as input.

=item B<extract>

  $ korapxml2krill extract --input <archive> --output <directory> --sigle <SIGLE>

Extracts KorAP-XML documents from a zip file.

=item B<serial>

  $ korapxml2krill serial -i <archive1> -i <archive2> -o <directory> -cfg <config-file>

Converts archives sequentially. The inputs are not merged but treated
as they are (so they may be premerged or globs).
The C<--output> directory is treated as the base directory where subdirectories
are created based on the archive name. In case the C<--to-tar> flag is given,
the output will be a tar file.


=back


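The commands above can be combined. A minimal sketch, assuming a hypothetical
corpus archive F<corpus.zip> that contains the text C<WPD17/060/18486>, and
assuming that C<extract> recreates the sigle path below the output directory:

  # Extract a single text from the archive
  $ korapxml2krill extract --input corpus.zip --output extracted --sigle WPD17/060/18486

  # Convert the extracted text level folder to a gzipped Krill document
  $ korapxml2krill -z --input extracted/WPD17/060/18486 --output 18486.json.gz

The file names and the output layout are illustrative and may need to be adjusted.
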
=head1 OPTIONS

=over 2

=item B<--input|-i> <directory|zip file>

Directory or zip file(s) of documents to convert.

Without a command argument, C<korapxml2krill> expects a folder of a single KorAP-XML
document, while C<archive> expects a KorAP-XML corpus folder or a zip
file to batch process multiple files.
C<extract> expects zip files only.

C<archive> supports multiple input zip files with the constraint
that the first archive listed contains all primary data files
and all metadata files.

  -i file/news.zip -i file/news.malt.zip -i "#file/news.tt.zip"

Input may also be defined using BSD glob wildcards.

  -i 'file/news*.zip'

The expanded input array will be sorted by length, so the shortest
path needs to contain all primary data files and all metadata files.

(The directory structure follows the base directory format,
which may include a C<.> root folder.
In this case, further archives lacking a C<.> root folder
need to be passed with a hash sign in front of the archive's name.
This may require quoting the parameter.)

To support zip files, a version of C<unzip> needs to be installed that is
compatible with the archive file.

B<The root folder switch using the hash sign is experimental and
may vanish in future versions.>


=item B<--input-base|-ib> <directory>

The base directory for inputs.


=item B<--output|-o> <directory|file>

Output folder for archive processing or
document name for single output (optional).
Writes to C<STDOUT> by default
(in case C<output> is not mandatory due to further options).

=item B<--overwrite|-w>

Overwrite files that already exist.


=item B<--token|-t> <foundry>#<file>

Define the default tokenization by specifying
the name of the foundry and optionally the name
of the layer-file. Defaults to C<OpenNLP#tokens>.
This will directly take the file instead of running
the layer implementation!


=item B<--base-sentences|-bs> <foundry>#<layer>

Define the layer for base sentences.
If given, this will be used instead of C<Base#Sentences>.
Currently C<DeReKo#Structure> and C<DGD#Structure> are the only additional
layers supported.

Defaults to unset.


=item B<--base-paragraphs|-bp> <foundry>#<layer>

Define the layer for base paragraphs.
If given, this will be used instead of C<Base#Paragraphs>.
Currently C<DeReKo#Structure> is the only additional layer supported.

Defaults to unset.


=item B<--base-pagebreaks|-bpb> <foundry>#<layer>

Define the layer for base pagebreaks.
Currently C<DeReKo#Structure> is the only layer supported.

Defaults to unset.


=item B<--skip|-s> <foundry>[#<layer>]

Skip specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>. Alternatively you can skip C<#ALL>.
Can be set multiple times.


=item B<--anno|-a> <foundry>#<layer>

Convert specific annotations by specifying the foundry
(and optionally the layer with a C<#>-prefix),
e.g. C<Mate> or C<Mate#Morpho>.
Can be set multiple times.


=item B<--non-word-tokens|-nwt>

Tokenize non-word tokens like word tokens (defined as matching
C</[\d\w]/>). Useful to treat punctuation marks as tokens.

Defaults to unset.


=item B<--non-verbal-tokens|-nvt>

Tokenize non-verbal tokens that are marked in the primary data with
the Unicode symbol 'Black Vertical Rectangle' (U+25AE).

Defaults to unset.


=item B<--jobs|-j>

Define the number of concurrent jobs in separate forks
for archive processing.
Defaults to C<0> (everything runs in a single process).

If C<sequential-extraction> is not set to false, this will
also apply to extraction.

Pass C<-1>, and the value will be set automatically to 5
times the number of available cores, in case L<Sys::Info>
is available.
This is I<experimental>.


=item B<--koral|-k>

Version of the output format. Supported versions are:
C<0> for legacy serialization, C<0.03> for serialization
with metadata fields as key-values on the root object, and
C<0.4> for serialization with metadata fields as a list
of C<"@type":"koral:field"> objects.

Currently defaults to C<0.03>.


=item B<--sequential-extraction|-se>

Flag to indicate whether the C<jobs> value also applies to extraction.
Some systems may have problems with extracting multiple archives
to the same folder at the same time.
Can be flagged using C<--no-sequential-extraction> as well.
Defaults to C<false>.


=item B<--meta|-m>

Define the metadata parser to use. Defaults to C<I5>.
Metadata parsers can be defined in the C<KorAP::XML::Meta> namespace.
This is I<experimental>.


=item B<--gzip|-z>

Compress the output.
Expects a defined C<output> file in single processing.


=item B<--cache|-c>

File to mmap a cache (using L<Cache::FastMmap>).
Defaults to C<korapxml2krill.cache> in the calling directory.


=item B<--cache-size|-cs>

Size of the cache. Defaults to C<50m>.


=item B<--cache-init|-ci>

Initialize cache file.
Can be flagged using C<--no-cache-init> as well.
Defaults to C<true>.


=item B<--cache-delete|-cd>

Delete cache file after processing.
Can be flagged using C<--no-cache-delete> as well.
Defaults to C<true>.


=item B<--config|-cfg>

Configure the parameters of your call in a file
of key-value pairs separated by whitespace:

  overwrite 1
  token DeReKo#Structure
  ...

Supported parameters are:
C<overwrite>, C<gzip>, C<jobs>, C<input-base>,
C<token>, C<log>, C<cache>, C<cache-size>, C<cache-delete>, C<meta>,
C<output>, C<koral>,
C<temporary-extract>, C<sequential-extraction>,
C<base-sentences>, C<base-paragraphs>,
C<base-pagebreaks>,
C<skip> (semicolon separated), C<sigle>
(semicolon separated), C<anno> (semicolon separated).

Configuration parameters will always be overridden by
parameters passed on the command line.


=item B<--temporary-extract|-te>

Only valid for the C<archive> command.

This will first extract all files into a
directory and then archive them.
If the directory is given as C<:temp:>,
a temporary directory is used.
This is especially useful to avoid
massive unzipping and potential
network latency.


=item B<--to-tar>

Only valid for the C<archive> command.

Writes the output into a tar archive.


=item B<--sigle|-sg>

Extract the given texts.
Can be set multiple times.
I<Currently only supported on C<extract>.>
Sigles have the structure C<Corpus>/C<Document>/C<Text>.
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.


=item B<--log|-l>

The L<Log::Any> log level, defaults to C<ERROR>.


=item B<--help|-h>

Print help information.


=item B<--version|-v>

Print version information.

=back


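As an illustration of how configuration files and command line options interact,
consider a hypothetical configuration file F<krill.conf> (the file name and
all values are assumptions):

  overwrite 1
  gzip 1
  jobs 4
  skip Mate#Morpho;XIP

It could be used for archive processing as follows, with F<corpus.zip> and
F<krill-docs> as placeholder names:

  $ korapxml2krill archive --config krill.conf --input corpus.zip --output krill-docs --jobs 8

Because passed parameters take precedence over the configuration,
this call would run with 8 jobs instead of 4.
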
=head1 ANNOTATION SUPPORT

L<KorAP::XML::Krill> has built-in importers for some annotation foundries and layers
developed in the KorAP project that are part of the KorAP preprocessing pipeline.
The base foundry with paragraphs, sentences, and the text element is mandatory for
L<Krill|https://github.com/KorAP/Krill>.

  Base
    #Paragraphs
    #Sentences

  Connexor
    #Morpho
    #Phrase
    #Sentences
    #Syntax

  CoreNLP
    #Constituency
    #Morpho
    #NamedEntities
    #Sentences

  CMC
    #Morpho

  DeReKo
    #Structure

  DGD
    #Morpho
    #Structure

  DRuKoLa
    #Morpho

  Gingko
    #Morpho

  Glemm
    #Morpho

  HNC
    #Morpho

  LWC
    #Dependency

  Malt
    #Dependency

  MarMoT
    #Morpho

  Mate
    #Dependency
    #Morpho

  MDParser
    #Dependency

  OpenNLP
    #Morpho
    #Sentences

  RWK
    #Morpho
    #Structure

  Sgbr
    #Lemma
    #Morpho

  Talismane
    #Dependency
    #Morpho

  TreeTagger
    #Morpho
    #Sentences

  XIP
    #Constituency
    #Morpho
    #Sentences


More importers are in preparation.
New annotation importers can be defined in the C<KorAP::XML::Annotation> namespace.
See the built-in annotation importers as examples.


=head1 METADATA SUPPORT

L<KorAP::XML::Krill> has built-in importers for some metadata variants
developed in the KorAP project that are part of the KorAP preprocessing pipeline.

=over 2

=item I5 - Metadata for all I5 files

=item Sgbr - Metadata from the Schreibgebrauch project

=item Gingko - Metadata from the Gingko project in addition to I5

=back

More importers are in preparation.
New metadata importers can be defined in the C<KorAP::XML::Meta> namespace.
See the built-in metadata importers as examples.


=head1 ABOUT KORAP-XML

KorAP-XML (Bański et al. 2012) is an implementation of the KorAP
data model (Bański et al. 2013), where text data are stored physically
separated from their interpretations (i.e. annotations).
A text document in KorAP-XML therefore consists of several files
containing primary data, metadata and annotations.

The structure of a single KorAP-XML document can be as follows:

  - data.xml
  - header.xml
    + base
      - tokens.xml
      - ...
    + struct
      - structure.xml
      - ...
    + corenlp
      - morpho.xml
      - constituency.xml
      - ...
    + tree_tagger
      - morpho.xml
      - ...
    - ...

The C<data.xml> contains the primary data, the C<header.xml> contains
the metadata, and the annotation layers are stored in subfolders
like C<base>, C<struct> or C<corenlp>
(so-called "foundries"; Bański et al. 2013).

Metadata is available in the TEI-P5 variant I5
(Lüngen and Sperberg-McQueen 2012). See the documentation in
L<KorAP::XML::Meta::I5> for translatable fields.

Annotations correspond to a variant of the TEI-P5 feature structures
(TEI Consortium; Lee et al. 2004).
Annotation feature structures refer to character sequences of the primary text
inside the C<text> element of the C<data.xml>.
A single annotation containing the lemma of a token can have the following structure:

  <span from="0" to="3">
    <fs type="lex" xmlns="http://www.tei-c.org/ns/1.0">
      <f name="lex">
        <fs>
          <f name="lemma">zum</f>
        </fs>
      </f>
    </fs>
  </span>

The C<from> and C<to> attributes refer to the character span
in the primary text.
Depending on the kind of annotation (e.g. token-based, span-based, relation-based),
the structure may vary. See L<KorAP::XML::Annotation::*> for various
annotation preprocessors.

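To illustrate how C<from> and C<to> address the primary text, the following
sketch cuts the span out of a sample string using Bash substring expansion.
The sample text is made up, and the sketch assumes that C<to> is an exclusive
end offset, which fits the three-character lemma in the example above.

  $ text="Zum letzten kulturellen Anlass"
  $ from=0; to=3
  $ echo "${text:from:to-from}"
  Zum
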
Multiple KorAP-XML documents are organized on three levels following
the "IDS Textmodell" (Lüngen and Sperberg-McQueen 2012):
corpus E<gt> document E<gt> text. Metadata can be stored on each level,
and C<korapxml2krill> merges it into a single metadata
object per text. A corpus is therefore structured as follows:

  + <corpus>
    - header.xml
    + <document>
      - header.xml
      + <text>
        - data.xml
        - header.xml
        - ...
    - ...

A single text can be identified by the concatenation of
the corpus identifier, the document identifier and the text identifier.
This identifier is called the text sigle
(e.g. a text with the identifier C<18486> in the document C<060> in the
corpus C<WPD17> has the text sigle C<WPD17/060/18486>, see C<--sigle>).

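A short sketch of how sigles can be passed to C<extract>, based on the rules
described for C<--sigle>. The archive F<wpd17.zip> and the output directory
F<out> are placeholders:

  # A single text
  $ korapxml2krill extract -i wpd17.zip -o out --sigle WPD17/060/18486

  # A whole document (text part omitted)
  $ korapxml2krill extract -i wpd17.zip -o out --sigle WPD17/060

  # All documents starting with 06, quoted to keep the shell from expanding the wildcard
  $ korapxml2krill extract -i wpd17.zip -o out --sigle 'WPD17/06*'
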
These corpora are often stored in zip files, which C<korapxml2krill>
can process. Corpora may also be split into multiple zip archives
(e.g. one zip file per foundry), which is also supported (see C<--input>).

Examples of KorAP-XML files are included in L<KorAP::XML::Krill>
in the form of a test suite.
The resulting JSON format merges all annotation layers
based on a single token stream.

=head2 References

Piotr Bański, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
KorAP data model: first approximation, December.

Piotr Bański, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
"The New IDS Corpus Analysis Platform: Challenges and Prospects",
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012).
L<PDF|http://www.lrec-conf.org/proceedings/lrec2012/pdf/789_Paper.pdf>

Piotr Bański, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
"Robust corpus architecture: a new look at virtual collections and data access",
Corpus Linguistics 2013. Abstract Book. Lancaster: UCREL, pp. 23-25.
L<PDF|https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4485/file/Ba%c5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf>

Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Thierry Declerck,
Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly and Claude Roux (2004):
"Towards an international standard on feature structure representation",
Proceedings of the fourth International Conference on Language Resources and Evaluation (LREC 2004),
pp. 373-376.
L<PDF|http://www.lrec-conf.org/proceedings/lrec2004/pdf/687.pdf>

Harald Lüngen and C. M. Sperberg-McQueen (2012):
"A TEI P5 Document Grammar for the IDS Text Model",
Journal of the Text Encoding Initiative, Issue 3 | November 2012.
L<PDF|https://journals.openedition.org/jtei/pdf/508>

TEI Consortium, eds:
"Feature Structures",
Guidelines for Electronic Text Encoding and Interchange.
L<html|https://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html>

=head1 AVAILABILITY

  https://github.com/KorAP/KorAP-XML-Krill


=head1 COPYRIGHT AND LICENSE

Copyright (C) 2015-2022, L<IDS Mannheim|https://www.ids-mannheim.de/>

Author: L<Nils Diewald|https://www.nils-diewald.de/>

Contributor: Eliza Margaretha

L<KorAP::XML::Krill> is developed as part of the L<KorAP|https://korap.ids-mannheim.de/>
Corpus Analysis Platform at the
L<Leibniz Institute for the German Language (IDS)|https://www.ids-mannheim.de/>,
member of the
L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/>.

This program is free software published under the
L<BSD-2 License|https://opensource.org/licenses/BSD-2-Clause>.

=cut