Fixed readme by mentioning precedence of passed parameters over configuration parameters
Change-Id: Ia4372cbb39b60630a42027db3f7eac37321a1cb2
diff --git a/Readme.pod b/Readme.pod
index c38522e..32af215 100644
--- a/Readme.pod
+++ b/Readme.pod
@@ -41,7 +41,7 @@
=item B<archive>
- $ korapxml2krill archive -z --input <directory|archive> --output <directory>
+ $ korapxml2krill archive -z --input <directory|archive> --output <directory|tar>
Converts an archive of KorAP-XML documents. It expects a directory
(pointing to the corpus level folder) or one or more zip files as input.
@@ -59,7 +59,8 @@
Convert archives sequentially. The inputs are not merged but treated
as they are (so they may be premerged or globs).
the C<--out> directory is treated as the base directory where subdirectories
-are created based on the archive name.
+are created based on the archive name. In case the C<--to-tar> flag is given,
+the output will be a tar file.
=back
@@ -103,6 +104,7 @@
B<The root folder switch using the hash sign is experimental and
may vanish in future versions.>
+
=item B<--input-base|-ib> <directory>
The base directory for inputs.
@@ -119,6 +121,7 @@
Overwrite files that already exist.
+
=item B<--token|-t> <foundry>#<file>
Define the default tokenization by specifying
@@ -159,6 +162,7 @@
e.g. C<Mate> or C<Mate#Morpho>. Alternatively you can skip C<#ALL>.
Can be set multiple times.
+
=item B<--anno|-a> <foundry>#<layer>
Convert specific annotations by specifying the foundry
@@ -166,58 +170,81 @@
e.g. C<Mate> or C<Mate#Morpho>.
Can be set multiple times.
+
=item B<--primary|-p>
Output primary data or not. Defaults to C<true>.
Can be flagged using C<--no-primary> as well.
This is I<deprecated>.
+
=item B<--jobs|-j>
Define the number of concurrent jobs in seperated forks
for archive processing.
Defaults to C<0> (everything runs in a single process).
+
+If C<sequential-extraction> is not set to false, this will
+also apply to extraction.
+
Pass -1, and the value will be set automatically to 5
times the number of available cores.
This is I<experimental>.
+
+=item B<--sequential-extraction|-se>
+
+Flag to indicate whether the C<jobs> value also applies to extraction.
+Some systems may have problems with extracting multiple archives
+to the same folder at the same time.
+Can be flagged using C<--no-sequential-extraction> as well.
+Defaults to C<false>.
+
+
=item B<--meta|-m>
Define the metadata parser to use. Defaults to C<I5>.
Metadata parsers can be defined in the C<KorAP::XML::Meta> namespace.
This is I<experimental>.
+
=item B<--pretty|-y>
Pretty print JSON output. Defaults to C<false>.
This is I<deprecated>.
+
=item B<--gzip|-z>
Compress the output.
Expects a defined C<output> file in single processing.
+
=item B<--cache|-c>
File to mmap a cache (using L<Cache::FastMmap>).
Defaults to C<korapxml2krill.cache> in the calling directory.
+
=item B<--cache-size|-cs>
Size of the cache. Defaults to C<50m>.
+
=item B<--cache-init|-ci>
Initialize cache file.
Can be flagged using C<--no-cache-init> as well.
Defaults to C<true>.
+
=item B<--cache-delete|-cd>
Delete cache file after processing.
Can be flagged using C<--no-cache-delete> as well.
Defaults to C<true>.
+
=item B<--config|-cfg>
Configure the parameters of your call in a file
@@ -230,10 +257,17 @@
Supported parameters are:
C<overwrite>, C<gzip>, C<jobs>, C<input-base>,
C<token>, C<log>, C<cache>, C<cache-size>, C<cache-delete>, C<meta>,
-C<output>, C<base-sentences>, C<temp-extract>, C<base-paragraphs>,
-C<base-pagebreaks>, C<skip> (semicolon separated), C<sigle>
+C<output>,
+C<temp-extract>, C<sequential-extraction>,
+C<base-sentences>, C<base-paragraphs>,
+C<base-pagebreaks>,
+C<skip> (semicolon separated), C<sigle>
(semicolon separated), C<anno> (semicolon separated).
+Configuration parameters will always be overridden by
+parameters passed on the command line.
+
+
=item B<--temporary-extract|-te>
Only valid for the C<archive> command.
@@ -246,6 +280,7 @@
massive unzipping and potential
network latency.
+
=item B<--sigle|-sg>
Extract the given texts.
@@ -255,20 +290,24 @@
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.
+
=item B<--log|-l>
The L<Log4perl> log level, defaults to C<ERROR>.
+
=item B<--help|-h>
Print this document.
+
=item B<--version|-v>
Print version information.
=back
+
=head1 ANNOTATION SUPPORT
L<KorAP::XML::Krill> has built-in importer for some annotation foundries and layers
@@ -292,6 +331,9 @@
#NamedEntities
#Sentences
+ CMC
+ #Morpho
+
DeReKo
#Structure
@@ -301,6 +343,9 @@
Glemm
#Morpho
+ LWC
+ #Dependency
+
Malt
#Dependency
@@ -336,6 +381,7 @@
New annotation importers can be defined in the C<KorAP::XML::Annotation> namespace.
See the built-in annotation importers as examples.
+
=head1 AVAILABILITY
https://github.com/KorAP/KorAP-XML-Krill
@@ -343,7 +389,7 @@
=head1 COPYRIGHT AND LICENSE
-Copyright (C) 2015-2017, L<IDS Mannheim|http://www.ids-mannheim.de/>
+Copyright (C) 2015-2018, L<IDS Mannheim|http://www.ids-mannheim.de/>
Author: L<Nils Diewald|http://nils-diewald.de/>
diff --git a/script/korapxml2krill b/script/korapxml2krill
index 4ff4b5b..bcdadc8 100644
--- a/script/korapxml2krill
+++ b/script/korapxml2krill
@@ -1141,6 +1141,7 @@
B<The root folder switch using the hash sign is experimental and
may vanish in future versions.>
+
=item B<--input-base|-ib> <directory>
The base directory for inputs.
@@ -1157,6 +1158,7 @@
Overwrite files that already exist.
+
=item B<--token|-t> <foundry>#<file>
Define the default tokenization by specifying
@@ -1197,6 +1199,7 @@
e.g. C<Mate> or C<Mate#Morpho>. Alternatively you can skip C<#ALL>.
Can be set multiple times.
+
=item B<--anno|-a> <foundry>#<layer>
Convert specific annotations by specifying the foundry
@@ -1204,12 +1207,14 @@
e.g. C<Mate> or C<Mate#Morpho>.
Can be set multiple times.
+
=item B<--primary|-p>
Output primary data or not. Defaults to C<true>.
Can be flagged using C<--no-primary> as well.
This is I<deprecated>.
+
=item B<--jobs|-j>
Define the number of concurrent jobs in seperated forks
@@ -1223,6 +1228,7 @@
times the number of available cores.
This is I<experimental>.
+
=item B<--sequential-extraction|-se>
Flag to indicate, if the C<jobs> value also applies to extraction.
@@ -1231,43 +1237,51 @@
Can be flagged using C<--no-sequential-extraction> as well.
Defaults to C<false>.
+
=item B<--meta|-m>
Define the metadata parser to use. Defaults to C<I5>.
Metadata parsers can be defined in the C<KorAP::XML::Meta> namespace.
This is I<experimental>.
+
=item B<--pretty|-y>
Pretty print JSON output. Defaults to C<false>.
This is I<deprecated>.
+
=item B<--gzip|-z>
Compress the output.
Expects a defined C<output> file in single processing.
+
=item B<--cache|-c>
File to mmap a cache (using L<Cache::FastMmap>).
Defaults to C<korapxml2krill.cache> in the calling directory.
+
=item B<--cache-size|-cs>
Size of the cache. Defaults to C<50m>.
+
=item B<--cache-init|-ci>
Initialize cache file.
Can be flagged using C<--no-cache-init> as well.
Defaults to C<true>.
+
=item B<--cache-delete|-cd>
Delete cache file after processing.
Can be flagged using C<--no-cache-delete> as well.
Defaults to C<true>.
+
=item B<--config|-cfg>
Configure the parameters of your call in a file
@@ -1287,6 +1301,10 @@
C<skip> (semicolon separated), C<sigle>
(semicolon separated), C<anno> (semicolon separated).
+Configuration parameters will always be overridden by
+parameters passed on the command line.
+
+
=item B<--temporary-extract|-te>
Only valid for the C<archive> command.
@@ -1299,6 +1317,7 @@
massive unzipping and potential
network latency.
+
=item B<--sigle|-sg>
Extract the given texts.
@@ -1308,20 +1327,24 @@
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.
+
=item B<--log|-l>
The L<Log4perl> log level, defaults to C<ERROR>.
+
=item B<--help|-h>
Print this document.
+
=item B<--version|-v>
Print version information.
=back
+
=head1 ANNOTATION SUPPORT
L<KorAP::XML::Krill> has built-in importer for some annotation foundries and layers
@@ -1395,6 +1418,7 @@
New annotation importers can be defined in the C<KorAP::XML::Annotation> namespace.
See the built-in annotation importers as examples.
+
=head1 AVAILABILITY
https://github.com/KorAP/KorAP-XML-Krill