Merge changes Id4185a86,Iaa0333c0,I85fb0618
* changes:
Sync version with KorAP-Tokenizer version
Add tests for clitics and contractions: French, English, German
Upgrade to KorAP-Tokenizer v2.2.0
diff --git a/Makefile.PL b/Makefile.PL
index 3e15add..fc62eaa 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -13,6 +13,10 @@
ABSTRACT => 'Conversion of TEI P5 based formats to KorAP-XML',
VERSION_FROM => 'script/tei2korapxml',
LICENSE => 'freebsd',
+ CONFIGURE_REQUIRES => {
+ 'ExtUtils::MakeMaker' => '6.52',
+ 'File::ShareDir::Install' => '0.13',
+ },
BUILD_REQUIRES => {
'Test::More' => 0,
'Test::Output' => 0,
diff --git a/Readme.pod b/Readme.pod
index f9814a2..7825c4e 100644
--- a/Readme.pod
+++ b/Readme.pod
@@ -13,7 +13,7 @@
=head1 DESCRIPTION
C<tei2korapxml> is a script to convert TEI P5 and
-L<I5|https://www1.ids-mannheim.de/kl/projekte/korpora/textmodell.html>
+L<I5|https://www.ids-mannheim.de/digspra/kl/projekte/korpora/textmodell>
based documents to the
L<KorAP-XML format|https://github.com/KorAP/KorAP-XML-Krill#about-korap-xml>.
If no specific input is defined, data is
@@ -53,7 +53,7 @@
into blanks between 2 tokens could lead to additional blanks,
where there should be none (e.g.: punctuation characters like C<,> or
C<.> should not be seperated from their predecessor token).
-(see also code section C<~ whitespace handling ~>).
+(see also code section C<~ whitespace handling ~> in C<script/tei2korapxml>).
=back
@@ -70,8 +70,8 @@
=head1 INSTALLATION
-C<tei2korapxml> requires L<libxml2-dev> bindings to build. When
-these bindings are available, the preferred way to install the script is
+C<tei2korapxml> requires L<libxml2-dev> bindings and L<File::ShareDir::Install> to be installed.
+When these requirements are met, the preferred way to install the script is
to use L<cpanm|App::cpanminus>.
$ cpanm https://github.com/KorAP/KorAP-XML-TEI.git
@@ -132,8 +132,8 @@
Define the foundry and file (without extension)
to store inline token information in.
-If L</KORAPXMLTEI_INLINE> is set, this will contain
-annotations as well.
+Unless C<--skip-inline-token-annotations> is set,
+this will contain annotations as well.
Defaults to C<tokens> and C<morpho>.
=item B<--inline-structures> <foundry>#[<file>]
@@ -200,7 +200,7 @@
L<KorAP::XML::TEI> is developed as part of the L<KorAP|https://korap.ids-mannheim.de/>
Corpus Analysis Platform at the
-L<Leibniz Institute for the German Language (IDS)|http://ids-mannheim.de/>,
+L<Leibniz Institute for the German Language (IDS)|https://www.ids-mannheim.de/>,
member of the
L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/>.
diff --git a/script/tei2korapxml b/script/tei2korapxml
index 123d937..2d1a6bf 100755
--- a/script/tei2korapxml
+++ b/script/tei2korapxml
@@ -404,7 +404,7 @@
=head1 DESCRIPTION
C<tei2korapxml> is a script to convert TEI P5 and
-L<I5|https://www1.ids-mannheim.de/kl/projekte/korpora/textmodell.html>
+L<I5|https://www.ids-mannheim.de/digspra/kl/projekte/korpora/textmodell>
based documents to the
L<KorAP-XML format|https://github.com/KorAP/KorAP-XML-Krill#about-korap-xml>.
If no specific input is defined, data is
@@ -444,7 +444,7 @@
into blanks between 2 tokens could lead to additional blanks,
where there should be none (e.g.: punctuation characters like C<,> or
C<.> should not be seperated from their predecessor token).
-(see also code section C<~ whitespace handling ~>).
+(see also code section C<~ whitespace handling ~> in C<script/tei2korapxml>).
=back
@@ -461,8 +461,8 @@
=head1 INSTALLATION
-C<tei2korapxml> requires L<libxml2-dev> bindings to build. When
-these bindings are available, the preferred way to install the script is
+C<tei2korapxml> requires L<libxml2-dev> bindings and L<File::ShareDir::Install> to be installed.
+When these requirements are met, the preferred way to install the script is
to use L<cpanm|App::cpanminus>.
$ cpanm https://github.com/KorAP/KorAP-XML-TEI.git
@@ -523,8 +523,8 @@
Define the foundry and file (without extension)
to store inline token information in.
-If L</KORAPXMLTEI_INLINE> is set, this will contain
-annotations as well.
+Unless C<--skip-inline-token-annotations> is set,
+this will contain annotations as well.
Defaults to C<tokens> and C<morpho>.
=item B<--inline-structures> <foundry>#[<file>]
@@ -591,7 +591,7 @@
L<KorAP::XML::TEI> is developed as part of the L<KorAP|https://korap.ids-mannheim.de/>
Corpus Analysis Platform at the
-L<Leibniz Institute for the German Language (IDS)|http://ids-mannheim.de/>,
+L<Leibniz Institute for the German Language (IDS)|https://www.ids-mannheim.de/>,
member of the
L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/>.