Update Readme and improve GitHub wrong handling of unicode characters
Change-Id: Ie89f02064f0cfa99cf8b277d449a82a7b4348786
diff --git a/Readme.pod b/Readme.pod
index 7a2eb49..db7a3bf 100644
--- a/Readme.pod
+++ b/Readme.pod
@@ -326,6 +326,15 @@
In case the C<Text> path is omitted, the whole document will be extracted.
On the document level, the postfix wildcard C<*> is supported.
+=item B<--lang>
+
+Preferred language for metadata fields. In case multiple titles are
+given (on any level) with different C<xml:lang> attributes,
+the language given is preferred.
+Because titles may have different sources and different priorities,
+non-specific language titles may still be preferred in case the title
+source has a higher priority.
+
=item B<--log|-l>
@@ -462,8 +471,8 @@
=head1 About KorAP-XML
-KorAP-XML (Bański et al. 2012) is an implementation of the KorAP
-data model (Bański et al. 2013), where text data are stored physically
+KorAP-XML (Banski et al. 2012) is an implementation of the KorAP
+data model (Banski et al. 2013), where text data are stored physically
separated from their interpretations (i.e. annotations).
A text document in KorAP-XML therefore consists of several files
containing primary data, metadata and annotations.
@@ -490,7 +499,7 @@
The C<data.xml> contains the primary data, the C<header.xml> contains
the metadata, and the annotation layers are stored in subfolders
like C<base>, C<struct> or C<corenlp>
-(so-called "foundries"; Bański et al. 2013).
+(so-called "foundries"; Banski et al. 2013).
Metadata is available in the TEI-P5 variant I5
(Lüngen and Sperberg-McQueen 2012). See the documentation in
@@ -551,15 +560,15 @@
=head2 References
-Piotr Bański, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
+Piotr Banski, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
KorAP data model: first approximation, December.
-Piotr Bański, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
+Piotr Banski, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
"The New IDS Corpus Analysis Platform: Challenges and Prospects",
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012).
L<PDF|http://www.lrec-conf.org/proceedings/lrec2012/pdf/789_Paper.pdf>
-Piotr Bański, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
+Piotr Banski, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
"Robust corpus architecture: a new look at virtual collections and data access",
Corpus Linguistics 2013. Abstract Book. Lancaster: UCREL, pp. 23-25.
L<PDF|https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4485/file/Ba%c5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf>