Update Readme and work around GitHub's incorrect handling of Unicode characters

Change-Id: Ie89f02064f0cfa99cf8b277d449a82a7b4348786

commit: 55fc212d3bfa42d1105d3ad8515adfd64aa028ff
author: Akron <nils@diewald-online.de> Wed Jul 27 13:24:39 2022 +0200
committer: Akron <nils@diewald-online.de> Wed Jul 27 13:24:39 2022 +0200
tree: 4b837bf8ba675a2f8da75da42ca09c40a62a25c4
parent: 64f7faead93871ffa3e3612c823b267846ad35d4
diff --git a/Readme.pod b/Readme.pod
index 7a2eb49..db7a3bf 100644
--- a/Readme.pod
+++ b/Readme.pod
@@ -326,6 +326,15 @@
 In case the C<Text> path is omitted, the whole document will be extracted.
 On the document level, the postfix wildcard C<*> is supported.
 
+=item B<--lang>
+
+Preferred language for metadata fields. In case multiple titles are
+given (on any level) with different C<xml:lang> attributes,
+the language given is preferred.
+Because titles may have different sources and different priorities,
+non-specific language titles may still be preferred in case the title
+source has a higher priority.
+
 
 =item B<--log|-l>
 
@@ -462,8 +471,8 @@
 
 =head1 About KorAP-XML
 
-KorAP-XML (Bański et al. 2012) is an implementation of the KorAP
-data model (Bański et al. 2013), where text data are stored physically
+KorAP-XML (Banski et al. 2012) is an implementation of the KorAP
+data model (Banski et al. 2013), where text data are stored physically
 separated from their interpretations (i.e. annotations).
 A text document in KorAP-XML therefore consists of several files
 containing primary data, metadata and annotations.
@@ -490,7 +499,7 @@
 The C<data.xml> contains the primary data, the C<header.xml> contains
 the metadata, and the annotation layers are stored in subfolders
 like C<base>, C<struct> or C<corenlp>
-(so-called "foundries"; Bański et al. 2013).
+(so-called "foundries"; Banski et al. 2013).
 
 Metadata is available in the TEI-P5 variant I5
 (Lüngen and Sperberg-McQueen 2012). See the documentation in
@@ -551,15 +560,15 @@
 
 =head2 References
 
-Piotr Bański, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
+Piotr Banski, Cyril Belica, Helge Krause, Marc Kupietz, Carsten Schnober, Oliver Schonefeld, and Andreas Witt (2011):
 KorAP data model: first approximation, December.
 
-Piotr Bański, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
+Piotr Banski, Peter M. Fischer, Elena Frick, Erik Ketzan, Marc Kupietz, Carsten Schnober, Oliver Schonefeld and Andreas Witt (2012):
 "The New IDS Corpus Analysis Platform: Challenges and Prospects",
 Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012).
 L<PDF|http://www.lrec-conf.org/proceedings/lrec2012/pdf/789_Paper.pdf>
 
-Piotr Bański, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
+Piotr Banski, Elena Frick, Michael Hanl, Marc Kupietz, Carsten Schnober and Andreas Witt (2013):
 "Robust corpus architecture: a new look at virtual collections and data access",
 Corpus Linguistics 2013. Abstract Book. Lancaster: UCREL, pp. 23-25.
 L<PDF|https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/4485/file/Ba%c5%84ski_Frick_Hanl_Robust_corpus_architecture_2013.pdf>
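The --lang entry added in the first hunk combines two criteria: the xml:lang attribute of each title and the priority of the title's source. Below is a minimal Python sketch of one possible reading. It assumes that source priority is compared first, that a language match only breaks ties between titles of equal priority, and that language codes are matched by exact string equality. The Title class, its priority field and the pick_title function are hypothetical illustrations, not the converter's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Title:
    text: str
    lang: Optional[str]  # value of xml:lang; None if the title has no language attribute
    priority: int        # hypothetical source priority: higher means more trusted


def pick_title(titles: List[Title], preferred_lang: Optional[str] = None) -> Optional[Title]:
    """Hypothetical rule: highest source priority wins; xml:lang only breaks ties."""
    if not titles:
        return None

    def rank(t: Title):
        # Tuples compare element by element, so priority dominates the language match.
        matches = preferred_lang is not None and t.lang == preferred_lang
        return (t.priority, matches)

    # max() returns the first of several equally ranked titles.
    return max(titles, key=rank)


if __name__ == "__main__":
    same_source = [Title("Der Titel", "de", 1), Title("The Title", "en", 1)]
    print(pick_title(same_source, "en").text)  # The Title

    mixed = [Title("Haupttitel", None, 2), Title("The Title", "en", 1)]
    print(pick_title(mixed, "en").text)  # Haupttitel
```

In the second call the title without xml:lang comes from a higher-priority source, so it wins over the matching English title. This is the case the Readme entry warns about.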
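The diff replaces the ń in the author name Bański with a plain n but leaves the ü in Lüngen unchanged. One mechanical rule reproduces exactly these replacements: fold only characters outside Latin-1 (above U+00FF) by applying NFD decomposition and dropping the combining marks. The commit message does not state such a rule. It is an assumption that happens to fit this diff, and fold_non_latin1 is a hypothetical helper sketched in Python.

```python
import unicodedata


def fold_non_latin1(text: str) -> str:
    """Drop combining marks from characters above U+00FF; keep ASCII and Latin-1 as-is."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)  # e.g. the 'ü' in "Lüngen" stays untouched
            continue
        # NFD splits 'ń' (U+0144) into 'n' + COMBINING ACUTE ACCENT (U+0301);
        # keeping only non-combining code points leaves the base letter.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    print(fold_non_latin1("Piotr Bański"))  # Piotr Banski
    print(fold_non_latin1("Lüngen"))        # Lüngen
```

Percent-encoded URLs such as the Ba%c5%84ski link in the reference list are plain ASCII. The helper leaves them unchanged, just as the commit did.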