Added documentation for supported I5 metadata fields

Change-Id: I9af7e848533216386c8de9e5873db6b28ad2159d
diff --git a/Changes b/Changes
index 906a0d0..5f5b710 100644
--- a/Changes
+++ b/Changes
@@ -1,4 +1,4 @@
-0.39 2020-01-16
+0.39 2020-02-11
         - Added Talismane support.
         - Added "distributor" field to I5 metadata.
         - Added DGD link field to I5 metadata.
@@ -6,6 +6,9 @@
         - Added support for DGD pseudo-sentences
           based on anchor milestones.
         - Added brief explanation of the format.
+        - Fixed parsing of editionStmt.
+        - Added documentation for supported I5 metadata
+          fields.
 
 0.38 2019-05-22
         - Stop file processing when base tokenization
diff --git a/Readme.pod b/Readme.pod
index eac3a7e..6627baa 100644
--- a/Readme.pod
+++ b/Readme.pod
@@ -463,8 +463,11 @@
 (so-called "foundries"; Bański et al. 2013).
 
 Metadata is available in the TEI-P5 variant I5
-(Lüngen and Sperberg-McQueen 2012), while annotations correspond to
-a variant of the TEI-P5 feature structures (TEI Consortium; Lee et al. 2004).
+(Lüngen and Sperberg-McQueen 2012). See the documentation in
+L<KorAP::XML::Meta::I5> for the supported metadata fields.
+
+Annotations correspond to a variant of the TEI-P5 feature structures
+(TEI Consortium; Lee et al. 2004).
 
 Multiple KorAP-XML documents are organized on three levels following
 the "IDS Textmodell" (Lüngen and Sperberg-McQueen 2012):
diff --git a/lib/KorAP/XML/Krill.pm b/lib/KorAP/XML/Krill.pm
index d35396e..ec30176 100644
--- a/lib/KorAP/XML/Krill.pm
+++ b/lib/KorAP/XML/Krill.pm
@@ -414,15 +414,15 @@
 
 =head1 COPYRIGHT AND LICENSE
 
-Copyright (C) 2015-2018, L<IDS Mannheim|http://www.ids-mannheim.de/>
-Author: L<Nils Diewald|http://nils-diewald.de/>
+Copyright (C) 2015-2020, L<IDS Mannheim|https://www.ids-mannheim.de/>
+Author: L<Nils Diewald|https://nils-diewald.de/>
 
 KorAP::XML::Krill is developed as part of the
 L<KorAP|http://korap.ids-mannheim.de/>
 Corpus Analysis Platform at the
-L<Institute for the German Language (IDS)|http://ids-mannheim.de/>,
+L<Institute for the German Language (IDS)|https://www.ids-mannheim.de/>,
 member of the
-L<Leibniz-Gemeinschaft|http://www.leibniz-gemeinschaft.de/en/about-us/leibniz-competition/projekte-2011/2011-funding-line-2/>
+L<Leibniz-Gemeinschaft|https://www.leibniz-gemeinschaft.de/en/>
 and supported by the L<KobRA|http://www.kobra.tu-dortmund.de> project,
 funded by the
 L<Federal Ministry of Education and Research (BMBF)|http://www.bmbf.de/en/>.
diff --git a/lib/KorAP/XML/Meta/I5.pm b/lib/KorAP/XML/Meta/I5.pm
index 48565ae..2df85ad 100644
--- a/lib/KorAP/XML/Meta/I5.pm
+++ b/lib/KorAP/XML/Meta/I5.pm
@@ -408,3 +408,152 @@
 
 1;
 
+
+__END__
+
+=pod
+
+=encoding utf8
+
+=head1 NAME
+
+KorAP::XML::Meta::I5 - Parses I5 metadata of a KorAP-XML document
+
+=head1 DESCRIPTION
+
+Parses I5 metadata of a KorAP-XML document.
+
+Following the data model, metadata is parsed on all three levels, although
+not every level contains the same information. In case of conflicts, metadata
+defined on the text level overrides metadata on the document level, and
+metadata on the document level overrides metadata on the corpus level.
+
+=head2 Metadata categories
+
+Krill currently supports the following types of metadata to be indexed.
+They differ mainly in the way they can be used to construct a virtual corpus.
+
+=over 2
+
+=item B<String>
+
+A simple string representation of a metadata field. Useful for fixed values,
+such as I<corpusSigle> or I<language>.
+
+=item B<Text>
+
+A string representation that is indexed as text, so full-text search
+(such as phrase search) is supported. Useful for values where partial matches
+are helpful, like I<title> or I<author>.
+
+=item B<Keywords>
+
+Multiple string representations. Identical to string, but supports multiple
+values in the same field. Useful for multiple given values such as I<textClass>.
+
+=item B<Attachement>
+
+Values that can't be used for the construction of virtual corpora, but are stored
+per document and can be retrieved. Useful for static data to be retrieved such as
+I<reference> or I<externalLink>.
+
+=item B<Date>
+
+A representation of a date that can later be used for date range queries to construct
+virtual corpora. Useful for all date-related information, such as I<pubDate> or I<creationDate>.
+
+=back
+
+=head2 Metadata fields
+
+Currently L<KorAP::XML::Meta::I5> recognizes and transfers the following fields, given as
+an SCSS-like selector rule (plus C<@> for attribute values) followed by the field name and
+the metadata category.
+Where several rules map to the same field, their order may indicate which value is overwritten.
+
+=over 2
+
+=item B<On all levels>
+
+  (analytic, monogr) editor[role=translator]   translator            ATTACHEMENT
+  pubPlace@key                                 pubPlaceKey           STRING
+  pubPlace                                     pubPlace              STRING
+  imprint publisher                            publisher             ATTACHEMENT
+  textDesc textType                            textType              STRING
+  textDesc textDomain                          textDomain            STRING
+  textDesc textTypeArt                         textTypeArt           STRING
+  textDesc textTypeRef                         textTypeRef           STRING
+  pubDate[type=year]
+    & pubDate[type=month]
+    & pubDate[type=day]                        pubDate               DATE
+  creatDate                                    creationDate          DATE
+  textClass catRef@target                      textClass             KEYWORDS
+  textClass h.keywords > keyTerm               keywords              KEYWORDS
+  biblFull editionStmt                         biblEditionStatement  ATTACHEMENT
+  fileDesc editionStmt                         fileEditionStatement  ATTACHEMENT
+  fileDesc publicationStmt > availability      availability          STRING
+  fileDesc publicationStmt > distributor       distributor           ATTACHEMENT
+  profileDesc > langUsage > language[id]@id    language              STRING
+
+=item B<On text level>
+
+  textSigle                                    textSigle             STRING
+  fileDesc > titleStmt > t.title               title                 TEXT
+  (analytic, monogr) h.title[type=main]        title                 TEXT
+  (analytic, monogr) h.title[type=sub]         subTitle              TEXT
+  (analytic, monogr) h.author                  author                TEXT
+  (analytic, monogr) editor[role!=translator]  editor                ATTACHEMENT
+  sourceDesc reference[type=complete]          reference             ATTACHEMENT
+  textDesc > column                            textColumn            STRING
+  biblStruct biblScope[type=pp]                srcPages              ATTACHEMENT
+
+=item B<On document level>
+
+  dokumentSigle                                docSigle              STRING
+  fileDesc > titleStmt > d.title               docTitle              TEXT
+  (analytic, monogr) h.title[type=main]        docTitle              TEXT
+  (analytic, monogr) h.title[type=sub]         docSubTitle           TEXT
+  (analytic, monogr) h.author                  docAuthor             TEXT
+  (analytic, monogr) editor[role!=translator]  docEditor             ATTACHEMENT
+
+=item B<On corpus level>
+
+  korpusSigle                                  corpusSigle           STRING
+  fileDesc > titleStmt > c.title               corpusTitle           TEXT
+  (analytic, monogr) h.title[type=main]        corpusTitle           TEXT
+  (analytic, monogr) h.title[type=sub]         corpusSubTitle        TEXT
+  (analytic, monogr) h.author                  corpusAuthor          TEXT
+  (analytic, monogr) editor[role!=translator]  corpusEditor          ATTACHEMENT
+
+=back
+
+Some fields are specially formatted, like C<srcPages> or dates.
+In case of Wikipedia texts, C<sourceDesc reference[type=complete]> will be
+turned into an C<externalLink>. In case of DGD/AGD documents, an external link
+to the DGD will be created as C<externalLink>.
+
+
+=head1 AVAILABILITY
+
+  https://github.com/KorAP/KorAP-XML-Krill
+
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright (C) 2015-2020, L<IDS Mannheim|https://www.ids-mannheim.de/>
+Author: L<Nils Diewald|https://nils-diewald.de/>
+
+KorAP::XML::Krill is developed as part of the
+L<KorAP|https://korap.ids-mannheim.de/>
+Corpus Analysis Platform at the
+L<Institute for the German Language (IDS)|https://www.ids-mannheim.de/>,
+member of the
+L<Leibniz-Gemeinschaft|https://www.leibniz-gemeinschaft.de/en/>
+and supported by the L<KobRA|http://www.kobra.tu-dortmund.de> project,
+funded by the
+L<Federal Ministry of Education and Research (BMBF)|http://www.bmbf.de/en/>.
+
+KorAP::XML::Krill is free software published under the
+L<BSD-2 License|https://raw.githubusercontent.com/KorAP/KorAP-XML-Krill/master/LICENSE>.
+
+=cut
diff --git a/script/korapxml2krill b/script/korapxml2krill
index 99d0300..fd95337 100644
--- a/script/korapxml2krill
+++ b/script/korapxml2krill
@@ -1573,8 +1573,11 @@
 (so-called "foundries"; Bański et al. 2013).
 
 Metadata is available in the TEI-P5 variant I5
-(Lüngen and Sperberg-McQueen 2012), while annotations correspond to
-a variant of the TEI-P5 feature structures (TEI Consortium; Lee et al. 2004).
+(Lüngen and Sperberg-McQueen 2012). See the documentation in
+L<KorAP::XML::Meta::I5> for the supported metadata fields.
+
+Annotations correspond to a variant of the TEI-P5 feature structures
+(TEI Consortium; Lee et al. 2004).
 
 Multiple KorAP-XML documents are organized on three levels following
 the "IDS Textmodell" (Lüngen and Sperberg-McQueen 2012):