Automatically replace entities with their corresponding characters
The symbolic entities are taken from the entity file of the TEI-I5 DTD,
http://corpora.ids-mannheim.de/I5/DTD/ids-lat1.ent, which contains all
entities that have been used in DeReKo. The list is very similar, but not
identical, to the Mathematical, Greek and Symbolic characters for XHTML,
http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent.
Decimal and hexadecimal numeric entities are replaced, too.
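For illustration only, the replacement described above could be sketched
roughly as follows. This is not the code of this change: it is written in
Python, the three-entry ENTITIES table is a hypothetical excerpt standing in
for the full ids-lat1.ent list, the name replace_entities is invented, and
leaving XML's predefined entities (&amp;, &lt;, ...) untouched is an
assumption made so that the output stays well-formed.

```python
import re

# Hypothetical excerpt; a real table would be derived from ids-lat1.ent.
ENTITIES = {
    "auml": "ä",
    "szlig": "ß",
    "alpha": "α",
}

# Assumption: XML's predefined entities stay escaped so the output remains
# well-formed XML. The same holds for numeric references to these characters.
KEEP_NAMES = {"amp", "lt", "gt", "quot", "apos"}
KEEP_CODEPOINTS = {ord(c) for c in "&<>\"'"}

# Matches &#xHEX; or &#DEC; or &name;
ENTITY_RE = re.compile(
    r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9.]*));"
)


def replace_entities(text):
    """Replace symbolic, decimal and hexadecimal entities by characters."""

    def substitute(match):
        hex_ref, dec_ref, name = match.groups()
        if name is not None:
            if name in KEEP_NAMES:
                return match.group(0)
            # Unknown names are left as they are.
            return ENTITIES.get(name, match.group(0))
        codepoint = int(hex_ref, 16) if hex_ref is not None else int(dec_ref)
        if codepoint in KEEP_CODEPOINTS or not 0 < codepoint <= 0x10FFFF:
            return match.group(0)
        return chr(codepoint)

    return ENTITY_RE.sub(substitute, text)
```

With this sketch, replace_entities("Stra&szlig;e") yields "Straße", while
"&amp;" and unknown entity names pass through unchanged.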
Change-Id: Id00376c6953e9ac96ef04703872f38d37ef68096
diff --git a/Changes b/Changes
index 106043e..d3252c4 100644
--- a/Changes
+++ b/Changes
@@ -1,6 +1,7 @@
- -s option added that uses sentence boundaries provided by the KorAP tokenizer (-tk)
- tokenizer invocation comments removed from KorAP XML output
- indentation of </span> tags fixed
+ - character entities that are used in DeReKo are automatically replaced by their corresponding characters
0.03 2021-01-12
- Update KorAP-Tokenizer to released 2.0 version
- Improve test suite for recent version