2.5.0 2023-01-24
- Raise minimum Perl version to 5.36 to improve
Unicode handling.
- Upgrade KorAP-Tokenizer to v2.2.5 and Java to 17 to
improve Unicode handling.
2.4.4 2023-04-25
- Allow line breaks in text-only lines.
2.4.3 2023-03-02
- Allow closing elements to start with "text".
2.4.2 2023-02-10
- Improve checks for numerical annotation bounds.
2.4.1 2023-02-07
- Fix test.
2.4.0 2023-02-07
- Conversion of standard TEI P5 should now work, at least
in some cases.
- Option --xmlid-to-textsigle <from-regex>@<to-c/to-d/to-t>
added to convert standard P5 text id attributes to I5
sigles with three parts.
- Add --no-tokenizer parameter, required
when relying on inline tokens only.
2.3.4 2022-11-09
- Improve stability of XML entity replacement.
- Check version for script and KorAP-Tokenizer
library when requested.
2.3.3 2022-03-30
- Load KorAP-Tokenizer only on request.
2.3.2 2022-03-23
- Do not reference metadata.xml
- Remove schema references from header files.
- Improve test suite for inability to use
KorAP-Tokenizer.
2.3.1 2022-01-14 Release
- Improve script handling of broken data
- Improve handling of unknown header types
- Check for valid sigles to avoid broken directories
- Introduce exclusivity for inline tokens handling.
- Use single dash for STDIN.
- Update KorAP-Tokenizer to v2.2.2 (single quote, "du." bug fixes)
2.2.0 2021-08-26 Release
- Remove unnecessary branch in recursive call
- Support inline-structures parameter
- Introduce --base-foundry, --data-file, and --header-file parameters
- Introduce --tokens-file parameter
- Introduce --skip-inline-tokens parameter
- Minor cleanups and improvements
- Introduce --skip-inline-tags parameter
- Introduce KorAP::XML::TEI::Inline class
- Introduce --skip-inline-token-annotations parameter
- Deprecate KORAPXMLTEI_INLINE environment variable
in favor of --skip-inline-token-annotations
1.0.0 2021-02-18 Release
- -s option added that uses sentence boundaries
provided by the KorAP tokenizer (-tk)
- Tokenizer invocation comments removed from KorAP XML output
- Indentation of </span> tags fixed
- Character entities used in DeReKo are automatically
replaced by their corresponding characters
- Resources defined in Makefile
- Fixed possible IO deadlock with KorAP tokenizer
- Simplified debugging by combining with X::C::T line numbers
- Support inline-tokens parameter
- Move verbose code documentation to trailing
script section
0.03 2021-01-12
- Update KorAP-Tokenizer to released 2.0 version
- Improve test suite for recent version
of Mojolicious.
0.02 2020-11-27
- Update KorAP-Tokenizer to v2.0.0.
- Switch input encoding based on XML
processing instruction.
- Fix handling of UTF-8 in sigles.
0.01 2020-09-28
- Initial release to GitHub.