2.7.2 2026-03-05
- Fix XML parser error caused by elements (e.g. <ref>) whose
attributes span multiple lines.
- Progress bar now writes directly to /dev/tty (CON on Windows)
instead of stderr, so it is not captured by log redirection.
Automatically disabled when no controlling terminal is available
(e.g. detached container or CI environment).
2.7.1 2026-03-05
- Fix parser error when closing body and text tags
appear on the same line.
2.7.0 2026-03-03
- Upgrade KorAP-Tokenizer to v2.4.0
with fixes for soft hyphens, thousands separators, and
support for German sensitive spelling forms separated by colons, slashes, and brackets.
2.6.2 2025-12-10
- Upgrade KorAP-Tokenizer to v2.3.0 (resolves issues with
Gendersternchen after hyphens, emoji clusters, and Wikipedia templates).
- Upgrade Java dependency to 21.
- Added --progress option.
2.6.1 2025-04-16
- Fix ASCII entity resolution.
- Make KorAP-Tokenizer heap size configurable via environment
variable KORAPXMLTEI_TOKENIZER_HEAP_SIZE.
2.6.0 2024-11-11
- Add -o parameter.
- Add support for inline dependency relations.
- Add support for --auto-textsigle.
- Add support for multiple input files.
2.5.0 2024-01-24
- Upgrade minimal Perl version to 5.36 to improve
Unicode handling.
- Upgrade KorAP-Tokenizer to v2.2.5 and Java to 17 to
improve Unicode handling.
2.4.4 2023-04-25
- Allow line breaks in text-only lines.
2.4.3 2023-03-02
- Allow closing elements to start with "text".
2.4.2 2023-02-10
- Improve checks for numerical annotation bounds.
2.4.1 2023-02-07
- Fix test.
2.4.0 2023-02-07
- Conversion of standard TEI P5 should now work, at least
in some cases.
- Option --xmlid-to-textsigle <from-regex>@<to-c/to-d/to-t>
added to convert standard P5 text id attributes to I5
sigles with three parts.
- Add --no-tokenizer parameter as a requirement
for relying on inline tokens only.
2.3.4 2022-11-09
- Improve stability of XML entity replacement.
- Check version for script and KorAP-Tokenizer
library when requested.
2.3.3 2022-03-30
- Load KorAP-Tokenizer only on request.
2.3.2 2022-03-23
- Do not reference metadata.xml
- Remove schema references from header files.
- Improve test suite for inability to use
KorAP-Tokenizer.
2.3.1 2022-01-14 Release
- Improve script handling of broken data
- Improve handling of unknown header types
- Check for valid sigles to avoid broken directories
- Introduce exclusivity for inline tokens handling.
- Use single dash for STDIN.
- Update KorAP-Tokenizer to v2.2.2 (single quote, "du." bug fixes)
2.2.0 2021-08-26 Release
- Remove unnecessary branch in recursive call
- Support inline-structures parameter
- Introduce --base-foundry, --data-file, and --header-file parameters
- Introduce --tokens-file parameter
- Introduce --skip-inline-tokens parameter
- Minor cleanups and improvements
- Introduce --skip-inline-tags parameter
- Introduce KorAP::XML::TEI::Inline class
- Introduce --skip-inline-token-annotations parameter
- Deprecate KORAPXMLTEI_INLINE environment variable
in favor of --skip-inline-token-annotations
1.0.0 2021-02-18 Release
- -s option added that uses sentence boundaries
provided by the KorAP tokenizer (-tk)
- Tokenizer invocation comments removed from KorAP XML output
- Indentation of </span> tags fixed
- Character entities used in DeReKo are automatically
replaced by their corresponding characters
- Resources defined in Makefile
- Fixed possible IO deadlock with KorAP tokenizer
- Simplified debugging by combining with X::C::T line numbers
- Support inline-tokens parameter
- Move verbose code documentation to trailing
script section
0.03 2021-01-12
- Update KorAP-Tokenizer to released 2.0 version
- Improve test suite for recent version
of Mojolicious.
0.02 2020-11-27
- Update KorAP-Tokenizer to v2.0.0.
- Switch input encoding based on XML
processing instruction.
- Fix handling of UTF-8 in sigles.
0.01 2020-09-28
- Initial release to GitHub.