2.7.2 2026-03-05
        - Fix XML parser error caused by elements (e.g. <ref>) whose
          attributes span multiple lines.
        - Progress bar now writes directly to /dev/tty (CON on Windows)
          instead of stderr, so it is not captured by log redirection.
          Automatically disabled when no controlling terminal is available
          (e.g. detached container or CI environment).

2.7.1 2026-03-05
        - Fix parser error when closing body and text tags
          appear on the same line.

2.7.0 2026-03-03
        - Upgrade KorAP-Tokenizer to v2.4.0
          with fixes for soft hyphens, thousands separators, and
          support for German gender-sensitive spelling forms separated
          by colons, slashes, and brackets.

2.6.2 2025-12-10
        - Upgrade KorAP-Tokenizer to v2.3.0 (resolves issues with
          Gendersternchen after hyphens, emoji clusters, and Wikipedia templates).
        - Upgrade Java dependency to 21.
        - Added --progress option.

2.6.1 2025-04-16
        - Fix ASCII entity resolution.
        - Make KorAP-Tokenizer heap size configurable via environment
          variable KORAPXMLTEI_TOKENIZER_HEAP_SIZE.

2.6.0 2024-11-11
        - Add -o parameter.
        - Add support for inline dependency relations.
        - Add support for --auto-textsigle.
        - Add support for multiple input files.

2.5.0 2024-01-24
        - Upgrade minimum Perl version to 5.36 to improve
          Unicode handling.
        - Upgrade KorAP-Tokenizer to v2.2.5 and Java to 17 to
          improve Unicode handling.

2.4.4 2023-04-25
        - Allow line breaks in text-only lines.

2.4.3 2023-03-02
        - Allow closing elements to start with "text".

2.4.2 2023-02-10
        - Improve checks for numerical annotation bounds.

2.4.1 2023-02-07
        - Fix test.

2.4.0 2023-02-07
        - Conversion of standard TEI P5 should now work, at least
          in some cases.
        - Option --xmlid-to-textsigle <from-regex>@<to-c/to-d/to-t>
          added to convert standard P5 text id attributes to I5
          sigles with three parts.
        - Add --no-tokenizer parameter as a requirement
          for relying on inline tokens only.

2.3.4 2022-11-09
        - Improve stability of XML entity replacement.
        - Check version for script and KorAP-Tokenizer
          library when requested.

2.3.3 2022-03-30
        - Load KorAP-Tokenizer only on request.

2.3.2 2022-03-23
        - Do not reference metadata.xml
        - Remove schema references from header files.
        - Improve test suite for inability to use
          KorAP-Tokenizer.

2.3.1 2022-01-14 Release
        - Improve script handling of broken data
        - Improve handling of unknown header types
        - Check for valid sigles to avoid broken directories
        - Introduce exclusivity for inline tokens handling.
        - Use single dash for STDIN.
        - Update KorAP-Tokenizer to v2.2.2 (single quote, "du." bug fixes)

2.2.0 2021-08-26 Release
        - Remove unnecessary branch in recursive call
        - Support inline-structures parameter
        - Introduce --base-foundry, --data-file, and --header-file parameters
        - Introduce --tokens-file parameter
        - Introduce --skip-inline-tokens parameter
        - Minor cleanups and improvements
        - Introduce --skip-inline-tags parameter
        - Introduce KorAP::XML::TEI::Inline class
        - Introduce --skip-inline-token-annotations parameter
        - Deprecate KORAPXMLTEI_INLINE environment variable
          in favor of --skip-inline-token-annotations

1.0.0 2021-02-18 Release
        - -s option added that uses sentence boundaries
          provided by the KorAP tokenizer (-tk)
        - Tokenizer invocation comments removed from KorAP XML output
        - Indentation of </span> tags fixed
        - Character entities used in DeReKo are automatically
          replaced by their corresponding characters
        - Resources defined in Makefile
        - Fixed possible IO deadlock with KorAP tokenizer
        - Simplified debugging by combining with X::C::T line numbers
        - Support inline-tokens parameter
        - Move verbose code documentation to trailing
          script section

0.03 2021-01-12
        - Update KorAP-Tokenizer to released 2.0 version
        - Improve test suite for recent version
          of Mojolicious.

0.02 2020-11-27
        - Update KorAP-Tokenizer to v2.0.0.
        - Switch input encoding based on XML
          processing instruction.
        - Fix handling of UTF-8 in sigles.

0.01 2020-09-28
        - Initial release to GitHub.