0.6.2 2024-01-24
- Bump minimal Perl version to 5.36 to improve Unicode handling.
- korapxml2conllu:
  - Use implicit default utf8 encoding instead of explicit de/encodes. Speeds up processing by 10%.

0.6.1 2023-03-22
- conllu2korapxml:
  - Fix append for filehandle output.

0.6.0 2023-01-13
- korapxml2conllu:
  - the sigle-pattern option now affects the entire sigle
  - handle docid attributes correctly if they are on a different line than their parent element <layer>
- Improve identification of offset errors

0.5.0 2022-09-29
- korapxml2conllu:
  - --word2vec|lm-training-data option added to print word2vec input format
  - --extract-metadata-regex added to extract some metadata values as context input for language model training
  - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
  - use morpho.xml if present when run on base zips
  - new option -c <columns>
- conllu2korapxml:
  - ignore _-lemmas
  - handle UDPipe comments
  - ignore non-interpretable comments
  - improve error handling for missing text ids and offsets

0.4.1 2021-07-31
- korapxml2conllu: fix patterns not extracted for last texts in archive

0.4 2021-07-29
- korapxml2conllu option -e <regex> added to extract element/attributes to comments

0.3 2021-02-15
- Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip

0.2 2021-02-12
- Also convert KorAP-XML base zips

0.1 2020-09-23
- Initial release to GitHub.