0.6.3 2024-06-04
  - Trim filenames to fix double space after filename metadata
  - Set permissions of zip contents to 666
  - Allow different foundries for morpho and dependency annotations
  - Readme: Drop early stage warning
  - Drop cherry-pick unfriendly test count prediction
  - conllu2korapxml:
    - escape &, <, >
    - convert upos column to upos features

0.6.2 2024-01-24
  - Bump minimum Perl version to 5.36 to improve Unicode handling.
  - korapxml2conllu:
    - Use implicit default utf8 encoding instead of explicit de/encodes. Speeds up processing by 10%.

0.6.1 2023-03-22
  - conllu2korapxml:
    - Fix append for filehandle output.

0.6.0 2023-01-13
  - korapxml2conllu:
    - the sigle-pattern option now affects the entire sigle
    - handle docid attributes correctly if they are in a different line than their parent element <layer>
  - Improve identification of offset errors

0.5.0 2022-09-29
  - korapxml2conllu:
    - --word2vec|lm-training-data option added to print word2vec input format
    - --extract-metadata-regex added to extract some metadata values as context input for language model training
    - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
    - use morpho.xml if present when run on base zips
    - new option -c <columns>
  - conllu2korapxml:
    - ignore _-lemmas
    - handle UDPipe comments
    - ignore non-interpretable comments
    - improve error handling for missing text ids and offsets

0.4.1 2021-07-31
  - korapxml2conllu: fix patterns not extracted for last texts in archive

0.4 2021-07-29
  - korapxml2conllu option -e <regex> added to extract element/attributes to comments

0.3 2021-02-15
  - Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip

0.2 2021-02-12
  - Also convert KorAP-XML base zips

0.1 2020-09-23
  - Initial release to GitHub.