- korapxml2conllu:
  - handle docid attributes correctly if they are on a different line than their parent element <layer>

0.5.0 2022-09-29
- korapxml2conllu:
  - --word2vec|lm-training-data option added to print word2vec input format
  - --extract-metadata-regex added to extract some metadata values as context input for language model training
  - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
  - use morpho.xml if present when run on base zips
  - new option -c <columns>
- conllu2korapxml:
  - ignore _-lemmas
  - handle UDPipe comments
  - ignore non-interpretable comments
  - improve error handling for missing text ids and offsets

0.4.1 2021-07-31
- korapxml2conllu: fix patterns not extracted for last texts in archive

0.4 2021-07-29
- korapxml2conllu option -e <regex> added to extract element/attributes to comments

0.3 2021-02-15
- Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip

0.2 2021-02-12
- Also convert KorAP-XML base zips

0.1 2020-09-23
- Initial release to GitHub.