| 0.6.2 2024-01-24 |
| - Bump minimal perl version to 5.36 to improve unicode handling. |
| - korapxml2conllu |
| - Use implicit default utf8 encoding instead of explicit de/encodes. Speeds up processing by 10%. |
| |
| 0.6.1 2023-03-22 |
| - conllu2korapxml: |
| - Fix append for filehandle output. |
| |
| 0.6.0 2023-01-13 |
| - korapxml2conllu: |
| - the sigle-pattern option now affects the entire sigle |
| - handle docid attributes correctly if they are in a different line than their parent element <layer> |
| - Improve identification of offset errors |
| |
| 0.5.0 2022-09-29 |
| - korapxml2conllu: |
| - --word2vec|lm-training-data option added to print word2vec input format |
| - --extract-metadata-regex added to extract some metadata values as context input for language model training |
| - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise) |
| - use morpho.xml if present when run on base zips |
| - new option -c <columns> |
| - conllu2korapxml: |
| - ignore _-lemmas |
| - handle UDPipe comments |
| - ignore non-interpretable comments |
| - improve error handling for missing text ids and offsets |
| |
| 0.4.1 2021-07-31 |
| - korapxml2conllu: fix patterns not extracted for last texts in archive |
| |
| 0.4 2021-07-29 |
| - korapxml2conllu option -e <regex> added to extract element/attributes to comments |
| |
| 0.3 2021-02-15 |
| - Provide conllu2korapxml to convert from ConLL-U to KorAP-XML zip |
| |
| 0.2 2021-02-12 |
| - Convert also KorAP-XML base zips |
| |
| 0.1 2020-09-23 |
| - Initial release to GitHub. |