blob: 9f2fdf90101b888e5d2d8d21e7b048a7d837a728 [file] [log] [blame]
0.6.3 2024-06-04
- Trim filenames to fix double space after filename metadata
- Set permissions of zip contents to 666
- Allow different foundries for morpho and dependency annotations
- Readme: Drop early stage warning
- Drop cherry-pick unfriendly test count prediction
- conllu2korapxml:
- escape &, <, >
- convert upos column to upos features
0.6.2 2024-01-24
- Bump minimal perl version to 5.36 to improve unicode handling.
- korapxml2conllu
- Use implicit default utf8 encoding instead of explicit de/encodes. Speeds up processing by 10%.
0.6.1 2023-03-22
- conllu2korapxml:
- Fix append for filehandle output.
0.6.0 2023-01-13
- korapxml2conllu:
- the sigle-pattern option now affects the entire sigle
- handle docid attributes correctly if they are in a different line than their parent element <layer>
- Improve identification of offset errors
0.5.0 2022-09-29
- korapxml2conllu:
- --word2vec|lm-training-data option added to print word2vec input format
- --extract-metadata-regex added to extract some metadata values as context input for language model training
- by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
- use morpho.xml if present when run on base zips
- new option -c <columns>
- conllu2korapxml:
- ignore _-lemmas
- handle UDPipe comments
- ignore non-interpretable comments
- improve error handling for missing text ids and offsets
0.4.1 2021-07-31
- korapxml2conllu: fix patterns not extracted for last texts in archive
0.4 2021-07-29
- korapxml2conllu option -e <regex> added to extract element/attributes to comments
0.3 2021-02-15
- Provide conllu2korapxml to convert from ConLL-U to KorAP-XML zip
0.2 2021-02-12
- Convert also KorAP-XML base zips
0.1 2020-09-23
- Initial release to GitHub.