- korapxml2conllu:
  - the sigle-pattern option now affects the entire sigle
  - handle docid attributes correctly if they are on a different line than their parent element <layer>
  - improve identification of offset errors
0.5.0 2022-09-29
- korapxml2conllu:
  - --word2vec|lm-training-data option added to print word2vec input format
  - --extract-metadata-regex added to extract some metadata values as context input for language model training
  - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
  - use morpho.xml if present when run on base zips
  - new option -c <columns>
- conllu2korapxml:
  - ignore _-lemmas
  - handle UDPipe comments
  - ignore non-interpretable comments
  - improve error handling for missing text ids and offsets
0.4.1 2021-07-31
- korapxml2conllu: fix patterns not extracted for last texts in archive
0.4 2021-07-29
- korapxml2conllu option -e <regex> added to extract element/attributes to comments
0.3 2021-02-15
- Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip
0.2 2021-02-12
- Also convert KorAP-XML base zips
0.1 2020-09-23
- Initial release to GitHub.