- --word2vec|lm-training-data option added to print word2vec input format
- --extract-metadata-regex added to extract some metadata values as context input for language model training
- by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
- korapxml2conllu: use morpho.xml if present when run on base zips
- korapxml2conllu: new option -c <columns>
- conllu2korapxml: ignore _-lemmas
- conllu2korapxml: handle UDPipe comments
- conllu2korapxml: ignore non-interpretable comments
0.4.1 2021-07-31
- korapxml2conllu: fix patterns not extracted for last texts in archive
0.4 2021-07-29
- korapxml2conllu option -e <regex> added to extract element/attributes to comments
0.3 2021-02-15
- Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip
0.2 2021-02-12
- Also convert KorAP-XML base zips
0.1 2020-09-23
- Initial release to GitHub.