 - korapxml2conllu:
   - the sigle-pattern option now affects the entire sigle
   - handle docid attributes correctly if they are on a different line than their parent element <layer>

0.5.0 2022-09-29
 - korapxml2conllu:
   - --word2vec|lm-training-data option added to print word2vec input format
   - --extract-metadata-regex added to extract some metadata values as context input for language model training
   - by default sentence boundary information is now read from structure.xml files (use --s-bounds-from-morpho otherwise)
   - use morpho.xml if present when run on base zips
   - new option -c <columns>
 - conllu2korapxml:
   - ignore _-lemmas
   - handle UDPipe comments
   - ignore non-interpretable comments
   - improve error handling for missing text ids and offsets

0.4.1 2021-07-31
 - korapxml2conllu: fix patterns not extracted for last texts in archive

0.4 2021-07-29
 - korapxml2conllu option -e <regex> added to extract element/attributes to comments

0.3 2021-02-15
 - Provide conllu2korapxml to convert from CoNLL-U to KorAP-XML zip

0.2 2021-02-12
 - Also convert KorAP-XML base zips

0.1 2020-09-23
 - Initial release to GitHub.