0.41 2020-06-16
        - Added support for RWK annotations.
        - Improved DGD support.
        - Fixed bug in RWK support that broke on
          some KorAP-XML files.

0.40 2020-03-03
        - Fixed XIP parser.
        - Added example corpus of the
          Redewiedergabe-Korpus.
        - Fixed span offset bug.
        - Fixed milestones behind the last
          token bug.
        - Fixed gap behind last token bug.
        - Fixed <base/s:t> length.

0.39 2020-02-19
        - Added Talismane support.
        - Added "distributor" field to I5 metadata.
        - Added DGD link field to I5 metadata.
        - Improve logging.
        - Added support for DGD pseudo-sentences
          based on anchor milestones.
        - Added brief explanation of the format.
        - Fixed parsing of editionStmt.
        - Added documentation for supported I5 metadata
          fields.
        - Added integrated benchmark mechanism.

0.38 2019-05-22
        - Stop file processing when base tokenization
          is wrong.
        - Added DGD support.

0.37 2019-03-06
        - Support for 'koral:field' array.
        - Support for Koral versioning.
        - Added tests for English sources.
        - Added support for external links for
          Wikipedia resources.
        - Ignore temporary extraction
          on directory archiving.
        - Remove extract_text and extract_doc in
          favor of extract_sigle for archives.

0.36 2019-01-22
        - Support for non-word tokens (fixes #5).

0.35 2018-09-24
        - Lift minimum version of Perl to 5.16, as required
          for the "fc" feature.

0.34 2018-07-19
        - Preliminary support for HNC.

0.33 2018-02-01
        - Added LWC support.
        - Fixed TreeTagger certainties.

0.32 2017-10-24
        - Fixed tar building process in script.
        - Support file extensions in base tokenization parameter.

0.31 2017-06-30
        - Fixed exit codes in script.
        - Use CORE::fc for case folding.

0.30 2017-06-19
        - Fixed permission handling in test suite.
        - Added preliminary CMC support.

0.29 2017-04-23
        - Support --to-tar flag.

0.28 2017-04-12
        - Improved overwriting behaviour for unzip.
        - Introduced --sequential-extraction flag.

0.27 2017-04-10
        - Support configuration files.
        - Support temporary extraction.
        - Support serial conversion.
        - Support input-base.

0.26 2017-04-06
        - Support wildcards on input.

0.25 2017-03-14
        - Updated to Mojolicious 7.20
        - Fixed meta treatment in case analytic and monogr
          are available
        - Added DRuKoLa support to script
        - Relaxed document and text sigle handling to be
          compliant with CoRoLa.
        - Added support for pagebreak annotations.
        - Renamed "pages" to "srcPages".
        - Fixed handling of prefixes for text sigles.
        - Support for MarMoT.
        - Fix case insensitivity.
        - Added preliminary support for diacritic insensitivity.

0.24 2016-12-21
        - Added --base-sentences and --base-paragraphs options

0.23 2016-11-03
        - Added wildcard support for document extraction
        - Fixed archive iteration to not duplicate the first archive
        - Added parallel extraction for document sigles
        - Improved return value for existing files
        - Don't warn on recursion in CoreNLP/Constituency

0.22 2016-10-26
        - Added support for document extraction
        - Fixed archive naming

0.21 2016-10-24
        - Improved Windows support

0.20 2016-10-15
        - Fixed treatment of temporary folders in script

0.19 2016-08-17
        - Added test for direct I5 support.
        - Fixed support for Mojolicious 7.
        - Added script test.
        - Fixed setting multiple annotations in
          script.
        - Fixed output of version and help messages.
        - Added script test for extraction.
        - Fixed extraction with multiple archives and prefix
          negation support.
        - Added script test for archives.

0.18 2016-07-08
        - Added REI test.
        - Added multiple archive support to korapxml2krill.
        - Added support for prefix negation in korapxml2krill.
        - Added support for Malt#Dependency.
        - Improved test suite for caching and REI.
        - Added support for MDParser annotation.
        - Added batch processing class for documents.

0.17 2016-03-22
        - Rewrite sigles to use slashes as separators.
        - Zip listing optimized. No longer works with primary data
          in text.xml files.

0.16 2016-03-18
        - Added caching mechanism for
          metadata.

0.15 2016-03-17
        - Modularized metadata handling.
        - Simplified metadata handling.
        - Added --meta option to script.
        - Removed deprecated --human option from script.

0.14 2016-03-15
        - Renamed ::Index to ::Annotate and ::Field to ::Index.
        - Renamed 'allow' to 'anno' as parameters of the script.
        - Added readme.

0.13 2016-03-10
        - Removed korapxml2krill_dir.
        - Renamed dependency nodes.
        - Made dependency relations more effective (trimmed down TUIs)
          ! This is currently very slow !

0.12 2016-02-28
        - Added extract method to korapxml2krill.
        - Fixed Mate/Dependency.
        - Fixed skip flag in korapxml2krill.
        - Ignore spans outside the token range
          (i.e. character offsets end before tokens have started).

0.11 2016-02-23
        - Merged korapxml2krill and korapxml2krill_dir.

0.10 2016-02-15
        - Added EXPERIMENTAL support for parallel jobs.

0.09 2016-02-15
        - Fixed temporary directory handling in scripts.
        - Improved skipping for archive handling in scripts.

0.08 2016-02-14
        - Added support for archive streaming.
        - Improved scripts.

0.07 2016-02-13
        - Improved support for Schreibgebrauch meta data
          (IDS flavour).

0.06 2016-02-11
        - Improved support for Schreibgebrauch meta data
          (Duden flavour).

0.05 2016-02-04
        - Changed KorAP::Document to KorAP::XML::Krill.
        - Renamed "Schreibgebrauch" to "Sgbr".
        - Preparation for GitHub release.

0.04 2016-01-28
        - Added PTI to all payloads.
        - Added support for empty elements.
        - Added support for element attributes in struct.
        - Added meta data support for Schreibgebrauch.
        - Fixed test suite for meta data.

0.03 2014-11-03
        - Added new metadata scheme.
        - Fixed a minor bug in the constituency tree building.
        - Sorted terms in tokens a priori.

0.02 2014-07-21
        - Sentence annotations for all providing foundries
        - Starting subtokenization

0.01 2014-04-15
        - [bugfix] for first token annotations
        - Sentences are now available from all foundries that provide them
        - <>:p is now <>:base/para
        - Added <>:base/text