Log - 1be7b13f10dc2d7eca5e3e551cb75e96d3ea4808 - KorAP/NKJP2KorAP

1be7b13 add SuperExpress with a known offset-related issue, for investigation; requires Xmx set to "8g" by Piotr Banski · 3 years, 6 months ago
ad4b7ce modified to preserve the analogy to ginkgo by Piotr Banski · 3 years, 8 months ago
c4ec608 maintenance commit for myself, to pull this from the laptop by Piotr Banski · 3 years, 9 months ago
381e0c0 handle cases where no NE information is present in an NKJP NE document (don't produce named.xml then) by Piotr Banski · 3 years, 9 months ago
b28e588 catalog fixed. script prepared for processing, morpho files have some new data now (from the new NKJP version) by Piotr Banski · 3 years, 9 months ago
dea799a new sample, NKJP-SGJP, 7 texts as before, state from 29-05-2022 by Piotr Banski · 3 years, 9 months ago
60d3277 updated for the new sample and its new bug by Piotr Banski · 3 years, 9 months ago
a78e59d new dataset, up to NE by Piotr Banski · 3 years, 9 months ago
a44cd7a produce morpho.xml with additional information: translit as 'orig' and all the morphologically possible values by Piotr Banski · 3 years, 9 months ago
d2b78b8 cleaned up, before adding more info to morpho by Piotr Banski · 3 years, 9 months ago
ba6cc63 successfully switched to ann_morpho for morpho.xml; next step: cleanup + more info in morpho descriptions by Piotr Banski · 3 years, 9 months ago
43b9db0 successfully switched to ann_morpho for structure.xml, with some lingering questions asked via slack by Piotr Banski · 3 years, 9 months ago
081c5de successfully switched to ann_morpho for data.xml by Piotr Banski · 3 years, 9 months ago
a0a9fc0 hopefully final refactoring before the switch by Piotr Banski · 3 years, 9 months ago
faa910f more refactoring, for readability by Piotr Banski · 3 years, 9 months ago
f959069 committing the state after minor refactoring of parameter/variable names, still before eliminating ann_segmentation by Piotr Banski · 3 years, 9 months ago
763b41f recording the state before the transfer to use ann_morphosyntax as the basis (due to the manual corrections present there) by Piotr Banski · 3 years, 9 months ago
65a6d0b produce an initial version of named.xml, with just placeholders but also with properly computed offsets (walking ann_morphosyntactic) by Piotr Banski · 3 years, 9 months ago
e1ac520 revert; the intended lookup from old ID to the new index across the entire tree would require a lot of magic and wouldn't be efficient at all by Piotr Banski · 3 years, 9 months ago
9397ca5 extend the indexing accumulator to be a mapping from NKJP index onto the accumulator-generated traverse value that gets turned into the KorAP index; this way we get free lookup of a sort, at least for cases where all we need is an old->new mapping by Piotr Banski · 3 years, 9 months ago
06520d3 fix @l to be optional by Piotr Banski · 3 years, 9 months ago
ad3581f further optimization by Piotr Banski · 3 years, 9 months ago
8d2609a optimised tei:seg (total net time cut by half) by Piotr Banski · 3 years, 9 months ago
c5950ce added placeholders for handling more layers of annotation; tei:seg in 'morpho' mode needs some streamlining (the profiler suggests) by Piotr Banski · 3 years, 9 months ago
92791a2 Eliminated spurious whitespace (which probably didn't hurt, but...) by Piotr Banski · 3 years, 9 months ago
1ae16bd this is the version used to derive the entire dataset; it doesn't need any parameters and is set to output comments in structure.xml, just in case we need them for debugging by Piotr Banski · 3 years, 10 months ago
6d7f492 sample data for ingest by Piotr Banski · 3 years, 10 months ago
a51907c minimal goal reached: data should be now ingestible; there is a niche of inefficiency though that may prevent my desktop from processing the entire dataset (other than KOT) by Piotr Banski · 3 years, 10 months ago
09096ee many fixes, structure.xml all done, retaining comments with the surface forms for now by Piotr Banski · 3 years, 10 months ago
fdc858a I think it's done. Will clean it tomorrow and extend to morpho and beyond. My goodness. by Piotr Banski · 3 years, 10 months ago
69f3c5f this is just for demonstration that the offsets are now done in the KorAP way by Piotr Banski · 3 years, 10 months ago
5fe4bae far from perfect, but the road is straight now; adding the structure.xml doc, a bit schizophrenic in the indexing by Piotr Banski · 3 years, 10 months ago
6a4a252 this version attempts to re-traverse the tree over 6k times per single output document with structure in it, and I can't seem to be able to help that. It does the necessary calculation perfectly, but, naturally, in doing so it crashes my desktop by Piotr Banski · 3 years, 10 months ago
f8af3a9 just saving the next step by Piotr Banski · 3 years, 10 months ago
4f4c2d2 one step further and I just want to save it by Piotr Banski · 3 years, 10 months ago
9dc1000 begin the switch from text.xml to ann_segmentation.xml; for now, data.xml is properly created (whitespace and tokenization alternatives). A lot of code cleanup has not yet happened. by bansp · 3 years, 10 months ago
d1bf1db before migration from calc_content_length to calc_offsets by bansp · 4 years ago
a8e5cf1 forgot to save, already pushed the previous one, sorry by bansp · 4 years ago
b599253 this is a safety commit, before I take some stuff apart by bansp · 4 years ago
f2b24e6 add ability to skip some document IDs as a comma-separated parameter by bansp · 4 years ago
b8b38e7 new data for ingestion by bansp · 4 years ago
e726b4a stylesheet redone for handling larger datasets; just struct and morpho for now, though by bansp · 4 years ago
8f6700b initial modification that I need to commit by bansp · 4 years ago
24f1b2f fix a bug in the SGJP inclusions; modify the catalog to resolve the inclusion properly by bansp · 4 years ago
97ba7ce update the I5 DTD identifiers, just in case by bansp · 4 years ago
df88424 add a new sample, from a more recent version of NKJP (SGJP, 20220320) by bansp · 4 years ago
1e70945 add I5 schemas and adjust the catalog by bansp · 4 years ago
4da8b06 separate directory for XCES schemas, just in case by bansp · 4 years ago
ba37fb9 update gitignore by bansp · 4 years ago
9103aab attempt to add to the headers (they are black boxes) by bansp · 4 years ago
3e5b20c fix structure.xml, create morpho.xml by bansp · 4 years ago
5f84173 derive structure.xml; the script isn't optimized yet but I would like to submit the output for a check by bansp · 4 years ago
102886a add target for span.rng Change-Id: Ia0b5154616aebdb4468e4838a809936dbdcf34cd by bansp · 4 years ago
54585bb make span valid again, Donald by bansp · 4 years ago
608b102 fix textSigle by bansp · 4 years ago
5e2d1c0 first touch: make sure that I can grab at the data and send it where I want it to go by bansp · 4 years ago
7c373ab apologies, these are just for reference, so that I don't clutter the main stylesheet with comments by bansp · 4 years ago
9c08fc0 crucial addition to make NKJP validation succeed by bansp · 4 years ago
db92a87 placeholders for foundry content by bansp · 4 years ago
c3cdcb9 add schemas and catalog by bansp · 4 years ago
68529a8 proof of concept that I'm reaching for the right info by bansp · 4 years ago
fcce502 matching data.xml (yes, I do see why rebase is useful ;-) ) by bansp · 4 years ago
0748682 correct empty namespace nodes in the output by bansp · 4 years ago
f79443e version working in oXygen, data.xml should be identical modulo whitespace; missing references: metadata.xml, text.rng by bansp · 4 years ago
8e5a078 metadata placeholders by bansp · 4 years, 1 month ago
4059e5f Update the test suite to reflect the new naming convention by Akron · 4 years, 1 month ago
df0e01a Rename structure by Akron · 4 years, 1 month ago
66313a7 Merge "Added primary data file as conversion target" by Marc Kupietz · 4 years, 1 month ago
9a8ee3e Added initial conversion script and example xspec by Akron · 4 years, 1 month ago
57f588c Added primary data file as conversion target by Akron · 4 years, 1 month ago
e4b232d Directory structure: move sample data to test resources by Marc Kupietz · 4 years, 1 month ago
d33c60d Delete test file by Marc Kupietz · 4 years, 1 month ago
89363b2 added a brief readme to explain what's going on by bansp · 4 years, 4 months ago
99a1df4 make it possible for symbol/value="" to pass validation -- the empty strings in f[@name eq 'msd']/symbol are a regular "feature" of the morphosyntactic description in this sample by bansp · 4 years, 4 months ago
ace5612 Trimming down to sample #76 for the purpose of testing the projected NKJP-to-KorAP converter; adjustments for validation and xinclude. ann_morphosyntax is going to need a separate schema adjustment. by bansp · 4 years, 4 months ago
1e773ad fix validation Change-Id: I89819be341854de9a4379217e814e200afe03366 by bansp · 4 years, 4 months ago
973a7ef add an excerpt from NKJP 1M v 1.2: infrastructure files in the root and the KOT subcorpus by bansp · 4 years, 4 months ago
25b0fc1 test by bansp · 4 years, 4 months ago
18449f7 Add Readme.md template and license by Marc Kupietz · 4 years, 4 months ago