The KorAP Validator iterates over one or more given directories and validates data in the internal KorAP format, checking both the XML structure and, to a certain extent, the content. It is intended for validating our test data set.
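To illustrate the directory walk, here is a minimal Python sketch. It is not the KorAP Validator itself: the function name `validate_dirs` is hypothetical, and the sketch only checks XML well-formedness, whereas the real tool also checks the content of the internal KorAP format, which is not described here.

```python
import os
import xml.etree.ElementTree as ET


def validate_dirs(directories):
    """Return (path, error message) pairs for XML files that fail to parse.

    Hypothetical helper: it only tests well-formedness and knows nothing
    about the internal KorAP format.
    """
    problems = []
    for directory in directories:
        # os.walk descends into all subdirectories of each given directory.
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                if not name.endswith(".xml"):
                    continue
                path = os.path.join(root, name)
                try:
                    ET.parse(path)
                except ET.ParseError as err:
                    problems.append((path, str(err)))
    return problems
```

Calling `validate_dirs(["testdata"])` (with a directory of your own) would return an empty list if every `.xml` file below it parses, and one entry per malformed file otherwise.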
Our setup is a collection of index structures that are physically distributed across multiple machines (worker nodes). Their purpose is to allow fast querying of different segmentations (tokenization, sentence boundaries, etc.) and annotations (e.g. part-of-speech tags, dependencies, syntactic constituents) on arbitrary document collections (corpora). Conversely, this implies that the union of all the distributed indexes makes up the complete corpus collection.
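The idea of querying distributed indexes can be sketched as a simple scatter-gather in Python. This is a toy model under stated assumptions, not our actual implementation: `WorkerIndex` and `search_all` are hypothetical names, each worker holds a disjoint set of documents in memory, and the only supported query is a lookup by part-of-speech tag.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerIndex:
    """Hypothetical stand-in for the index held by one worker node."""

    def __init__(self, documents):
        # documents maps a document id to a list of (token, pos_tag) pairs.
        self.documents = documents

    def search(self, pos_tag):
        """Return (doc_id, position, token) for every token with this tag."""
        hits = []
        for doc_id, tokens in self.documents.items():
            for position, (token, tag) in enumerate(tokens):
                if tag == pos_tag:
                    hits.append((doc_id, position, token))
        return hits


def search_all(workers, pos_tag):
    """Send the query to every worker and merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        partial = pool.map(lambda worker: worker.search(pos_tag), workers)
        merged = [hit for hits in partial for hit in hits]
    # Sorting gives a stable order regardless of which worker answered first.
    return sorted(merged)
```

Because every document lives on exactly one worker in this sketch, merging the partial results covers the whole collection without duplicates, which mirrors the point that the union of the distributed indexes is the complete corpus.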
We have submitted two papers for LREC 2012! “The New IDS Corpus Analysis Platform: Challenges and Prospects” and “Evaluating Query Languages for a Corpus Processing System”.