opProx feature (Cosmas 2)

Squashed commit consisting of
- verbosity can be switched on/off on command line.
- Prox: parsing %-w1 and %+w1 correctly.
- opPROX: correcting order of Prox options: WIP.
- arbitrary order of distance options: WIP.
- Prox: arbitrary order of options: OK.
- Prox: arbitrary order of options: OK.
- opPROX: grammar should accept any order of prox. options: WIP.
- PROX: return exact error message about prox options.
- PROX: emit a meaningful error message: wip.
- PROX: emit a meaningful error message: WIP.
- write parsing error to AST.
- trying to write error message into an error node of the AST.
- PROX: writing the error message into KoralQuery works.
- Prox...
- Error detection inside Prox done. Returning a precise error message through JSON: done.
- using addError() for error messages in PROX: WIP.
- Prox: reporting exact error messages: works.
- PROX: Tests with RecognitionExceptions removed. All Error Codes in StatusCodes.java.
- Prox: error messages for wrong prox. options.
- Prox: debug output deactivated.
- Prox: deleted debug output.
- Prox: Test added: WiP.
- Prox: 1 working tests added.
- Prox: 3 more tests added.

Change-Id: I8802becaf840660a1512281b3477762a422f8b4f
10 files changed
tree: 87c63d228f23af62e8285ba8f0c4a62e4617f4d5
  1. .github/
  2. misc/
  3. src/
  4. .gitignore
  5. Changes
  6. Format.xml
  7. LICENSE
  8. pom.xml
  9. README.md
README.md

Koral

Koral is a translator tool for converting different corpus query languages to KoralQuery, a JSON-LD-based protocol for the common representation of linguistic queries. KoralQuery specifications are described extensively in Bingel (2015). This work has been carried out within the KorAP project.

Koral supports the following corpus query languages (QLs): COSMAS II, ANNIS QL, Poliqarp, Poliqarp+, CQL, CQP, and FCSQL.

Usage Example

You can use the main class QuerySerializer to translate and serialize queries. Valid QL identifiers are cosmas2, annis, poliqarp, poliqarpplus, cql, cqp, and fcsql.

import de.ids_mannheim.korap.query.serialize.QuerySerializer;

// Translate a Poliqarp+ query and print its KoralQuery serialization.
QuerySerializer qs = new QuerySerializer();
String query = "contains(<s>,[orth=zu][pos=ADJA])";
qs.setQuery(query, "poliqarpplus");
System.out.println(qs.toJSON());

This prints the following JSON-LD string for the Koralized query. The query asks for a sentence element (<s>) that contains a sequence of a token with the surface form zu followed by a token with the part-of-speech tag ADJA. In the KoralQuery string, this containment is expressed as a position operation with the frame isAround over two operands: an s span and a sequence of two tokens.

{
  "@context": "http://korap.ids-mannheim.de/ns/KoralQuery/v0.2/context.jsonld",
  "query": {
    "@type": "koral:group",
    "operation": "operation:position",
    "frames": [
      "frames:isAround"
    ],
    "operands": [
      {
        "@type": "koral:span",
        "key": "s"
      },
      {
        "@type": "koral:group",
        "operation": "operation:sequence",
        "operands": [
          {
            "@type": "koral:token",
            "wrap": {
              "@type": "koral:term",
              "layer": "orth",
              "key": "zu",
              "match": "match:eq"
            }
          },
          {
            "@type": "koral:token",
            "wrap": {
              "@type": "koral:term",
              "layer": "pos",
              "key": "ADJA",
              "match": "match:eq"
            }
          }
        ]
      }
    ]
  }
}
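To illustrate how a consumer might read this structure, here is a minimal sketch that walks the query tree and collects the key of every koral:term. It assumes the JSON has already been parsed into nested Map and List values by some JSON library; the class KoralQueryWalker and its methods are hypothetical and not part of Koral. The sample tree is built by hand from the output above so that the sketch runs without any dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Koral: walks a KoralQuery tree that has
// already been parsed into nested Map/List values.
class KoralQueryWalker {

    // Collects the "key" of every koral:term, following operand order.
    static List<String> termKeys(Object node) {
        List<String> keys = new ArrayList<>();
        collect(node, keys);
        return keys;
    }

    private static void collect(Object node, List<String> keys) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if ("koral:term".equals(map.get("@type"))) {
                keys.add(String.valueOf(map.get("key")));
            }
            // A token wraps its term; a group lists its operands.
            collect(map.get("wrap"), keys);
            collect(map.get("operands"), keys);
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                collect(child, keys);
            }
        }
        // null and scalar values carry no nested terms and are ignored.
    }

    // Hand-built copy of the "query" object from the example output.
    static Map<String, Object> sampleQuery() {
        Map<String, Object> zu = Map.of("@type", "koral:token", "wrap",
                Map.of("@type", "koral:term", "layer", "orth",
                       "key", "zu", "match", "match:eq"));
        Map<String, Object> adja = Map.of("@type", "koral:token", "wrap",
                Map.of("@type", "koral:term", "layer", "pos",
                       "key", "ADJA", "match", "match:eq"));
        Map<String, Object> sequence = Map.of("@type", "koral:group",
                "operation", "operation:sequence",
                "operands", List.of(zu, adja));
        Map<String, Object> span = Map.of("@type", "koral:span", "key", "s");
        return Map.of("@type", "koral:group",
                "operation", "operation:position",
                "frames", List.of("frames:isAround"),
                "operands", List.of(span, sequence));
    }

    public static void main(String[] args) {
        System.out.println(termKeys(sampleQuery())); // prints [zu, ADJA]
    }
}
```

The s span is not reported because it is a koral:span rather than a koral:term; only the two wrapped terms are collected.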

Motivation

Koral allows corpus query systems to be designed and implemented independently of any specific query language. A system only needs to have Koral translate a query to KoralQuery (see usage) and feed the translated query to its search engine. Several query languages can then be supported without further adjustments to the search engine.
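The separation described above can be sketched as follows. This is only an illustration under stated assumptions: SearchFrontend, its fields, and the stub translator are hypothetical names, and the up-front check of the query language identifier is a choice of this sketch, not documented Koral behavior. In a real system the translator would wrap the QuerySerializer calls shown in the usage example.

```java
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical wiring, not part of Koral: the search engine only ever
// receives KoralQuery, whatever language the user typed the query in.
class SearchFrontend {

    // QL identifiers as listed in the usage section of this README.
    static final Set<String> QUERY_LANGUAGES = Set.of(
            "cosmas2", "annis", "poliqarp", "poliqarpplus",
            "cql", "cqp", "fcsql");

    // (query, queryLanguage) -> KoralQuery JSON string
    private final BiFunction<String, String, String> translator;
    // KoralQuery JSON string -> engine response
    private final Function<String, String> engine;

    SearchFrontend(BiFunction<String, String, String> translator,
                   Function<String, String> engine) {
        this.translator = translator;
        this.engine = engine;
    }

    String search(String query, String queryLanguage) {
        // Assumption of this sketch: reject unknown identifiers early.
        if (!QUERY_LANGUAGES.contains(queryLanguage)) {
            throw new IllegalArgumentException(
                    "Unknown query language: " + queryLanguage);
        }
        return engine.apply(translator.apply(query, queryLanguage));
    }

    public static void main(String[] args) {
        // Stub translator so the sketch runs without Koral on the classpath.
        BiFunction<String, String, String> stub =
                (q, ql) -> "{\"translatedFrom\":\"" + ql + "\"}";
        SearchFrontend frontend =
                new SearchFrontend(stub, json -> "engine received " + json);
        System.out.println(frontend.search("[orth=zu]", "poliqarpplus"));
    }
}
```

Because the engine is typed against the translated string only, adding another query language changes the translator side and leaves the engine untouched.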

Koral and KoralQuery have been designed and developed within the KorAP Project, and are used in KorAP to translate queries to a common format before sending them to its search engine.

Setup

Setup is straightforward (Maven 3 required):

git clone https://github.com/korap/Koral [install-dir]
cd [install-dir]
mvn test -Dhttps.protocols=TLSv1.2
mvn package

There is also a command-line version. After setup, run:

java -jar target/Koral-0.2.jar [query] [queryLanguage]

To build the Koral library and install it in your local Maven repository (needed for Kustvakt), run:

mvn install -Dhttps.protocols=TLSv1.2

To update an existing installation, pull the latest version in [install-dir]:

git pull origin master

Afterwards, rerun the test suite and package or install the library.

Prerequisites

  • Java 11 (OpenJDK or Oracle JDK with JCE)
  • Git
  • At least Maven 3.2.1
  • Further dependencies are resolved by Maven.

Publications

J. Bingel, "Instantiation and implementation of a corpus query lingua franca," M.S. thesis, University of Heidelberg, Heidelberg, 2015.

J. Bingel and N. Diewald, "KoralQuery – a General Corpus Query Protocol," in Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, Vilnius, 2015, pp. 1-5.

Authorship

Koral and KoralQuery were developed by Joachim Bingel, Nils Diewald, Michael Hanl, Eliza Margaretha, and Franck Bodmer at the Leibniz Institute for the German Language (IDS), member of the Leibniz Association.

The CQP implementation was authored by Elena Irimia.

The ANTLR grammars for parsing ANNIS QL and COSMAS II QL were developed by Thomas Krause (HU Berlin) and Franck Bodmer (IDS Mannheim), respectively. Minor adaptations of those grammars were implemented by the Koral authors.

The authors wish to thank Piotr Bański, Elena Frick and Carsten Schnober for their valuable input.

License

Koral is published under the BSD-2 License. See also the attached LICENSE.

The ANNIS grammar is licensed under the Apache License 2.0.