commit | ea969500fb5631603296a6d95638363394a85a28 | |
---|---|---|
author | Nils Diewald <nils@diewald-online.de> | Mon Feb 16 21:10:54 2015 +0000 |
committer | Nils Diewald <nils@diewald-online.de> | Mon Feb 16 21:10:54 2015 +0000 |
tree | 4dc432b8a5501bf3a47207772e845cc976641c10 | |
parent | 641b20ad6d08e9526c5184c1a610eded6b94d017 | |
Rename and refactor KorapFilter (1)
A Corpusdata Retrieval Index using Lucene for Look-Ups
... TODO:
Adding data (JSON via server)
Querying data (KoralQuery)
Show results (JSON)
Krill is a Lucene-based search engine for large annotated corpora, developed at the Institute for German Language (IDS) in Mannheim, Germany.
The software is in its early stages and not stable yet.
Krill is the reference implementation for KoralQuery, covering most of the protocol's features, including ...
Fulltext search
"Find all occurrences of the phrase 'sea monster'!"
"Find all case-insensitive words matching the regular expression /krak.*/"
Token-based annotation search
"Find all plural nouns in accusative!"
Span-based annotation search
"Find all nominal phrases!"
Distance search
...
Positional search
...
Nested queries
...
and many more ...
Multiple annotation resources; Virtual Collections; Partial highlightings; Support for overlapping spans; Relational queries; Hierarchical queries ...
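To make the two full-text examples above concrete, here is a minimal sketch in plain Java of what "phrase" and "case-insensitive regular expression" matching mean over a token sequence. It does not use Krill, Lucene, or KoralQuery; the class `TokenSearchSketch`, its methods, and the sample sentence are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names are part of Krill.
public class TokenSearchSketch {

    // Return the start positions at which the phrase occurs as
    // consecutive tokens (exact, case-sensitive comparison).
    public static List<Integer> findPhrase(List<String> tokens, List<String> phrase) {
        List<Integer> hits = new ArrayList<Integer>();
        if (phrase.isEmpty()) {
            return hits;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Return the positions of tokens that match the regular expression
    // as a whole, ignoring case.
    public static List<Integer> findRegex(List<String> tokens, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < tokens.size(); i++) {
            if (pattern.matcher(tokens.get(i)).matches()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed sample sentence, already split into tokens.
        List<String> tokens = Arrays.asList("The", "Kraken", "is", "a", "sea", "monster");
        System.out.println(findPhrase(tokens, Arrays.asList("sea", "monster"))); // [4]
        System.out.println(findRegex(tokens, "krak.*"));                         // [1]
    }
}
```

A real index answers such queries without scanning every token; the linear scan here only shows the intended matching semantics.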
Krill requires at least Java 7, Git, and Maven. Further dependencies are resolved using Maven.
$ git clone https://github.com/KorAP/Krill
$ cd Krill
To run the test suite, type in ...
$ mvn test
To start the server, type in ...
$ mvn compile exec:java
To compile and run the indexer, type ...
$ mvn compile assembly:single
$ java -jar target/KorAP-krill-X.XX.jar src/main/resources/korap.conf src/test/resources/examples/
Krill operates on tokens and is limited to a single tokenization stream. Token annotations therefore have to rely on that tokenization, and span annotations have to wrap at least one token. Punctuation is currently not supported. The order of results is currently bound to the order of documents in the index, but this is likely to change.
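The span constraint can be pictured with a small sketch, assuming spans are stored as token offsets with an inclusive start and an exclusive end. This is an assumption made for illustration, not a description of Krill's internal representation, and the class `SpanSketch` is hypothetical.

```java
// Hypothetical illustration only: this class is not part of Krill.
public class SpanSketch {
    public final int start; // position of the first covered token (inclusive)
    public final int end;   // position after the last covered token (exclusive)

    // tokenCount is the length of the single tokenization stream
    // that every annotation has to refer to.
    public SpanSketch(int start, int end, int tokenCount) {
        if (start < 0 || end > tokenCount) {
            throw new IllegalArgumentException("span lies outside the token stream");
        }
        if (end <= start) {
            throw new IllegalArgumentException("span must wrap at least one token");
        }
        this.start = start;
        this.end = end;
    }

    // Number of tokens wrapped by the span.
    public int length() {
        return end - start;
    }

    public static void main(String[] args) {
        // A nominal phrase covering tokens 3..5 of a six-token sentence.
        SpanSketch np = new SpanSketch(3, 6, 6);
        System.out.println(np.length()); // 3
    }
}
```

Under this assumption an empty span such as `new SpanSketch(2, 2, 6)` is rejected, which mirrors the rule that a span annotation cannot exist without at least one token.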
Authors: Nils Diewald, Eliza Margaretha
Copyright 2013-2015, IDS Mannheim, Germany
Krill is developed as part of the KorAP Corpus Analysis Platform at the Institute for German Language (IDS).
For recent changes and compatibility issues, please consult the Changes file.
Contributions to Krill are very welcome! Before contributing, please reformat your code according to the KorAP style guideline, which is provided as an Eclipse style sheet. You can reformat either in Eclipse or with Maven, using the command
$ mvn java-formatter:format
Krill is published under the BSD-2 License.
To cite this work, please ...
Named entities annotated in the test data by CoreNLP were using models based on:
Manaal Faruqui and Sebastian Padó (2010): Training and Evaluating a German Named Entity Recognizer with Semantic Generalization, Proceedings of KONVENS 2010, Saarbrücken, Germany