Added readme

commit: 04041eea4e5eae8cc2fbc258ad598bd935cf69c8
author: Nils Diewald <nils@diewald-online.de> Thu Nov 20 21:02:26 2014 +0000
committer: Nils Diewald <nils@diewald-online.de> Thu Nov 20 21:02:26 2014 +0000
tree: fd5fe764a22273a2a28795ca9b51d2eada1691e8
parent: 38a9466f27d8c54aceba39b79d6f33b09e9697d8
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f7f9ce6
--- /dev/null
+++ b/README.md

@@ -0,0 +1,53 @@
+KorAP Lucene Index
+==================
+
+KorAP is available at
+https://korap.ids-mannheim.de/
+
+Limitations
+-----------
+
+### Tokenization
+
+The Lucene backend is token-based rather than character-based.
+In addition, it supports only a single tokenization.
+Although multiple annotations can be attached to this tokenization,
+these annotations have to match the character offsets of the basic tokens.
+
+Token annotations that do not match the basic tokenization are
+not indexed. Likewise, span annotations that cover a smaller range
+than one basic token are not indexed.
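+
+The following minimal Java sketch (Java 16+ for the record) illustrates
+the offset rules above. It is not the backend's actual code. The class
+`AlignmentCheck`, the record `Range` and both methods are hypothetical.
+The sketch assumes that offsets are half-open character ranges
+`[start, end)`. It also assumes that a span is kept when it starts at
+the start of a basic token and ends at the end of a basic token. Under
+that rule, a kept span always covers at least one whole token.
+
+```java
+import java.util.List;
+
+// Hypothetical sketch: checks whether annotation offsets line up with the
+// basic tokenization. Names and the exact span rule are assumptions.
+class AlignmentCheck {
+
+    // Character offsets [start, end) of a basic token or an annotation.
+    record Range(int start, int end) {}
+
+    // A token annotation is indexed only if it matches one basic token exactly.
+    static boolean isIndexableTokenAnnotation(Range ann, List<Range> tokens) {
+        return tokens.contains(ann);
+    }
+
+    // A span is kept only if it starts at a token start and ends at a token end,
+    // so it can never be smaller than one basic token.
+    static boolean isIndexableSpan(Range span, List<Range> tokens) {
+        boolean startsAtToken = tokens.stream().anyMatch(t -> t.start() == span.start());
+        boolean endsAtToken = tokens.stream().anyMatch(t -> t.end() == span.end());
+        return startsAtToken && endsAtToken && span.start() < span.end();
+    }
+
+    public static void main(String[] args) {
+        // Basic tokens for "Der Hund": "Der" = [0,3), "Hund" = [4,8)
+        List<Range> tokens = List.of(new Range(0, 3), new Range(4, 8));
+        System.out.println(isIndexableTokenAnnotation(new Range(4, 8), tokens)); // true
+        System.out.println(isIndexableTokenAnnotation(new Range(4, 6), tokens)); // false
+        System.out.println(isIndexableSpan(new Range(0, 8), tokens)); // true
+        System.out.println(isIndexableSpan(new Range(4, 6), tokens)); // false
+    }
+}
+```
+
+In this sketch, a token annotation for "Hu" (`[4,6)`) is dropped
+because no basic token has exactly these offsets.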
+
+Only word tokens are indexed, i.e. punctuation tokens are not.
+This limitation is necessary for distance queries to work on the
+word level.
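+
+The following minimal Java sketch shows why skipping punctuation matters
+for word distances. The class `WordPositions` is hypothetical. Its word
+test, which treats a token as a word if it contains at least one letter
+or digit, is an assumption rather than the backend's actual
+classification.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical sketch: only word tokens receive index positions, so
+// punctuation does not count towards word distances.
+class WordPositions {
+
+    // Assumption: a token is a word if it contains a letter or a digit.
+    static boolean isWordToken(String token) {
+        return token.codePoints().anyMatch(Character::isLetterOrDigit);
+    }
+
+    // Keeps only word tokens; a token's list index is its index position.
+    static List<String> indexableTokens(List<String> tokens) {
+        List<String> words = new ArrayList<>();
+        for (String t : tokens) {
+            if (isWordToken(t)) {
+                words.add(t);
+            }
+        }
+        return words;
+    }
+
+    public static void main(String[] args) {
+        List<String> tokens = List.of("Der", "Hund", ",", "der", "bellt", ".");
+        List<String> words = indexableTokens(tokens);
+        System.out.println(words); // [Der, Hund, der, bellt]
+        // The comma is skipped, so "Hund" and "der" are neighbouring positions.
+        System.out.println(words.indexOf("der") - words.indexOf("Hund")); // 1
+    }
+}
+```
+
+With the comma removed, `Hund` and `der` receive adjacent positions,
+so a word-level distance between them is not inflated by punctuation.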
+
+### Repetitions
+
+The maximum value for repetitions is 100.
+
+### Distances
+
+The maximum value for distance units is 100.
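+
+Both limits could be checked before a query is built, as in the
+following minimal Java sketch. The class `QueryLimits` and its method
+are hypothetical. Rejecting out-of-range values, rather than for
+example capping them at 100, is an assumption.
+
+```java
+// Hypothetical sketch: validates repetition and distance ranges against
+// the documented maximum of 100. Rejection on overflow is an assumption.
+class QueryLimits {
+
+    // Maxima as documented in this README.
+    static final int MAX_REPETITION = 100;
+    static final int MAX_DISTANCE = 100;
+
+    // Rejects ranges that are negative, inverted or above the given limit.
+    static void checkRange(String what, int min, int max, int limit) {
+        if (min < 0 || min > max) {
+            throw new IllegalArgumentException(what + ": invalid range " + min + ".." + max);
+        }
+        if (max > limit) {
+            throw new IllegalArgumentException(what + ": maximum is " + limit + ", got " + max);
+        }
+    }
+
+    public static void main(String[] args) {
+        checkRange("repetition", 1, 100, MAX_REPETITION); // accepted
+        try {
+            checkRange("distance", 0, 150, MAX_DISTANCE);
+        } catch (IllegalArgumentException e) {
+            System.out.println(e.getMessage()); // distance: maximum is 100, got 150
+        }
+    }
+}
+```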
+
+Copyright
+---------
+
+Copyright 2014, IDS Mannheim, Germany
+Authors: Nils Diewald, Eliza Margaretha and contributors.
+
+Citation
+--------
+
+???
+
+Further References
+------------------
+
+Named entities in the test data were annotated by CoreNLP using
+models based on:
+Manaal Faruqui and Sebastian Padó (2010):
+Training and Evaluating a German Named Entity
+Recognizer with Semantic Generalization,
+Proceedings of KONVENS 2010,
+Saarbrücken, Germany