Add references

Change-Id: Ib3a5379a5211bcf4ff960b705598ed8804c36d93

commit: 79ec99585d453870097b94636f208e74de9182f2
author: Akron <nils@diewald-online.de> Wed May 25 11:59:39 2022 +0200
committer: Akron <nils@diewald-online.de> Wed May 25 11:59:39 2022 +0200
tree: 80c151a94b79e1791d09e712d5199e7231ccdd41
parent: a44944df913511c0b695f8a907ee836160f78fb5
diff --git a/Readme.md b/Readme.md
index 353bc36..dd580de 100644
--- a/Readme.md
+++ b/Readme.md
@@ -8,9 +8,26 @@
 fast natural language tokenization, based on a finite state
 transducer generated with [Foma](https://fomafst.github.io/).
 
+## References
+
+Please cite this work as:
+
+> Diewald, N. (2022): *Matrix and Double-Array Representations
+> for Efficient Finite State Tokenization*. In: Proceedings of the
+> 10th Workshop on Challenges in the Management of Large Corpora
+> (CMLC-10) at LREC 2022. Marseille, France.
+
 The library contains sources for a german tokenizer
 based on [KorAP-Tokenizer](https://github.com/KorAP/KorAP-Tokenizer).
 
+For speed and quality analysis in comparison to other tokenizers for German,
+please refer to this article:
+
+> Diewald, N./Kupietz, M./Lüngen, H. (2022): *Tokenizing on scale -
+> Preprocessing large text corpora on the lexical and sentence level*.
+> In: Proceedings of EURALEX 2022. Mannheim, Germany.
+
+The benchmarks can be reproduced using [this test suite](https://github.com/KorAP/Tokenizer-Evaluation).
 
 ## Tokenization