Add reference to DeReKo

Change-Id: Ibabe26c92cd2443fbcd32cfefbffdbb3cf94e735
diff --git a/Readme.md b/Readme.md
index ee3a9c8..b5dfca6 100644
--- a/Readme.md
+++ b/Readme.md
@@ -5,7 +5,8 @@
data:image/s3,"s3://crabby-images/b598d/b598ddf4336438f0ba3e4477fff492fd322eeb51" alt="Introduction to Datok"
Implementation of a finite state automaton for
-high-performance natural language tokenization, based on a finite state
+high-performance large-scale natural language tokenization,
+based on a finite state
transducer generated with [Foma](https://fomafst.github.io/).
The library contains precompiled tokenizer models for
@@ -13,6 +14,10 @@
- [german](testdata/tokenizer_de.matok)
- [english](testdata/tokenizer_en.matok)
+The focus of development is on the tokenization of
+[DeReKo](https://www.ids-mannheim.de/digspra/kl/projekte/korpora),
+the German Reference Corpus.
+
## Performance
data:image/s3,"s3://crabby-images/d878e/d878e934c231f13b68f9dfb769a61fab1d8c5527" alt="Speed comparison of german tokenizers"
diff --git a/testdata/tokenizer_en.fst b/testdata/tokenizer_en.fst
index 011934a..ee312d2 100644
--- a/testdata/tokenizer_en.fst
+++ b/testdata/tokenizer_en.fst
Binary files differ