Add reference to DeReKo
Change-Id: Ibabe26c92cd2443fbcd32cfefbffdbb3cf94e735
diff --git a/Readme.md b/Readme.md
index ee3a9c8..b5dfca6 100644
--- a/Readme.md
+++ b/Readme.md
@@ -5,7 +5,8 @@
![Introduction to Datok](https://raw.githubusercontent.com/KorAP/Datok/master/misc/introducing-datok.gif)
Implementation of a finite state automaton for
-high-performance natural language tokenization, based on a finite state
+high-performance large-scale natural language tokenization,
+based on a finite state
transducer generated with [Foma](https://fomafst.github.io/).
The library contains precompiled tokenizer models for
@@ -13,6 +14,10 @@
- [german](testdata/tokenizer_de.matok)
- [english](testdata/tokenizer_en.matok)
+The focus of development is on the tokenization of
+[DeReKo](https://www.ids-mannheim.de/digspra/kl/projekte/korpora),
+the German reference corpus.
+
## Performance
![Speed comparison of german tokenizers](https://raw.githubusercontent.com/KorAP/Datok/master/misc/benchmarks.svg)
diff --git a/testdata/tokenizer_en.fst b/testdata/tokenizer_en.fst
index 011934a..ee312d2 100644
--- a/testdata/tokenizer_en.fst
+++ b/testdata/tokenizer_en.fst
Binary files differ