Improve Readme

commit: 842bc6560f3522275e237d7a25af46c264a64631
author: Akron <nils@diewald-online.de> Sat Oct 09 19:34:15 2021 +0200
committer: Akron <nils@diewald-online.de> Sat Oct 09 19:34:15 2021 +0200
tree: 29646eb841edeb57f49db0a215204a8f2653226d
parent: abcb6a5f199b29d598b983acec72ab6b9031f63f
diff --git a/Readme.md b/Readme.md
index ce710d9..c2f967e 100644
--- a/Readme.md
+++ b/Readme.md

@@ -1,4 +1,4 @@
-# Datok - Matrix or Double Array FSA based Tokenizer
+# Datok - Finite State Tokenizer
 
 This is an implementation of an FSA for natural language
 tokenization, either in form of a matrix representation
@@ -6,7 +6,8 @@
 The system accepts a finite state transducer (FST)
 describing a tokenizer generated by
 [Foma](https://fomafst.github.io/)
-that needs to follow some rules as described below.
+that needs to follow some conventional rules as described
+below.
 
 # Conventions
 
@@ -113,9 +114,12 @@
 # Technology
 
 The double array representation (Aoe 1989) of all transitions
-in the FST is
-implemented as an extended FSA following Mizobuchi et al. (2000)
-and implementation details following Kanda et al. (2018).
+in the FST is implemented as an extended DFA following Mizobuchi
+et al. (2000) and implementation details following Kanda et al. (2018).
+
+Both representations mark all non-word-character targets with a
+leading bit. The transduction is greedy with a single backtracking
+option to the last ε transition.
 
 The german tokenizer shipped is based on work done by the
 [Lucene project](https://github.com/apache/lucene-solr)
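
The Technology paragraph added in this commit mentions a double array representation (Aoe 1989) and a leading bit that marks non-word-character targets. The sketch below illustrates the general idea only and is not taken from Datok's sources: the names `pack`, `unpack`, `step`, the 32-bit layout, the symbol codes, and the choice to store the flag in the `check` array are all assumptions made for this example.

```python
# Toy double-array transition lookup with a flag bit.
# Assumption: the top bit of a 32-bit value marks a non-word-character target.
NONWORD_FLAG = 1 << 31
STATE_MASK = NONWORD_FLAG - 1


def pack(state, nonword):
    """Store a state number and a non-word marker in one integer."""
    return (state | NONWORD_FLAG) if nonword else state


def unpack(value):
    """Split a packed value into (state number, non-word marker)."""
    return value & STATE_MASK, bool(value & NONWORD_FLAG)


def step(base, check, state, code):
    """Follow one transition from `state` on symbol `code`.

    Classic double-array rule: the candidate slot is base[state] + code,
    and the transition exists only if check[slot] names `state` as owner.
    Returns (next_state, is_nonword), or None if there is no transition.
    """
    slot = base[state] + code
    if not 0 <= slot < len(check):
        return None
    owner, nonword = unpack(check[slot])
    if owner != state:
        return None
    return slot, nonword


# Hypothetical symbol codes: 1 = a letter, 2 = a space.
# State 1 is the start state; 0 in `check` means "slot unused".
base = [0, 1, 0, 0]
check = [0, 0, 1, pack(1, True)]

letter = step(base, check, 1, 1)   # start state, letter
space = step(base, check, 1, 2)    # start state, space (non-word target)
missing = step(base, check, 2, 1)  # no such transition
```

Packing the marker into the same integer as the state keeps a lookup to one array read plus a mask, which is one plausible reason for a flag-bit design; a tokenizer could use the marker to decide whether a consumed character belongs to the current token.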