Add library usage explanation
Change-Id: I4401ebd7218a3e699efb73917aa5de8baf6f17f8
diff --git a/Readme.md b/Readme.md
index e03df67..4799932 100644
--- a/Readme.md
+++ b/Readme.md
@@ -9,7 +9,7 @@
based on a finite state
transducer generated with [Foma](https://fomafst.github.io/).
-The library contains precompiled tokenizer models for
+The repository currently contains precompiled tokenizer models for
- [german](testdata/tokenizer_de.matok)
- [english](testdata/tokenizer_en.matok)
@@ -18,6 +18,8 @@
[DeReKo](https://www.ids-mannheim.de/digspra/kl/projekte/korpora),
the german reference corpus.
+Datok can be used as a standalone tool or as a library in Go.
+
## Performance
![Speed comparison of german tokenizers](https://raw.githubusercontent.com/KorAP/Datok/master/misc/benchmarks.svg)
@@ -54,6 +56,7 @@
> *Caution*: When experimenting with STDIN and echo,
> you may need to disable [history expansion](https://www.gnu.org/software/bash/manual/html_node/History-Interaction.html).
+
## Conversion
```
@@ -68,6 +71,38 @@
representation
```
+## Library
+
+```go
+package main
+
+import (
+    "os"
+    "strings"
+
+    "github.com/KorAP/datok"
+)
+
+func main() {
+
+    // Load the precompiled transducer binary
+    dat := datok.LoadTokenizerFile("tokenizer_de.matok")
+    if dat == nil {
+        panic("Can't load tokenizer")
+    }
+
+    // Create a new TokenWriter object that writes tokens
+    // and sentence boundaries to STDOUT
+    tw := datok.NewTokenWriter(os.Stdout, datok.TOKENS|datok.SENTENCES)
+    defer tw.Flush()
+
+    // Create an io.Reader object referring to the data to tokenize
+    r := strings.NewReader("Das ist <em>interessant</em>!")
+
+    // TransduceTokenWriter reads the input from the io.Reader
+    // and writes the tokenized output to the TokenWriter
+    dat.TransduceTokenWriter(r, tw)
+}
+```
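+
+With the `TOKENS` and `SENTENCES` flags set, this example should
+print one token per line and mark the sentence boundary, assuming
+the same newline-based output format as the command line tool.
+
+As `NewTokenWriter` above wraps `os.Stdout`, it presumably accepts
+any `io.Writer`, so the tokenized output can also be collected in
+memory. The following is a minimal sketch under that assumption,
+reusing only the Datok calls from the example above:
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "strings"
+
+    "github.com/KorAP/datok"
+)
+
+func main() {
+    dat := datok.LoadTokenizerFile("tokenizer_de.matok")
+    if dat == nil {
+        panic("Can't load tokenizer")
+    }
+
+    // Write tokens and sentence boundaries to an
+    // in-memory buffer instead of STDOUT
+    var buf bytes.Buffer
+    tw := datok.NewTokenWriter(&buf, datok.TOKENS|datok.SENTENCES)
+
+    dat.TransduceTokenWriter(strings.NewReader("Sie lacht. Er auch."), tw)
+
+    // Flush explicitly before reading the buffer
+    tw.Flush()
+
+    fmt.Print(buf.String())
+}
+```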
+
## Conventions
The FST generated by [Foma](https://fomafst.github.io/) must adhere to