| Tool | Version | Sen | Tok | Model | Tokens/ms, effi1x (100 runs) | Tokens/ms, effi10x (100 runs) |
|---|---|---|---|---|---|---|
| KorAP-Tokenizer | | | | | 72.90 | 199.28 |
| Datok | | x | x | datok | 837.89 | 2478.71 |
| | | x | x | matok | 1371.19 | 2976.80 |
| BlingFire | 0.1.8 | | x | wbd.bin | 431.92 | 1697.73 |
| | | x | | sbd.bin | 417.10 | 1908.87 |
| Cutter | 2.5 | x | x | | 0.38 | |
| JTok | 2.1.19 | | | | 31.19 | 117.22 |
| OpenNLP | | | x | Simple | 290.71 | 1330.23 |
| | | | x | Tokenizer | 74.65 | 145.08 |
| | | x | | SentenceD | 247.84 | 853.01 |
| SoMaJo | | x | x | P=1 | 8.15 | 8.41 |
| | | x | x | P=8 | 27.32 | 39.91 |
| SpaCy | | | x | Tokenizer | 19.73 | 44.40 |
| | | x | | Sentencizer | 16.94 | |
| | | x | | Statistical | 4.90 | |
| | | x | | Dependency | 2.24 | |
| Stanford | | | x | | 75.47 | 156.24 |
| | | x | x | T,S,M | 46.95 | 91.56 |
| Syntok | | x | | segmenter | 59.66 | 61.07 |
| | | | x | tokenizer | 103.90 | 108.40 |
| Waste | 2.0.20-1 | x | x | | 141.07 | 144.95 |
| Elephant | | | x | | 8.57 | 8.68 |
| TreeTagger | | | x | | 69.92 | 72.98 |
| Deep-EOS | | x | | bi-lstm-de | 0.25 | |
| | | x | | cnn-de | 0.27 | |
| | | x | | lstm-de | 0.29 | |
| NNsplit | | x | | | 0.90 | |
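
The throughput figures above are reported in tokens per millisecond over 100 runs. The source does not show how they were measured, so the following is only a minimal sketch of one plausible way to get such a number: run a tokenizer repeatedly over the same input and divide the total token count by the elapsed wall-clock time. The function names `throughput` and `tokens_per_ms` are hypothetical, and `str.split` stands in for a real tokenizer.

```python
import time


def throughput(tokens_per_run, runs, elapsed_ms):
    """Tokens per millisecond, given the tokens in one run, the number of runs,
    and the total elapsed time in milliseconds (hypothetical helper)."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens_per_run * runs / elapsed_ms


def tokens_per_ms(tokenize, text, runs=100):
    """Time `runs` repetitions of `tokenize(text)` and return tokens/ms.

    Assumption: `tokenize` returns a sequence with one entry per token.
    """
    tokens_per_run = len(tokenize(text))  # warm-up call, also counts tokens
    start = time.perf_counter()
    for _ in range(runs):
        tokenize(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return throughput(tokens_per_run, runs, elapsed_ms)


if __name__ == "__main__":
    # str.split is only a placeholder for an actual tokenizer.
    sample = "Der alte Mann ging langsam nach Hause . " * 1000
    print(f"{tokens_per_ms(str.split, sample):.2f} tokens/ms")
```

Timing the whole loop once, rather than each call separately, keeps timer overhead out of the result. Measured this way, the figure still depends on the hardware and on how the input is loaded, so values from different setups are not directly comparable.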