  1. 0139bc5 Introduce the english model as being on the same level as german by Akron · 1 year, 3 months ago
  2. d47c67e Add minor rules for XML support by Akron · 2 years, 8 months ago
  3. b428755 Support punctuation after quotes by Akron · 2 years, 8 months ago v0.1.4
  4. df27581 Make tokenizer robust and never failing by Akron · 2 years, 8 months ago
  5. d0c6e10 Fix datok tests to be more robust regarding tokenizer changes by Akron · 3 years ago
  6. 00cecd1 Initialize identity for sigma < 256 by Akron · 3 years ago
  7. 4880fb6 Improve rune2symbol conversion by Akron · 3 years ago
  8. fac8abc Reorder longest match operator and update models by Akron · 3 years, 1 month ago
  9. 289414f Update benchmarks by Akron · 3 years, 1 month ago
  10. 04335c6 Update tests by Akron · 3 years, 1 month ago
  11. 9fb63af Optimize tests by avoiding reload of tokenizers by Akron · 3 years, 1 month ago
  12. 98fbfef Improve offset handling in buffers by Akron · 3 years, 1 month ago
  13. c9c0eae Rename tests to better comply with Go test tool by Akron · 3 years, 1 month ago
  14. a854faa Introduce EOT (end-of-transmission) marker by Akron · 3 years, 1 month ago
  15. e396a93 Introduce token_writer object by Akron · 3 years, 1 month ago
  16. 094a4e8 Use serialized matrix representation in test suite by Akron · 3 years, 2 months ago
  17. 28031b7 Introduce matrix serialization and deserialization by Akron · 3 years, 2 months ago
  18. 941f215 Support both matrix and da in the command by Akron · 3 years, 2 months ago
  19. 5c82a92 Add sentence end detection to matrix by Akron · 3 years, 2 months ago
  20. 1c34ce6 Introduce alternative matrix representation by Akron · 3 years, 2 months ago
  21. 7f1097f Rename datokenizer to datok by Akron · 3 years, 2 months ago [Renamed (99%) from datokenizer_test.go]
  22. 29e306f Combine Niu et al. (2013) and Morita et al. (2001) by Akron · 3 years, 3 months ago
  23. 679b486 Add skip-method proposed by Morita et al. (2001) by Akron · 3 years, 3 months ago
  24. 7b1faa6 Add xCheck() improvement proposed by Niu (2013) by Akron · 3 years, 3 months ago
  25. df37a55 Fixed benchmark tests by Akron · 3 years, 3 months ago
  26. 4c2a1ad Introduce XML tests by Akron · 3 years, 3 months ago
  27. 0630be5 Fix parsing of end states by Akron · 3 years, 3 months ago
  28. 92704eb Ignore tokenend accepting transitions by Akron · 3 years, 3 months ago
  29. 4fa28b3 Introduce TransCount method by Akron · 3 years, 3 months ago
  30. 31f3c06 Ignore MCS in sigma if not used in the transducer by Akron · 3 years, 3 months ago
  31. de18e90 Minor optimization on edges by Akron · 3 years, 3 months ago
  32. 6f1c16c Added benchmark for double array creation by Akron · 3 years, 3 months ago
  33. ea46e8a Add ASCII fast lookup to sigma by Akron · 3 years, 3 months ago
  34. f1a1650 Turn uint32 array in bc array by Akron · 3 years, 3 months ago
  35. bb4aac5 Optimize loading of datok files by Akron · 3 years, 3 months ago
  36. 01912fc Remove unnecessary allocation for buffer recasting by Akron · 3 years, 4 months ago
  37. bd40680 Added transducing benchmark by Akron · 3 years, 4 months ago
  38. ec835ad Remove Match() method by Akron · 3 years, 4 months ago
  39. e8837b5 Add file scheme by Akron · 3 years, 4 months ago
  40. fd92d7e Update abbreviations according to KorAP-Tokenizer by Akron · 3 years, 4 months ago
  41. a0bded5 Add ordinals by Akron · 3 years, 4 months ago
  42. 4af79f1 Added support for streetnames by Akron · 3 years, 4 months ago
  43. 03ca425 Adopt tokenizer tests from KorAP-Tokenizer by Akron · 3 years, 4 months ago
  44. 6e70dc8 Fix sentence splitting tests by Akron · 3 years, 4 months ago
  45. 1594cb8 Fix sentence splitting by Akron · 3 years, 4 months ago
  46. c5d8d43 Fix check on final states by Akron · 3 years, 4 months ago
  47. b7e1f13 Simplify transducer (single test broken) by Akron · 3 years, 4 months ago
  48. df0a3ef Correctly handle final data by Akron · 3 years, 4 months ago
  49. 439f4ec Cleanup by Akron · 3 years, 4 months ago
  50. 03c92fe Support for tokenend MCS symbol by Akron · 3 years, 4 months ago
  51. b4bbb47 Added sentence splitter capabilities by Akron · 3 years, 4 months ago
  52. 3610f10 Introduce buffer with single epsilon backtrack by Akron · 3 years, 4 months ago
  53. 3a063ef Fix loading routine by Akron · 3 years, 4 months ago
  54. 524c543 Fix sigma to start with 1 by Akron · 3 years, 4 months ago
  55. 3f8571a Support reader/writer in transduce and add load by Akron · 3 years, 4 months ago
  56. 84d68e6 Support tokenend handling in transducing by Akron · 3 years, 4 months ago
  57. 2a4b929 Switch to 2 leading bits (30 bit addresses) by Akron · 3 years, 4 months ago
  58. 068874c Introduce nontoken handling in preliminary transducer by Akron · 3 years, 4 months ago
  59. 83e75a2 Introduce nontoken information by Akron · 3 years, 4 months ago
  60. 03a3c61 Rename loadLevel to loadFactor by Akron · 3 years, 4 months ago
  61. 3fdfec6 Turn states into uint32 pairs by Akron · 3 years, 4 months ago
  62. 64ffd9a Restructure and rename methods by Akron · 3 years, 4 months ago
  63. c17f1ca Turn special sigma values into properties by Akron · 3 years, 4 months ago
  64. 6247a5d Add serialization method by Akron · 3 years, 4 months ago
  65. 773b1ef Cache loadlevel by Akron · 3 years, 4 months ago
  66. d66a926 Add load factor by Akron · 3 years, 4 months ago
  67. f2120ca Split Tokenizer and DaTokenizer by Akron · 3 years, 4 months ago
  68. c9d84a6 Sort alphabet prior to xCheck by Akron · 3 years, 4 months ago
  69. 740f3d7 Cleanup code by Akron · 3 years, 4 months ago
  70. 49d27ee Fix epsilon handling in match operation by Akron · 3 years, 4 months ago
  71. 465a099 Add support for epsilon symbols by Akron · 3 years, 4 months ago
  72. 730a79c Support unknown and identity symbols by Akron · 3 years, 4 months ago
  73. 75ebe7f Fix foma format parser by Akron · 3 years, 4 months ago
  74. 8ef408b Initial commit by Akron · 3 years, 4 months ago