blob: 235124b3970b1dc14eeace47fbd070e428433481 [file] [log] [blame]
0.3.1 2026-02-11
- Introduce hyphenated abbreviations in german tokenizer.
- Support Wikipedia templates.
- Introduced multiple gender forms for nouns
in german tokenizer.
(from KorAP-Tokenizer)
- Added short forms for determiners, adjectives, pronouns
"eine(n)", "gute:r", "ihm/r", "diese(r)", "ein(e)"
0.2.2 2023-09-06
- Fix behaviour for end of text character positions
when no end of sentence occured before.
0.2.1 2023-09-05
- Add english tokenizer.
- Fix buffer bug.
- Improve Readme.
- Minor performance improvements.
0.1.7 2023-02-28
- Add dependabot checks.
- Add update command.
0.1.6 2022-04-14
- Rename TOKEN_SYMBOL to TOKEN_BOUND.
0.1.5 2022-03-28
- Improve Emoticon-List.
0.1.4 2022-03-27
- Improved handling of ellipsis.
- Make algorithm more robust to nevere fail.
- Remove match option.
0.1.3 2022-03-08
- Introduced refined handling of sentences including speech.
0.1.2 2021-12-07
- Improve performance of rune to symbol conversion in transduction
method.
- Support Plusampersand word list in compounds.