Changelog
[2.2.0] - 2022-12-07
- added option --exclude-punctuation
[2.1.0] - 2022-12-01
- added script GeneratePseudonymKey.groovy to compute pseudonyms
- added script Pseudonymize.groovy to pseudonymize tokens (and lemmas)
- added script FilterKeys.groovy to filter pseudonymization keys so that they contain only the tokens/lemmas actually used
[2.0] - 2021-10-07
- for input files matching .*\\.(freq|tsv)(\\.gz)?, frequencies are now accumulated automatically
- -N option added to sort keys with the same frequency numerically
- --pad option added to optionally insert padding symbols at text edges
- jar is now called totalngrams-2.0.jar
- added support for xz compression of input and output (single-threaded and slow)
- number of folds (-F option) now defaults to 1
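The doubled backslashes in the 2.0 file-name pattern read like a Java string literal, where "\\." denotes a regex that matches a literal dot. The sketch below only illustrates which file names that pattern accepts. It assumes whole-name matching (Matcher.matches semantics). The class and method names are hypothetical and are not part of totalngrams.

```java
import java.util.List;
import java.util.regex.Pattern;

public class FreqFileMatch {
    // Pattern copied from the changelog entry; in Java source "\\." is the regex \. (a literal dot).
    private static final Pattern FREQ_INPUT = Pattern.compile(".*\\.(freq|tsv)(\\.gz)?");

    // Hypothetical helper; assumes the entire file name must match the pattern.
    public static boolean isFrequencyInput(String fileName) {
        return FREQ_INPUT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String name : List.of("counts.freq", "counts.tsv.gz", "counts.tsv.xz", "corpus.txt")) {
            System.out.println(name + " -> " + isFrequencyInput(name));
        }
    }
}
```

Taken literally, the pattern covers an optional .gz suffix but not .xz, so "counts.tsv.xz" does not match here. Whether the tool handles xz-compressed frequency files differently is not stated in this changelog.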