Changelog

[2.2.2] - 2023-01-23

  • fixed cardinals being output as empty strings (e.g. "1000" -> "") in FilterKeys result

[2.2.1] - 2022-12-21

  • fixed missing scripts in bin distribution

[2.2.0] - 2022-12-21

  • added option --exclude-punctuation
  • improved binary distribution build
  • added source distribution build

[2.1.0] - 2022-12-01

  • added script GeneratePseudonymKey.groovy to compute pseudonyms
  • added script Pseudonymize.groovy to pseudonymize tokens (and lemmas)
  • added script FilterKeys.groovy to restrict pseudonymization keys to the tokens/lemmas that are actually used

[2.0] - 2021-10-07

  • frequencies are now accumulated automatically for input files matching .*\.(freq|tsv)(\.gz)?
  • -N option added to sort keys with the same frequency numerically
  • --pad option added to add padding symbols at text edges
  • jar is now called totalngrams-2.0.jar
  • added xz compression support for input and output (single-threaded and slow)
  • number of folds (-F option) now defaults to 1