Changelog

[2.2.0] - 2022-12-07

  • added option --exclude-punctuation

[2.1.0] - 2022-12-01

  • added script GeneratePseudonymKey.groovy to compute pseudonyms
  • added script Pseudonymize.groovy to pseudonymize tokens (and lemmas)
  • added script FilterKeys.groovy to reduce pseudonymization keys to the tokens/lemmas that are actually used

[2.0] - 2021-10-07

  • frequencies are now cumulated automatically for input files matching .*\\.(freq|tsv)(\\.gz)?
  • -N option added to sort keys with the same frequency numerically
  • --pad option added to add padding symbols at text edges
  • jar is now called totalngrams-2.0.jar
  • support xz compression for input and output (single-threaded and slow)
  • number of folds (-F option) now defaults to 1
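
To illustrate which input file names the pattern quoted in the 2.0 entry selects, here is a minimal Java sketch. It only evaluates the regular expression as written in that entry; the class name FrequencyInputPattern and the helper isFrequencyInput are hypothetical and not part of the tool, and the sketch assumes the whole file name is matched against the pattern.

```java
import java.util.regex.Pattern;

public class FrequencyInputPattern {
    // Pattern quoted in the 2.0 entry, written as a Java string literal.
    static final Pattern FREQ_FILE = Pattern.compile(".*\\.(freq|tsv)(\\.gz)?");

    // Hypothetical helper: true if the entire file name matches the pattern.
    // Assumption: full-string matching; the tool itself may match differently.
    static boolean isFrequencyInput(String fileName) {
        return FREQ_FILE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        String[] names = {"1-grams.freq", "counts.tsv.gz", "notes.txt", "counts.tsv.bak"};
        for (String name : names) {
            System.out.println(name + " -> " + isFrequencyInput(name));
        }
    }
}
```

Under these assumptions, names ending in .freq or .tsv, optionally followed by .gz, match; other names do not.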