Changelog

All notable changes to this project are documented in this file.

1.1.0 - 2026-04-11

Added

  • Hashtag tagging with the HST tag.
  • Regression tests for hashtag tagging and ADR tagging behavior.
  • Expanded documentation covering the tagset, CoNLL-U output examples, limitations, performance, and application contexts.

Changed

  • ADR is now emitted whenever a token matches the @-address pattern, regardless of existing POS values in the input.
  • Purely numeric tokens such as #10 are no longer tagged as HST; hashtags must contain at least one Unicode letter.
  • Jest was upgraded from ^29.7.0 to ^30.3.0, and .gitlab-ci-local is now ignored during Jest module discovery to avoid local naming collisions.
  • Documentation examples were revised and anonymized for public release.
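
The hashtag rule in this release (at least one Unicode letter, so #10 is excluded) can be illustrated with a minimal sketch. The pattern and the isHashtag helper below are assumptions made for illustration, not the project's actual implementation; in particular, the set of characters allowed in a hashtag body is a guess.

```javascript
// Hypothetical pattern, for illustration only; the tagger's real rule may differ.
// Body: Unicode letters, numbers, combining marks, or underscores.
// The lookahead requires at least one Unicode letter somewhere in the body.
const HASHTAG = /^#(?=[\p{N}\p{M}_]*\p{L})[\p{L}\p{N}\p{M}_]+$/u;

// Returns true if the token would qualify for the HST tag under this sketch.
function isHashtag(token) {
  return HASHTAG.test(token);
}
```

Under this sketch, isHashtag("#10") is false, while isHashtag("#Frühling") and isHashtag("#2026goals") are true. The u flag is required for the \p{...} property escapes.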

Fixed

  • Sparse mode now respects stdout backpressure, avoiding Node.js heap exhaustion on very large corpora with many matches.
  • Hashtags containing Unicode letters such as umlauts are now tagged as HST.
  • Emoji-name values in the n FEATS field no longer contain spurious underscores after separators such as the colon (:) and comma (,). Examples now use forms like thumbs_up:light_skin_tone and family:man,man,boy.
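
For readers unfamiliar with stdout backpressure, the general Node.js technique behind the sparse-mode fix can be sketched as follows. The writeAll helper is hypothetical and is not the project's code; it only shows the standard pattern of pausing when write() returns false and resuming on the "drain" event.

```javascript
const { once } = require("node:events");

// Hypothetical helper, for illustration only.
// Writes each line to `out` and waits whenever the stream's buffer is full,
// so pending output cannot pile up on the heap.
async function writeAll(lines, out = process.stdout) {
  let written = 0;
  for await (const line of lines) {
    // write() returns false once buffered data reaches the highWaterMark.
    if (!out.write(line + "\n")) {
      // Resume only after the stream has flushed its buffer.
      await once(out, "drain");
    }
    written += 1;
  }
  return written;
}
```

Because the loop accepts any sync or async iterable, matches can be generated lazily and are produced only as fast as the consumer reads them.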

1.0.0

  • Initial release.