Changelog
All notable changes to this project are documented in this file.
1.1.0 - 2026-04-11
Added
- Hashtag tagging with the HST tag.
- Regression tests for hashtag tagging and ADR tagging behavior.
- Expanded documentation covering the tagset, CoNLL-U output examples, limitations, performance, and application contexts.
Changed
- ADR is now emitted whenever a token matches the @-address pattern, regardless of existing POS values in the input.
- Purely numeric tokens such as #10 are no longer tagged as HST; hashtags must contain at least one Unicode letter.
- Documentation examples were revised and anonymized for public release.
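
The hashtag rule above (at least one Unicode letter, so #10 is rejected) can be illustrated with a small sketch. The function name `isHashtag` and the exact set of characters allowed after the `#` are assumptions made for illustration; the project's actual pattern may differ.

```javascript
// Hypothetical helper, not the project's actual implementation.
// Assumption: a hashtag is "#" followed by letters, combining marks,
// digits, or underscores, with at least one Unicode letter (\p{L}).
const HASHTAG = /^#[\p{L}\p{M}\p{N}_]*\p{L}[\p{L}\p{M}\p{N}_]*$/u;

function isHashtag(token) {
  return HASHTAG.test(token);
}

console.log(isHashtag("#10"));     // purely numeric, so not a hashtag
console.log(isHashtag("#Grüße"));  // contains Unicode letters
console.log(isHashtag("#b2b"));    // digits are fine next to a letter
```

The `u` flag is what enables the `\p{L}` Unicode property escape, which is why letters such as umlauts match without listing them explicitly.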
Fixed
- Sparse mode now respects stdout backpressure, avoiding Node.js heap exhaustion on very large corpora with many matches.
- Hashtags containing Unicode letters such as umlauts are now tagged as HST.
- Emoji-name values in the n FEATS field no longer insert spurious underscores after separators such as ":" and ","; examples now use forms like thumbs_up:light_skin_tone and family:man,man,boy.
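
For readers unfamiliar with the backpressure fix listed above, the general Node.js technique looks roughly like the following. This is a minimal sketch of the standard `write()`/`drain` pattern, not the project's code; `writeAll` is a hypothetical name.

```javascript
const { once } = require("node:events");

// Hypothetical helper: writes lines to a writable stream and pauses
// whenever the stream's internal buffer is full.
async function writeAll(lines, out = process.stdout) {
  for (const line of lines) {
    // write() returns false once the buffer exceeds highWaterMark.
    if (!out.write(line + "\n")) {
      // Wait for the consumer to catch up before writing more,
      // so pending output does not accumulate on the heap.
      await once(out, "drain");
    }
  }
}
```

Without the `drain` wait, a fast producer keeps queuing output in memory while a slow consumer (a pipe or a terminal) falls behind, which is how very large outputs can exhaust the heap.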
1.0.0