
conllu2cmc

Reads CoNLL-U from stdin and annotates emojis, emoticons, hashtags, URLs, email addresses, action words, @names, and Wikipedia emoji templates (tag EMOWIKI) with their corresponding STTS-IBK POS tags (Beißwenger/Bartsch/Evert/Würzner 2016). Writes CoNLL-U to stdout.
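To illustrate what "annotating" a token means here, the following is a minimal sketch, not conllu2cmc's actual implementation. The regular expressions, the `classifyToken` and `annotateLine` names, the assumed STTS-IBK tag names (EML, URL, HST, ADR, EMOASC, EMOIMG, AKW) and the choice of writing the tag into column 5 (XPOS) are all illustrative assumptions. Wikipedia emoji templates are left out because their token form is not described in this README.

```javascript
// Hypothetical sketch: map a token's surface form to an assumed STTS-IBK tag.
// The patterns are simplified heuristics, not the ones conllu2cmc uses.
const RULES = [
  ["EML", /^[\w.+-]+@[\w-]+(\.[\w-]+)+$/],        // email address
  ["URL", /^(https?:\/\/|www\.)\S+$/i],            // URL
  ["HST", /^#[\p{L}\p{N}_]+$/u],                   // hashtag, e.g. #korap
  ["ADR", /^@[\p{L}\p{N}_]+$/u],                   // @name (addressing term)
  ["EMOASC", /^[:;=][-o^']?[)(\]\[dDpP\/\\|]+$/],  // ASCII emoticon, e.g. :-) ;P
  // Emoji: a pictograph optionally followed by modifiers, ZWJ or VS16.
  ["EMOIMG", /^\p{Extended_Pictographic}(?:\p{Extended_Pictographic}|\p{Emoji_Modifier}|\u200D|\uFE0F)*$/u],
  // Action word; assumes the asterisks stay attached to the token.
  ["AKW", /^\*\p{L}+\*$/u],
];

// Return the first matching tag, or null for ordinary tokens.
function classifyToken(form) {
  for (const [tag, pattern] of RULES) {
    if (pattern.test(form)) return tag;
  }
  return null;
}

// Annotate one CoNLL-U line (token lines have 10 tab-separated columns).
// Assumption: the tag overwrites column 5 (XPOS); the real tool may differ.
function annotateLine(line) {
  if (line === "" || line.startsWith("#")) return line; // sentence break or comment
  const cols = line.split("\t");
  if (cols.length !== 10) return line;
  const tag = classifyToken(cols[1]); // column 2 is FORM
  if (tag !== null) cols[4] = tag;
  return cols.join("\t");
}

module.exports = { classifyToken, annotateLine };
```

Ordering matters in this design: email addresses are tested before @names so that `info@example.org` is not mistaken for an addressing term, and tokens matching no rule are passed through unchanged.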

Usage

korapxml2conllu kyc.zip | conllu2cmc
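The pipeline above works because conllu2cmc behaves as a line-oriented filter between two CoNLL-U streams. A minimal Node.js sketch of such a filter follows; `annotateLine` is a hypothetical stub that only tags hashtags as HST in column 5 (XPOS), and `run` is an illustrative name, neither taken from the package itself.

```javascript
// Sketch of a line-oriented CoNLL-U filter built on Node's readline module.
const readline = require("readline");

// Hypothetical stub: tag hashtags as HST in column 5 (XPOS), keep all else.
function annotateLine(line) {
  if (line === "" || line.startsWith("#")) return line; // sentence break or comment
  const cols = line.split("\t");
  if (cols.length === 10 && /^#[\p{L}\p{N}_]+$/u.test(cols[1])) cols[4] = "HST";
  return cols.join("\t");
}

// Stream `input` line by line into `output`; resolves once input is exhausted.
function run(input, output) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on("line", (line) => output.write(annotateLine(line) + "\n"));
  return new Promise((resolve) => rl.on("close", resolve));
}

// As a CLI entry point one would call: run(process.stdin, process.stdout);
module.exports = { annotateLine, run };
```

Processing line by line keeps memory use constant, which matters when the input is an entire corpus unpacked by korapxml2conllu. Comment lines and the blank lines that separate sentences are passed through untouched, so the output stays valid CoNLL-U.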

Generate KorAP-XML zip with CMC annotations

korapxml2conllu kyc.zip | conllu2cmc -s | conllu2korapxml > kyc.cmc.zip

Installation

npm install 'git+https://gitlab.ids-mannheim.de/KorAP/korap-conllu-cmc.git'

Build from source

npm install

Build standalone

npm run pkg-linux

References

Beißwenger, Michael/Bartsch, Sabine/Evert, Stefan/Würzner, Kay-Michael (2016): EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora. In: Proceedings of the 10th Web as Corpus Workshop. Berlin: Association for Computational Linguistics, pp. 44–56. https://doi.org/10.18653/v1/W16-2606.