conllu-cmc

Reads CoNLL-U from stdin and annotates emojis, emoticons, hashtags, URLs, email addresses, action words, @names, and Wikipedia emoji templates with their corresponding STTS-IBK POS tags (Beißwenger/Bartsch/Evert/Würzner 2016). The annotated CoNLL-U is written to stdout.
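
To illustrate the kind of annotation described above, here is a minimal sketch in JavaScript. It is not the tool's actual implementation: the regular expressions are simplified, hypothetical patterns, the function names `tagToken` and `annotateLine` are invented for this example, and writing the tag into the XPOS column (column 5) is an assumption. The tag names follow the STTS-IBK scheme (EMOIMG, EMOASC, HST, URL, EML, AKW, ADR); Wikipedia emoji templates are left out.

```javascript
// Hypothetical, simplified patterns; the real tool's rules are likely richer.
const RULES = [
  ["URL", /^(?:https?:\/\/|www\.)\S+$/i],
  ["EML", /^[^\s@]+@[^\s@]+\.[^\s@]+$/],
  ["HST", /^#[\p{L}\p{N}_]+$/u],
  ["ADR", /^@[\p{L}\p{N}_]+$/u],
  // Assumption: action words are written between asterisks, e.g. *grins*
  ["AKW", /^\*\p{L}+\*$/u],
  // ASCII emoticons such as :-) ;) =D
  ["EMOASC", /^[:;=][-o']?[)(\]\[DPp|\/\\]+$/],
  // Emoji, optionally with variation selectors, skin-tone modifiers, ZWJ sequences
  ["EMOIMG", /^\p{Extended_Pictographic}(?:\uFE0F|\u200D|\p{Emoji_Modifier}|\p{Extended_Pictographic})*$/u],
];

// Return the first matching tag for a token form, or null if none applies.
function tagToken(form) {
  for (const [tag, pattern] of RULES) {
    if (pattern.test(form)) return tag;
  }
  return null;
}

// Annotate one CoNLL-U line; comments, blank lines, and malformed lines pass through.
function annotateLine(line) {
  if (line === "" || line.startsWith("#")) return line;
  const cols = line.split("\t");
  if (cols.length !== 10) return line;
  const tag = tagToken(cols[1]); // column 2 is FORM
  if (tag) cols[4] = tag;        // assumption: the tag goes into XPOS (column 5)
  return cols.join("\t");
}
```

Under these assumptions, `annotateLine("1\t:-)\t_\t_\t_\t_\t_\t_\t_\t_")` returns the same line with `EMOASC` in the fifth column, while ordinary word tokens are returned unchanged.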

Docker Usage

# Annotate CoNLL-U input
korapxml2conllu kyc.zip | docker run --rm -i korap/conllu-cmc

# With sparse output (only annotated lines)
korapxml2conllu kyc.zip | docker run --rm -i korap/conllu-cmc -s

# Generate KorAP-XML zip with CMC annotations
korapxml2conllu kyc.zip | docker run --rm -i korap/conllu-cmc -s | conllu2korapxml > kyc.cmc.zip

# Show help
docker run --rm korap/conllu-cmc --help
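
The sparse mode (`-s`) is described above only as "only annotated lines". A rough sketch of what such a filter could look like in JavaScript follows. The function name `sparse` is hypothetical, and two details are assumptions rather than documented behavior: that comment and blank lines are kept so downstream tools can still see document and sentence boundaries, and that an annotated token is one whose XPOS column (column 5) is not `_`.

```javascript
// Keep comment lines, blank lines, and token lines that carry an annotation.
// Assumption: "annotated" means the XPOS column (index 4) is not "_".
function sparse(conllu) {
  return conllu
    .split("\n")
    .filter((line) => {
      if (line === "" || line.startsWith("#")) return true;
      const cols = line.split("\t");
      return cols.length === 10 && cols[4] !== "_";
    })
    .join("\n");
}
```

Dropping unannotated tokens keeps the intermediate stream small when the result is only used to add a CMC annotation layer to an existing KorAP-XML zip.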

Local Usage

korapxml2conllu kyc.zip | conllu-cmc

Generate KorAP-XML zip with CMC annotations

korapxml2conllu kyc.zip | conllu-cmc -s | conllu2korapxml > kyc.cmc.zip

Installation

Docker (recommended)

docker pull korap/conllu-cmc

npm

npm install 'git+https://gitlab.ids-mannheim.de/KorAP/conllu-cmc-docker.git'

Build from source

npm install

Build standalone

npm run pkg-linux

References

Beißwenger, Michael/Bartsch, Sabine/Evert, Stefan/Würzner, Kay-Michael (2016): EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora. In: Proceedings of the 10th Web as Corpus Workshop. Berlin: Association for Computational Linguistics, pp. 44–56. https://doi.org/10.18653/v1/W16-2606.