| commit | 73f1749fb2a44001cff91dbf3777c562ee460018 | |
|---|---|---|
| author | Marc Kupietz <kupietz@ids-mannheim.de> | Mon Feb 26 09:44:53 2024 +0100 |
| committer | Marc Kupietz <kupietz@ids-mannheim.de> | Mon Feb 26 09:44:53 2024 +0100 |
| tree | 62085acd417a101db2fb250e0c289aa1f851aae7 | |
| parent | d10bef0e72fe66303b6745d2be2a87fdd0518242 | |
Make action word regex more precise: ":::::" and "::" are not action words.
Reads CoNLL-U format from stdin and annotates emojis, emoticons, hashtags, URLs, email addresses, action words, and @names with their corresponding STTS-IBK POS tag (Beißwenger/Bartsch/Evert/Würzner 2016). Writes CoNLL-U format to stdout.
korapxml2conllu kyc.zip | conllu2cmc
korapxml2conllu kyc.zip | conllu2cmc -s | conllu2korapxml > kyc.cmc.zip
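The annotation step described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of conllu2cmc: the regular expressions, the function names `tagToken` and `annotateConllu`, and the choice to write the tag into the XPOS column (column 5) are all assumptions made for the example. The tag names follow the STTS-IBK tagset (EMOASC, EMOIMG, AKW, HST, ADR, URL, EML).

```javascript
// Hypothetical patterns for illustration only; the real tool's regexes may differ.
const CMC_PATTERNS = [
  ["URL", /^(?:https?:\/\/|www\.)\S+$/i],
  ["EML", /^[^\s@]+@[^\s@]+\.[^\s@]+$/],
  ["HST", /^#[\p{L}\p{N}_]+$/u],
  ["ADR", /^@[\p{L}\p{N}_]+$/u],
  // Action word: letters enclosed in a matching pair of asterisks or colons,
  // e.g. *lach* or :grins:. Requiring at least one letter means that
  // "::" and ":::::" are not treated as action words.
  ["AKW", /^([*:])\p{L}+\1$/u],
  // A small subset of ASCII emoticons such as :-) ;) :D
  ["EMOASC", /^[:;=8][-^o]?[)(\]\[DPp|\/\\]+$/],
  // Simple emoji (no variation selectors or ZWJ sequences)
  ["EMOIMG", /^\p{Extended_Pictographic}+$/u],
];

// Return the first matching tag for a token form, or null if none matches.
function tagToken(form) {
  for (const [tag, pattern] of CMC_PATTERNS) {
    if (pattern.test(form)) return tag;
  }
  return null;
}

// Annotate CoNLL-U text: comment lines and blank lines pass through unchanged;
// token lines with 10 tab-separated columns get the tag in XPOS (assumed target).
function annotateConllu(text) {
  return text
    .split("\n")
    .map((line) => {
      if (line === "" || line.startsWith("#")) return line;
      const cols = line.split("\t");
      if (cols.length !== 10) return line;
      const tag = tagToken(cols[1]);
      if (tag !== null) cols[4] = tag;
      return cols.join("\t");
    })
    .join("\n");
}

console.log(annotateConllu("1\t*lach*\t_\t_\t_\t_\t_\t_\t_\t_"));
```

Keeping the patterns in an ordered list makes the precedence explicit: a token such as an email address is checked before the @name pattern, so the first match wins.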
npm install 'git+https://gitlab.ids-mannheim.de/KorAP/korap-conllu-cmc.git'
npm install
npm run pkg-linux
Beißwenger, Michael/Bartsch, Sabine/Evert, Stefan/Würzner, Kay-Michael (2016): EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora. In: Proceedings of the 10th Web as Corpus Workshop. Berlin: Association for Computational Linguistics, pp. 44–56. https://doi.org/10.18653/v1/W16-2606.