Initial import
7 files changed
tree: 4a24d93987a4809676b2febc58af4f67b3cd912c
  1. src/
  2. test/
  3. .gitignore
  4. package-lock.json
  5. package.json
  6. Readme.md
Readme.md

conllu-gender

Reads CoNLL-U format from stdin and annotates German gender-sensitive personal nouns, gendered determiners/pronouns, and neo-pronouns with correct POS (UPOS and XPOS/STTS), lemma, and morphological features. Writes CoNLL-U format to stdout.

Existing annotations for matched tokens are replaced; all other tokens pass through unchanged.
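The replace-or-pass-through behaviour can be pictured with a minimal sketch. It assumes the standard ten tab-separated CoNLL-U columns; `applyAnnotation` and the shape of the `annotation` object are hypothetical names chosen for illustration, not the tool's actual internals.

```javascript
// CoNLL-U token lines have ten tab-separated columns:
// ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
const LEMMA = 2, UPOS = 3, XPOS = 4, FEATS = 5;

// Hypothetical helper: overwrite the annotation columns of one token line.
// `annotation` is null when the token did not match any gender pattern.
function applyAnnotation(line, annotation) {
  // Comment lines (# ...) and blank sentence separators pass through as-is.
  if (line === "" || line.startsWith("#")) return line;
  const cols = line.split("\t");
  // Unmatched or malformed token lines are left untouched.
  if (cols.length !== 10 || annotation === null) return line;
  cols[LEMMA] = annotation.lemma;
  cols[UPOS] = annotation.upos;
  cols[XPOS] = annotation.xpos;
  cols[FEATS] = annotation.feats;
  return cols.join("\t");
}
```

Only LEMMA, UPOS, XPOS and FEATS are rewritten in this sketch; the dependency and MISC columns are left as they were.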

Background

Based on the morphosyntactic analysis in:

Ochs, S. (2026). Die morphosyntaktische Integration neuer Gendersuffixe: Eine korpusbasierte Analyse deutschsprachiger Pressetexte. Gender Linguistics, 2. doi: 10.65020/0619d927

The tool covers all six gender-marker types distinguished by Ochs & Rüdiger (2025):

Type             Examples                     Intent       Gender feature
Genderstern *    Lehrer*in, Bürger*innen      non-binary   NonBin
Doppelpunkt :    Lehrer:in, Bürger:innen      non-binary   NonBin
Unterstrich _    Lehrer_in, Bürger_innen      non-binary   NonBin
Binnen-I I       LehrerIn, LehrerInnen        binary       Masc,Fem
Klammern ()      Lehrer(in), Lehrer(innen)    binary       Masc,Fem
Schrägstrich /   Lehrer/in, Lehrer/-innen     binary       Masc,Fem
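As a rough illustration of how the six marker types could map to a Gender value, the sketch below classifies a noun form by its marker. The regular expressions and the names `MARKER_GENDER` and `genderOfNoun` are assumptions made for this example; the tool's real patterns may be stricter or broader.

```javascript
// Marker character -> Gender feature value, following the table above.
const MARKER_GENDER = {
  "*": "NonBin",
  ":": "NonBin",
  "_": "NonBin",
  "/": "Masc,Fem",
  "(": "Masc,Fem",
};

// Hypothetical classifier: returns the Gender value or null if no marker is found.
function genderOfNoun(form) {
  // Symbol markers: stem, marker, optional hyphen, "in"/"innen", optional ")".
  const m = /^\p{L}+([*:_\/(])-?in(?:nen)?\)?$/u.exec(form);
  if (m) return MARKER_GENDER[m[1]];
  // Binnen-I: a lowercase letter directly followed by capital "In"/"Innen".
  if (/^\p{L}*\p{Ll}In(?:nen)?$/u.test(form)) return "Masc,Fem";
  return null;
}
```

A pattern this loose will also accept some malformed strings (for example an unbalanced parenthesis), so it should be read as a sketch of the mapping rather than a validator.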

Annotation

Nouns (NOUN / NN)

Surface form    Lemma        FEATS
Lehrer*in       Lehrer*in    Gender=NonBin|Number=Sing
Lehrer*innen    Lehrer*in    Gender=NonBin|Number=Plur
Lehrer:in       Lehrer:in    Gender=NonBin|Number=Sing
Lehrer:innen    Lehrer:in    Gender=NonBin|Number=Plur
Lehrer_in       Lehrer_in    Gender=NonBin|Number=Sing
LehrerIn        LehrerIn     Gender=Masc,Fem|Number=Sing
LehrerInnen     LehrerIn     Gender=Masc,Fem|Number=Plur
Lehrer(in)      Lehrer(in)   Gender=Masc,Fem|Number=Sing
Lehrer/in       Lehrer/in    Gender=Masc,Fem|Number=Sing
Lehrer/-innen   Lehrer/in    Gender=Masc,Fem|Number=Plur

The lemma is always the nominative singular of the gender-marked derivative and preserves the original gender marker. Plural endings are stripped, and their information is encoded in Number=Plur. The Gender feature is retained for plural forms because the marker is still visible on the surface form.
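The lemma rule can be sketched as follows. The patterns and the function name `lemmaAndNumber` are hypothetical. Dropping the hyphen in slash forms follows the Lehrer/-innen row of the table; applying the same normalisation to singular forms is an assumption of this sketch.

```javascript
// Illustrative patterns; each captures the stem, the marker and the suffix.
const SYMBOL = /^(\p{L}+)([*:_\/])-?(in|innen)$/u;
const PAREN = /^(\p{L}+)\((in|innen)\)$/u;
const BINNEN_I = /^(\p{L}*\p{Ll})(In|Innen)$/u;

// Hypothetical helper: derive the lemma (nominative singular, marker kept)
// and the Number value from a gender-marked noun form.
function lemmaAndNumber(form) {
  let m;
  if ((m = SYMBOL.exec(form))) {
    return { lemma: `${m[1]}${m[2]}in`, number: m[3] === "innen" ? "Plur" : "Sing" };
  }
  if ((m = PAREN.exec(form))) {
    return { lemma: `${m[1]}(in)`, number: m[2] === "innen" ? "Plur" : "Sing" };
  }
  if ((m = BINNEN_I.exec(form))) {
    return { lemma: `${m[1]}In`, number: m[2] === "Innen" ? "Plur" : "Sing" };
  }
  return null; // not a gender-marked noun under these patterns
}
```

Combined with a Gender value, the result yields a FEATS string such as `Gender=NonBin|Number=Plur` for Lehrer*innen.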

Determiners (DET / ART, PIAT, PWAT, …)

Common gendered determiners are annotated.

Surface form   Lemma     FEATS
jede*r         jede*r    Gender=NonBin
jede:r         jede:r    Gender=NonBin
eine*n         eine*n    Gender=NonBin
kein_e         kein_e    Gender=NonBin
die/der        die/der   Gender=Masc,Fem

Non-binary markers (*, :, _) yield Gender=NonBin; Schrägstrich (/) yields Gender=Masc,Fem.
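A minimal sketch of this marker-to-feature rule for determiners is given below. The stem list is illustrative and deliberately incomplete, and `DET_STEMS`, `DET_PATTERN` and `determinerFeats` are hypothetical names; which determiners the tool actually recognises is not specified here.

```javascript
// Illustrative, incomplete stem list (an assumption of this sketch).
const DET_STEMS = ["jede", "eine", "keine", "kein", "die", "der"];

// Stem, one marker character, then a short inflectional ending or second form.
const DET_PATTERN = new RegExp(
  `^(?:${DET_STEMS.join("|")})([*:_/])\\p{L}{1,3}$`,
  "iu"
);

// Hypothetical helper: return the FEATS value for a gendered determiner, else null.
function determinerFeats(form) {
  const m = DET_PATTERN.exec(form);
  if (!m) return null;
  // Schrägstrich is binary; star, colon and underscore are non-binary.
  return m[1] === "/" ? "Gender=Masc,Fem" : "Gender=NonBin";
}
```

Restricting the match to a closed stem list keeps gender-marked nouns such as Lehrer*in from being picked up as determiners.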

Pronouns / Neo-pronouns (PRON / PPER)

Merged pronoun pairs with gender markers are annotated:

Surface form   Lemma    FEATS
sie*er         sie*er   Gender=NonBin
er:sie         er:sie   Gender=NonBin
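The pronoun case could be matched along the following lines. This sketch assumes that only sie/er pairs joined by *, : or _ are covered; the pattern and the name `annotatePronoun` are hypothetical, and the tool may recognise further pairs.

```javascript
// Merged pronoun pairs joined by a non-binary marker, as in the table above.
// Assumption: only "sie"/"er" pairs with *, : or _ are handled in this sketch.
const PRONOUN_PAIR = /^(?:sie[*:_]er|er[*:_]sie)$/u;

// Hypothetical helper: return the annotation for a merged pronoun pair, else null.
function annotatePronoun(form) {
  if (!PRONOUN_PAIR.test(form)) return null;
  // The lemma equals the surface form, matching the table rows.
  return { lemma: form, upos: "PRON", xpos: "PPER", feats: "Gender=NonBin" };
}
```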

Known limitations

  • Binnen-I with non-final capital (e.g. jedEn, jedEr): detecting these forms requires morphological analysis beyond simple pattern matching and is not currently supported.
  • Gendered adjectives (e.g. begeisterte*n): not yet annotated (occur in ~5 % of gendered NP elements per Ochs 2026, §7.3.2).
  • Inflected case suffixes on gendered nouns (e.g. genitive Lehrers*in, dative plural extra marking): rare and not detected.
  • Completely novel neo-pronouns (e.g. dier, xier) that do not follow a known pattern cannot be detected by regular expressions.

Usage

# Annotate CoNLL-U input
korapxml2conllu doc.zip | conllu-gender

# Sparse output (only annotated tokens, with their sentence headers)
korapxml2conllu doc.zip | conllu-gender -s

# Pipe with other KorAP annotation tools
korapxml2conllu doc.zip | conllu-cmc | conllu-gender | conllu2korapxml > doc.annotated.zip

Options

Option         Description
-s, --sparse   Print only tokens that received new annotations (with sentence headers).
-h, --help     Print usage guide.
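One way to picture the sparse mode is a per-sentence filter like the sketch below. The function name `sparseSentence` and the parallel-array input are hypothetical, and two points are assumptions: sentences without newly annotated tokens produce no output, and a token counts as annotated only if its line actually changed.

```javascript
// Hypothetical helper: `original` and `annotated` are parallel arrays holding
// the lines of one sentence before and after annotation.
function sparseSentence(original, annotated) {
  // Sentence headers are the comment lines starting with "#".
  const headers = annotated.filter((line) => line.startsWith("#"));
  // Keep only token lines that differ from their original version.
  const changed = annotated.filter(
    (line, i) => !line.startsWith("#") && line !== original[i]
  );
  // Assumption: a sentence without newly annotated tokens yields no output.
  return changed.length > 0 ? [...headers, ...changed] : [];
}
```

Comparing lines is a simplification: a token whose existing annotation already equals the new one would be treated as unchanged here.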

Installation

npm

npm install 'git+https://gitlab.ids-mannheim.de/KorAP/conllu-gender.git'

Build from source

npm install

Build standalone binary

npm run pkg-linux   # Linux x64
npm run pkg-macos   # macOS x64
npm run pkg-win     # Windows x64
npm run pkg-all     # all platforms

Testing

npm test

References

Ochs, S. & Rüdiger, J. O. (2025). Of stars and colons: A corpus-based analysis of gender-inclusive orthographies in German press texts. In D. Schmitz, S. D. Stein & V. Schneider (Eds.), Linguistic Intersections of Language and Gender (pp. 31–62). Düsseldorf: düsseldorf university press. https://doi.org/10.1515/9783111388694-003

Ochs, S. (2026). Die morphosyntaktische Integration neuer Gendersuffixe: Eine korpusbasierte Analyse deutschsprachiger Pressetexte. Gender Linguistics, 2. https://doi.org/10.65020/0619d927