Add de_old tokenizer variant without gender-sensitive rules
Introduces a new German tokenizer variant 'de_old' that splits
gender-sensitive forms joined by `:`, `(`, or `/` into separate
tokens.
With de_old:
- Nutzer:in → Nutzer : in (3 tokens)
- Nutzer/innen → Nutzer / innen (3 tokens)
- Kaufmann/frau → Kaufmann / frau (3 tokens)
- Nutzer(in) → Nutzer ( in ) (4 tokens)
The standard 'de' tokenizer continues to keep these as single tokens.
Includes tests for the new variant.
Change-Id: I1d2af97e92a8af36ac3b04c2807bce8422b85df1
diff --git a/pom.xml b/pom.xml
index c9c1fe3..eec21db 100644
--- a/pom.xml
+++ b/pom.xml
@@ -112,6 +112,19 @@
<target>src/main/jflex/fr</target>
</configuration>
</execution>
+ <execution>
+ <id>preprocess for de_old</id>
+ <phase>generate-sources</phase>
+ <goals>
+ <goal>preprocess</goal>
+ </goals>
+ <configuration>
+ <vars>
+ <target.language>de_old</target.language>
+ </vars>
+ <target>src/main/jflex/de_old</target>
+ </configuration>
+ </execution>
</executions>
<configuration>
<sources>src/main/jpc/jflex</sources>