Make sure that start and end tags for empty texts are counted

For each text, whether empty or not, exactly one start tag and one
end tag are counted in the unigrams.
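
The counting rule described above could be sketched roughly as follows. This is only an illustration, not the project's implementation: the function name `count_unigrams`, the tag strings `<s>` and `</s>`, and the choice to detect a text by its `# text_id` header are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical tag names; the symbols actually used by the tool are not shown in this change.
START_TAG = "<s>"
END_TAG = "</s>"


def count_unigrams(lines):
    """Count token unigrams, adding one start and one end tag per text."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("# text_id"):
            # Every text header counts, even if no token lines follow it.
            counts[START_TAG] += 1
            counts[END_TAG] += 1
        elif line and not line.startswith("#"):
            # Token line: the second column holds the word form.
            counts[line.split()[1]] += 1
    return counts


sample = """\
# text_id = TST_TST.00001
# empty texts are expected to count

# text_id = TST_TST.00002
1 ich ich PPER PPER _ _ _ _ 1
2 bin sein VAFIN VAFIN _ _ _ _ 1.000000
""".splitlines()

counts = count_unigrams(sample)
```

With the two headers in `sample`, one of them empty, both tags end up with a count of 2, which mirrors the expectation of 7 start and end tags for the 7 text headers in the fixture below.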
Change-Id: I9fe769ea3d8a7de7b078499f33a611a7ba4bac4d
diff --git a/src/test/resources/simple.conllu b/src/test/resources/simple.conllu
index 2e52539..5a907cc 100644
--- a/src/test/resources/simple.conllu
+++ b/src/test/resources/simple.conllu
@@ -1,17 +1,31 @@
-# text_id = TST_TST.00000
+# text_id = TST_TST.00001
+# empty texts are expected to count
+
+# text_id = TST_TST.00002
1 ich ich PPER PPER _ _ _ _ 1
2 bin sein VAFIN VAFIN _ _ _ _ 1.000000
3 alex alex NE NE _ _ _ _ 0.565630
4 . . $. $. _ _ _ _ 1.000000
-# text_id = TST_TST.00001
+# text_id = TST_TST.00003
1 alex alex NE NE _ _ _ _ 0.565630
2 bin sein VAFIN VAFIN _ _ _ _ 1.000000
3 ich ich PPER PPER _ _ _ _ 1
4 . . $. $. _ _ _ _ 1.000000
-# text_id = TST_TST.00002
+# text_id = TST_TST.00004
+# make sure that an empty text header does no harm
+
+# text_id = TST_TST.00005
1 ich ich PPER PPER _ _ _ _ 1
2 heiße heißen VAFIN VAFIN _ _ _ _ 1.000000
3 alex alex NE NE _ _ _ _ 0.565630
4 . . $. $. _ _ _ _ 1.000000
+
+# text_id = TST_TST.00006
+# make sure that an empty text header does no harm
+
+# text_id = TST_TST.00007
+# in the unigrams we should have 7 start and end tags
+
+