korap.ids-mannheim.de / KorAP / KorAP-XML-Krill / 207439cb4ea07e0cc5701651ce815355512f9a9b / lib / KorAP / Tokenizer.pm
a96de62
Added glemm workaround, removed author array, implemented but skipped punctuation support
by Nils Diewald
· 10 years ago
32e30f0
Fix script for new index (including new foundries)
by Nils Diewald
· 10 years ago
840c924
Meta data update for corpus indexing
by Nils Diewald
· 10 years ago
6dab7ea
Add comments on punctuation
by Nils Diewald
· 10 years ago
7658e0a
Fixed surface attribute
by Nils Diewald
· 10 years ago
032e31d
Fixed example token
by Nils Diewald
· 10 years ago
f03c680
Sentence annotations for all providing foundries and a beginning subtokenization based on cschnobers code
by Nils Diewald
· 10 years ago
ff6d078
Solr export
by Nils Diewald
· 10 years ago
47c3ef3
Found some bugs in XIP/Constituency ... and introduced some new ones - yay
by Nils Diewald
· 11 years ago
21a3e1a
Bugfixes in dependency converter, improved test suite
by Nils Diewald
· 11 years ago
7b84722
Added text marker, added sentences from multiple foundries, changed paragraphs to base/para some tests, some bugfixes
by Nils Diewald
· 11 years ago
3cf08c7
Fixed primary data problems, speedup using moar C and now provide layer info
by Nils Diewald
· 11 years ago
3ece630
Fixed tiny offset issue for documents ending with non-tokens
by Nils Diewald
· 11 years ago
aba4710
Made the indexer more robust and ignore s**t my parser says
by Nils Diewald
· 11 years ago
092178e
Fix dealing with no-span layers|Improve error messages for bughunting
by Nils Diewald
· 11 years ago
37e5b57
changed sentence foundry, foundry selector and textClass serialization
by Nils Diewald
· 11 years ago
d9c1661
Lucene Backend is now a module (1)
by Nils Diewald
· 11 years ago
044c41d
Moderately changed JSON output format for easier parsing
by Nils Diewald
· 11 years ago
7364d1f
Indexation script finished
by Nils Diewald
· 11 years ago
2db9ad0
Lucene field indexer written in perl
by Nils Diewald
· 11 years ago