Commit log of lib in KorAP/KorAP-XML-TEI at edee6e5115ef54f850ad0fe7f9a9eb0bf8b3a418 (korap.ids-mannheim.de)
edee6e5 Make tokenization chainable and remove unnecessary tokenization switch (Akron, 4 years, 10 months ago)
f57ed81 Establish header object for corpus, doc and text header parsing (Akron, 4 years, 10 months ago)
190d022 Improve utf-8-preprocessing for tokenizers (Akron, 4 years, 10 months ago)
994aff7 faster processing of UTF8-chars (Peter Harders, 4 years, 10 months ago)
854a115 bugfixing Conservative.pm (Peter Harders, 4 years, 10 months ago)
5fb5e8d Simplify and centralize temporary file creation (Akron, 4 years, 10 months ago)
b122717 clean up intern tokenization (Peter Harders, 4 years, 10 months ago)
95bc98a Rename delHTMLcom to be in line with other naming conventions and make the function exportable (Akron, 4 years, 11 months ago)
8b511f9 Establish tokenizer object for external base tokenization (Akron, 5 years ago)
d962747 Establish tokenizer objects for aggressive and conservative base tokenization (Akron, 5 years ago)
8571751 Create Zip-Factory for simpler handling of Zip streams (Akron, 5 years ago)
3479082 Simplify conservative tokenization code (Akron, 5 years ago)
510a88c Minor speedup in tokenization by merging array pushes (Akron, 5 years ago)
eac374d Separate dummy tokenization from main script with minimal changes (Akron, 5 years ago)
7fab93b Replace recursion and non-essential regexes with index/substr (Akron, 5 years ago)
2d547bc Fix a bug in delHTMLcom where comments were left open (Akron, 5 years ago)
4f67cd4 Atomize and test comment stripping (Akron, 5 years ago)