extend tests for wikipedia.txt in t/tokenization.t

 . extended testing for wikipedia.txt, so that
   UTF-8 characters are read

 . fixed bug related to UTF-8

 . TODO: testing is very slow after the bugfix

Change-Id: I7d63e1b87c10bab85789098b3b7ce63f359dc49e
1 file changed
tree: 3e69613385c58635bd43f1cab4e0506f1e7a2d21
  1. lib/
  2. script/
  3. t/
  4. xt/
  5. .gitignore
  6. LICENSE
  7. Makefile.PL