echo - Introduce Sentence splitter
! And compose it with whitespace removal

read regex Token .o. [
  ! Put a Token boundary after the longest possible
  ! sentence-ending punctuation sequence
  ! that isn't followed by a comma
  SentenceEnd @-> ... NLout \/ _ NLout \%,
] .o. [
  ! Put a Token boundary after a punctuation mark
  ! that does not start a punctuation sequence
  SP @-> ... NLout \/ NLout _ NLout NotSentenceExtension
] .o. [
  ! Put a Token boundary after ... if not followed by a lowercase character
  [%. %. %.] @-> ... NLout \/ _ NLout WS+ NotSmallCaps
] .o. [
  ! Remove whitespace between Tokens
  [WS|NL]+ @-> 0 || [ .#. | NLout ] _
];