! Blame: Akron, 78f6714, 2022-04-09 14:10:44 +0200
echo - Introduce Sentence splitter
! and compose whitespace removal

read regex Token .o. [
  ! Put a Token boundary after the longest possible
  ! sentence-ending punctuation sequence
  ! that isn't followed by a comma
  SentenceEnd @-> ... NLout \/ _ NLout \%,
] .o. [
  ! Put a Token boundary after a punctuation mark
  ! that does not start a punctuation sequence
  SP @-> ... NLout \/ NLout _ NLout NotSentenceExtension
] .o. [
  ! Put a Token boundary after an ellipsis (...)
  ! if it is not followed by a lowercase letter
  [%. %. %.] @-> ... NLout \/ _ NLout WS+ NotSmallCaps
] .o. [
  ! Remove whitespace between Tokens
  [WS|NL]+ @-> 0 || [ .#. | NLout ] _
];
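
The first and last stages of the cascade above can be tried in isolation with a minimal foma sketch. The definitions of `NLout`, `WS` and `SentenceEnd` below are hypothetical stand-ins chosen for illustration, not the grammar's real definitions, and `MarkSentence` and `DropWS` are names introduced here. The sketch only shows how the longest-match insertion (`@-> ... NLout`) and the context-restricted deletion (`@-> 0 || ... _`) compose.

```foma
! Hypothetical stand-ins (assumptions), not the real definitions
define NLout "<TB>";
define WS " ";
define SentenceEnd [%. | %! | %?]+;

! Longest-match insertion: keep the matched punctuation (...) and
! append NLout, but only where the match is already followed by
! NLout and a symbol other than a comma
define MarkSentence SentenceEnd @-> ... NLout \/ _ NLout \%, ;

! Delete whitespace at the start of the input or directly after
! a Token boundary
define DropWS WS+ @-> 0 || [ .#. | NLout ] _ ;

! Compose the two stages, as the cascade above does with .o.
regex MarkSentence .o. DropWS;
```

Assuming the sketch is saved to a script file, it should load with `foma -l <file>`, after which `apply down` lets you feed in strings containing the `<TB>` marker and inspect the result.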