! Blame: Akron, commit 78f6714, 2022-04-09 14:10:44 +0200
echo - Introduce Sentence splitter
! and compose whitespace removal

read regex Token .o. [
  ! Put a token boundary after the longest possible
  ! sentence-ending punctuation sequence
  ! that is not followed by a comma
  SentenceEnd @-> ... NLout \/ _ NLout \%,
] .o. [
  ! Put a token boundary after a punctuation mark
  ! that does not start a punctuation sequence
  SP @-> ... NLout \/ NLout _ NLout NotSentenceExtension
] .o. [
  ! Put a token boundary after an ellipsis (...) if it is
  ! not followed by a lowercase character
  [%. %. %.] @-> ... NLout \/ _ NLout WS+ NotSmallCaps
] .o. [
  ! Remove whitespace between tokens
  [WS|NL]+ @-> 0 || [ .#. | NLout ] _
];