Handle tokenizer special cases more generally by using the Matcher
internally to match special cases after the affix/token_match
tokenization is complete.
Instead of only matching special cases while processing balanced or
nearly balanced prefixes and suffixes, this recognizes special cases in
a wider range of contexts:
* Allows arbitrary numbers of prefixes/affixes around special cases
* Allows special cases separated by infixes
Existing tests/settings that couldn't be preserved as before:
* The emoticon '")' is no longer a supported special case
* The emoticon ':)' in "example:)" is a false positive again
When merged with #4258 (or the relevant cache bugfix), the affix and
token_match properties should be modified to flush and reload all
special cases to use the updated internal tokenization with the Matcher.