* added single and paired orth variants
* added token match
* added long text tokenization test
* inverted init
* normalized lemmas to lowercase
* more abbrevs
* tests for ordinals and abbrevs
* separated period abbvrevs to another list
* fiex typo
* added ordinal and abbrev tests
* added number tests for dates
* minor refinement
* added inflected abbrevs regex
* added percentage and inflection
* cosmetics
* added token match
* added url inflection tests
* excluded url tokens from custom pattern
* removed url match import