* added single and paired orth variants
* added token match
* added long text tokenization test
* inverted init
* normalized lemmas to lowercase
* more abbrevs
* tests for ordinals and abbrevs
* separated period abbvrevs to another list
* fiex typo
* added ordinal and abbrev tests
* added number tests for dates
* minor refinement
* added inflected abbrevs regex
* added percentage and inflection
* cosmetics
* added token match
* added url inflection tests
* excluded url tokens from custom pattern
* removed url match import
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
* Include Macedonian language
* Fix indentation at char_classes.py
* Fix indentation at char_classes.py
* Add Macedonian tests, update lex_attrs and char_classes
* Import unicode literals for python 2
* added tr_vocab to config
* basic test
* added syntax iterator to Turkish lang class
* first version for Turkish syntax iter, without flat
* added simple tests with nmod, amod, det
* more tests to amod and nmod
* separated noun chunks and parser test
* rearrangement after nchunk parser separation
* added recursive NPs
* tests with complicated recursive NPs
* tests with conjed NPs
* additional tests for conj NP
* small modification for shaving off conj from NP
* added tests with flat
* more tests with flat
* added examples with flats conjed
* added inner func for flat trick
* corrected parse
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* added tr_vocab to config
* basic test
* added syntax iterator to Turkish lang class
* first version for Turkish syntax iter, without flat
* added simple tests with nmod, amod, det
* more tests to amod and nmod
* separated noun chunks and parser test
* rearrangement after nchunk parser separation
* added recursive NPs
* tests with complicated recursive NPs
* tests with conjed NPs
* additional tests for conj NP
* small modification for shaving off conj from NP
* added tests with flat
* more tests with flat
* added examples with flats conjed
* added inner func for flat trick
* corrected parse
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* feat: added turkish tag map
* feat: morph rules cconj and sconj
* feat: more conjuncts
* feat: added popular postpositions
* feat: added adverbs
* feat: added personal pronouns
* feat: added reflexive pronouns
* minor: corrected case capital
* minor: fixed comma typo
* feat: added indef pronouns
* feat: added dict iter
* fixed comma typo
* updated language class with tag map and morph
* use default tag map instead
* removed tag map
* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and sdds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39
* Allow `pkuseg_model` to be set to `None` on initialization
* Don't save config within tokenizer
* Force convert pkuseg_model to use pickle protocol 4 by reencoding with
`pickle5` on serialization
* Update pkuseg serialization test
* create contributor agreement
* Update Indonesian example. (see #1107)
Update Indonesian examples with more proper phrases. the current phrases contains sensitive and violent words.