Adriane Boyd
30030176ee
Update Korean defaults for Tokenizer (#10322)
Update Korean defaults for `Tokenizer` for tokenization following UD
Korean Kaist.
2022-02-21 10:26:19 +01:00
Adriane Boyd
c5de9b463a
Update custom tokenizer APIs and pickling (#8972)
* Fix incorrect pickling of Japanese and Korean pipelines, which led to
the entire pipeline being reset if pickled
* Enable pickling of Vietnamese tokenizer
* Update tokenizer APIs for Chinese, Japanese, Korean, Thai, and
Vietnamese so that only the `Vocab` is required for initialization
2021-08-19 14:37:47 +02:00
Ines Montani
db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations
* Remove Python 3.5 and 2.7 from CI
* Don't require pathlib
* Replace compat helpers
* Remove OrderedDict
* Use f-strings
* Set Cython compiler language level
* Fix typo
* Re-add OrderedDict for Table
* Update setup.cfg
* Revert CONTRIBUTING.md
* Revert lookups.md
* Revert top-level.md
* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani
3d8fd4b461
Revert #4334
2019-09-29 17:32:12 +02:00
Ines Montani
c9cd516d96
Move tests out of package (#4334)
* Move tests out of package
* Fix typo
2019-09-28 18:05:00 +02:00
Bae Yong-Ju
a55f5a744f
Fix ValueError exception on empty Korean text. (#4245)
2019-09-06 10:29:40 +02:00
Bae Yong-Ju
05fbf5d976
Fix error when Korean text contains regexp special characters. (#4022)
2019-07-25 17:53:33 +02:00
Ines Montani
0b8406a05c
Tidy up and auto-format
2019-07-11 12:02:25 +02:00
cedar101
58f06e6180
Korean support (#3901)
* start lang/ko
* add test codes
* using natto-py
* add test_ko_tokenizer_full_tags()
* spaCy contributor agreement
* external dependency for ko
* collections.namedtuple for python version < 3.5
* case fix
* tuple unpacking
* add jongseong (final consonant)
* apply mecab option
* Remove Pipfile for now
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 22:23:16 +02:00