spaCy/spacy/lang/vi
Adriane Boyd 1d59fdbd39
Update Vietnamese tokenizer (#8099)
* Adapt tokenization methods from `pyvi` to preserve text encoding and
whitespace
* Add serialization support similar to Chinese and Japanese

Note: as with Chinese and Japanese, some settings are duplicated in
`config.cfg` and `tokenizer/cfg`.
2021-05-17 18:16:20 +10:00
__init__.py Update Vietnamese tokenizer (#8099) 2021-05-17 18:16:20 +10:00
lex_attrs.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
stop_words.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
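
The commit message above mentions preserving the original text and whitespace when adapting a word segmenter's output. The following is a minimal stdlib-only sketch of that general idea, not the code from this commit: the helper name `align_words_to_text`, its signature, and the sample sentence are assumptions made for illustration. It aligns segmenter output back to the input string and records trailing-space flags so the text can be rebuilt exactly.

```python
from typing import List, Tuple


def align_words_to_text(words: List[str], text: str) -> Tuple[List[str], List[bool]]:
    """Hypothetical helper: align segmented words with the original text.

    Returns the tokens plus one flag per token saying whether a single
    space follows it. Extra whitespace becomes its own token, so joining
    tokens and spaces reproduces `text` exactly.
    """
    tokens: List[str] = []
    spaces: List[bool] = []
    pos = 0
    for word in words:
        start = text.find(word, pos)
        if start < 0:
            raise ValueError(f"token {word!r} not found at or after offset {pos}")
        if start > pos:
            # Keep any skipped characters (e.g. extra whitespace) as a token.
            tokens.append(text[pos:start])
            spaces.append(False)
        tokens.append(word)
        spaces.append(False)
        pos = start + len(word)
        if pos < len(text) and text[pos] == " ":
            # Record one trailing space on the token instead of dropping it.
            spaces[-1] = True
            pos += 1
    if pos < len(text):
        # Preserve whatever follows the last word.
        tokens.append(text[pos:])
        spaces.append(False)
    return tokens, spaces


def rebuild(tokens: List[str], spaces: List[bool]) -> str:
    """Reconstruct the original text from tokens and trailing-space flags."""
    return "".join(tok + (" " if sp else "") for tok, sp in zip(tokens, spaces))


# Assumed segmenter output: a multi-syllable word kept as one token.
text = "Xin  chào thế giới"
tokens, spaces = align_words_to_text(["Xin", "chào", "thế giới"], text)
```

Here `rebuild(tokens, spaces)` returns the input unchanged, including the doubled space, which is the property the commit message describes as preserving whitespace.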