spaCy/spacy/tests/lang
Paul O'Leary McCann 0f01f46e02
Update Cython string types (#9143)
* Replace all basestring references with unicode

`basestring` was a compatibility type introduced by Cython to make
dealing with UTF-8 strings in Python 2 easier. In Python 3 it is
equivalent to the unicode (or str) type.

I replaced all references to basestring with unicode, since that was
used elsewhere, but we could also just replace them with str, which
should also be equivalent.

All tests pass locally.

* Replace all references to unicode type with str

Since we only support Python 3, this is simpler.

* Remove all references to unicode type

This removes all references to the unicode type across the codebase and
replaces them with `str`, which makes it more drastic than the prior
commits. In order to make this work importing `unicode_literals` had to
be removed, and one explicit unicode literal also had to be removed (it
is unclear why this is necessary in Cython with language level 3, but
without doing it there were errors about implicit conversion).

Where `unicode` was used as a type in comments, it was also changed to
`str`.

Additionally `coding: utf8` headers were removed from a few files.
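
As a rough illustration of why these substitutions are safe on Python 3, the
plain-Python sketch below (not the Cython code touched by this commit; the
helper name `is_text` is hypothetical) shows that `str` is already the
Unicode text type and that the old `unicode` and `basestring` builtins no
longer exist:

```python
import builtins


def is_text(value):
    """Return True for Python 3 text strings (hypothetical helper).

    On Python 3, `str` is the Unicode text type, so this check covers
    what `basestring` or `unicode` checks covered in older code.
    """
    return isinstance(value, str)


# The Python 2 names are gone from the builtins on Python 3.
has_unicode = hasattr(builtins, "unicode")
has_basestring = hasattr(builtins, "basestring")

# Text round-trips through UTF-8 bytes without any special string type.
text = "ትግርኛ"
round_tripped = text.encode("utf-8").decode("utf-8")
```

Bytes are a separate type on Python 3, so `is_text(b"abc")` is false while
`is_text("abc")` is true.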
2021-09-13 17:02:17 +02:00
am Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ar Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
bg Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
bn Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ca Auto-format code with black 2021-07-02 07:48:26 +00:00
cs Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Tidy up and auto-format 2020-09-29 21:39:28 +02:00
el Tidy up and auto-format 2020-09-29 21:39:28 +02:00
en Fix/fix en ordinals (#8028) 2021-05-07 10:26:42 +02:00
es Tidy up and auto-format 2020-09-29 21:39:28 +02:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
fi Tidy up code 2021-06-28 12:08:15 +02:00
fr Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ga Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
grc Added ancient Greek language support (#8606) 2021-07-15 10:27:17 +02:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
hi Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
hu Tidy up and auto-format 2020-03-25 12:28:12 +01:00
hy Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
id Tidy up and auto-format 2020-09-29 21:39:28 +02:00
it Auto-format code with black 2021-07-02 07:48:26 +00:00
ja Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ko Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ky Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
lb Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
mk Tidy up and auto-format 2021-01-05 13:41:53 +11:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nb Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ne Tidy up and auto-format 2020-09-29 21:39:28 +02:00
nl Adding noun_chunks to the DUTCH language model (nl) (#8529) 2021-07-14 14:01:02 +02:00
pl Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
pt Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ro Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ru Tidy up tests and docs 2020-09-21 20:43:54 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
sr Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sv Tidy up and auto-format 2020-09-29 21:39:28 +02:00
th Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ti Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
tr Tidy up and auto-format 2021-01-05 13:41:53 +11:00
tt Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
uk Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
vi Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
yo Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
zh Tidy up and auto-format 2020-10-03 17:20:18 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Use tokenizer URL_MATCH pattern in LIKE_URL (#8765) 2021-07-27 12:07:01 +02:00
test_initialize.py Fix Azerbaijani init, extend lang init tests (#8656) 2021-07-09 15:36:35 +02:00
test_lemmatizers.py Update Catalan language data (#8308) 2021-06-11 10:21:22 +02:00