spaCy/spacy/tests/lang
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
..
ar Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
bn 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
ca Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
da 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
de 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
el 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
en Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
es 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
fi 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
fr Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
ga 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
he 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
hu Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
id 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
it Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
ja 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
nb 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
nl 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
pl Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
pt 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
ro 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
ru Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
sv Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
th 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
tr 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
tt 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
uk Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
ur Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
test_attrs.py 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
test_initialize.py 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00