spaCy/spacy/lang/fr
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
..
lemmatizer Merge branch 'master' into develop 2019-02-07 20:54:07 +01:00
__init__.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
_tokenizer_exceptions_list.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
examples.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lex_attrs.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
punctuation.py Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
stop_words.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
syntax_iterators.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
tag_map.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
tokenizer_exceptions.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00