mirror of https://github.com/explosion/spaCy.git synced 2025-02-17 03:50:37 +03:00
spaCy/spacy/tests/regression/test_issue3002.py
Sofie 9a478b6db8 Clean-up of char classes, a few tokenizer fixes and a faster default French tokenizer ()
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue  which now works

* partial fix for issue 

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue 

* Fix issue  with custom Italian exception

* Fix issue  by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue  fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
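The slash-infix fix described in the commit message ("allowing numbers right before infix /") can be illustrated with a simplified, self-contained sketch. This is not spaCy's actual tokenizer code; the `infix_re` pattern and the `split_on_infix` helper are hypothetical stand-ins for the real infix machinery, showing how a token such as `12/15/2019` splits into five pieces once digits are allowed on either side of the slash.

```python
import re

# Hypothetical infix pattern (illustration only, not spaCy's actual rule):
# split on "/" whenever a word character -- including a digit -- appears
# directly on both sides.
infix_re = re.compile(r"(?<=\w)/(?=\w)")


def split_on_infix(token):
    """Split a token on infix '/' occurrences, keeping the slash as its own piece."""
    parts = []
    last = 0
    for match in infix_re.finditer(token):
        parts.append(token[last:match.start()])  # text before the slash
        parts.append(match.group())              # the slash itself
        last = match.end()
    parts.append(token[last:])                   # remaining text
    return parts


print(split_on_infix("12/15/2019"))  # five pieces: digits and slashes alternate
```

In spaCy itself this kind of rule lives in the language's infix patterns, which the tokenizer applies via `infix_finditer`; the sketch above only mimics the splitting step.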

# coding: utf8
from __future__ import unicode_literals

from spacy.lang.de import German


def test_issue3002():
    """Test that the tokenizer doesn't hang on a long list of dots"""
    nlp = German()
    doc = nlp(
        "880.794.982.218.444.893.023.439.794.626.120.190.780.624.990.275.671 ist eine lange Zahl"
    )
    assert len(doc) == 5
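The hang this regression test guards against is characteristic of catastrophic regex backtracking. The snippet below is a textbook illustration of that failure mode, not the pattern spaCy actually used: a nested quantifier lets the engine partition a digit run in exponentially many ways, and on a near-miss input every partition is tried before the match fails.

```python
import re
import time

# Textbook catastrophic-backtracking pattern (illustration only, not the
# regex from spaCy's tokenizer): "(\d+)+" can split a run of digits into
# exponentially many groupings, all of which are explored when the
# trailing "x" is missing.
pathological = re.compile(r"(\d+)+x")

for n in (14, 17, 20):
    s = "9" * n  # no trailing "x", so the overall match must fail
    start = time.perf_counter()
    assert pathological.match(s) is None
    print(n, "digits:", round(time.perf_counter() - start, 4), "s")
```

Doubling behaviour like this is why the test above feeds the tokenizer a long dotted number and simply asserts it returns at all (with the expected five tokens).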