mirror of https://github.com/explosion/spaCy.git
synced 2024-12-27 10:26:35 +03:00
9a478b6db8
* splitting up latin unicode interval
* removing hyphen as infix for French
* adding failing test for issue 1235
* test for issue #3002 which now works
* partial fix for issue #2070
* keep the hyphen as infix for French (as it was)
* restore french expressions with hyphen as infix (as it was)
* added succeeding unit test for Issue #2656
* Fix issue #2822 with custom Italian exception
* Fix issue #2926 by allowing numbers right before infix /
* remove duplicate
* remove xfail for Issue #2179 fixed by Matt
* adjust documentation and remove reference to regex lib
12 lines
332 B
Python
# coding: utf8
from __future__ import unicode_literals

from spacy.lang.de import German


def test_issue3002():
    """Test that the tokenizer doesn't hang on a long list of dots"""
    nlp = German()
    doc = nlp('880.794.982.218.444.893.023.439.794.626.120.190.780.624.990.275.671 ist eine lange Zahl')
    assert len(doc) == 5