spaCy/spacy/tests/regression/test_issue2822.py

# coding: utf8
from __future__ import unicode_literals
from spacy.lang.it import Italian


def test_issue2822():
    """Test that "po'", the abbreviation of "poco", is kept as one token."""
    nlp = Italian()
    text = "Vuoi un po' di zucchero?"

    doc = nlp(text)

    assert len(doc) == 6

    assert doc[0].text == "Vuoi"
    assert doc[1].text == "un"
    assert doc[2].text == "po'"
    assert doc[2].lemma_ == "poco"
    assert doc[3].text == "di"
    assert doc[4].text == "zucchero"
    assert doc[5].text == "?"

Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
2019-02-21 00:10:13 +03:00

* splitting up latin unicode interval
* removing hyphen as infix for French
* adding failing test for issue 1235
* test for issue #3002 which now works
* partial fix for issue #2070
* keep the hyphen as infix for French (as it was)
* restore french expressions with hyphen as infix (as it was)
* added succeeding unit test for Issue #2656
* Fix issue #2822 with custom Italian exception
* Fix issue #2926 by allowing numbers right before infix /
* remove duplicate
* remove xfail for Issue #2179 fixed by Matt
* adjust documentation and remove reference to regex lib
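
The commit message attributes the fix for issue #2822 to a custom Italian tokenizer exception. As a rough illustration of that idea only, the self-contained sketch below shows how an exception table can stop a tokenizer from splitting the apostrophe off "po'". It does not use spaCy and is not spaCy's implementation: `EXCEPTIONS`, `SUFFIX_RE`, and `toy_tokenize` are hypothetical names, and the suffix rule is an assumption made for the example.

```python
import re

# Hypothetical exception table: surface form -> lemma. Only the "po'" -> "poco"
# pair comes from the test above; the table layout is an assumption.
EXCEPTIONS = {"po'": "poco"}

# Trailing characters that this toy tokenizer splits off as separate tokens.
SUFFIX_RE = re.compile(r"[?!.,;:']$")


def toy_tokenize(text):
    """Return (text, lemma) pairs; lemma is None when no exception applies."""
    tokens = []
    for chunk in text.split():
        trailing = []
        # Peel suffixes one at a time, but stop as soon as the remaining
        # chunk matches an exception, so "po'" survives intact.
        while chunk and chunk not in EXCEPTIONS and SUFFIX_RE.search(chunk):
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        if chunk:
            tokens.append((chunk, EXCEPTIONS.get(chunk)))
        tokens.extend((mark, None) for mark in trailing)
    return tokens


tokens = toy_tokenize("Vuoi un po' di zucchero?")
for token_text, lemma in tokens:
    print(token_text, lemma)
```

Checking the exception table before applying the suffix rule is what keeps "po'" whole; the final "?" is still split from "zucchero" because "zucchero?" is not in the table.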