spaCy/spacy/tests/lang/de/test_exceptions.py

# coding: utf-8
from __future__ import unicode_literals

import pytest


@pytest.mark.parametrize("text", ["auf'm", "du's", "über'm", "wir's"])
def test_de_tokenizer_splits_contractions(de_tokenizer, text):
    tokens = de_tokenizer(text)
    # Each contraction should be split into exactly two tokens.
    assert len(tokens) == 2


@pytest.mark.parametrize("text", ["z.B.", "d.h.", "Jan.", "Dez.", "Chr."])
def test_de_tokenizer_handles_abbr(de_tokenizer, text):
    tokens = de_tokenizer(text)
    # Abbreviations ending in a period must stay a single token.
    assert len(tokens) == 1


def test_de_tokenizer_handles_exc_in_text(de_tokenizer):
    text = "Ich bin z.Zt. im Urlaub."
    tokens = de_tokenizer(text)
    # Expected tokens: Ich | bin | z.Zt. | im | Urlaub | .
    assert len(tokens) == 6
    assert tokens[2].text == "z.Zt."
    assert tokens[2].lemma_ == "zur Zeit"


@pytest.mark.parametrize(
    "text,norms", [("vor'm", ["vor", "dem"]), ("du's", ["du", "es"])]
)
def test_de_tokenizer_norm_exceptions(de_tokenizer, text, norms):
    tokens = de_tokenizer(text)
    # norm_ should give the full form of each part, e.g. "vor'm" -> "vor", "dem".
    assert [token.norm_ for token in tokens] == norms


@pytest.mark.parametrize("text,norm", [("daß", "dass")])
def test_de_lex_attrs_norm_exceptions(de_tokenizer, text, norm):
    tokens = de_tokenizer(text)
    # The pre-reform spelling "daß" should normalize to "dass".
    assert tokens[0].norm_ == norm


# Commit history recovered from the blame view of this file:
#   2017-01-05 20:11:25 +03:00  Add tokenizer tests for German
#   2017-06-03 22:07:44 +03:00  Update German tokenizer exceptions and tests
#   2018-11-27 03:09:36 +03:00  Tidy up and auto-format tests (#2967)
#   2019-08-20 18:36:34 +03:00  Tidy up and auto-format