spaCy/spacy/tests/lang/sv/test_lemmatizer.py

# coding: utf-8
from __future__ import unicode_literals

import pytest


@pytest.mark.parametrize(
    "string,lemma",
    [
        ("DNA-profilernas", "DNA-profil"),
        ("Elfenbenskustens", "Elfenbenskusten"),
        ("abortmotståndarens", "abortmotståndare"),
        ("kolesterols", "kolesterol"),
        ("portionssnusernas", "portionssnus"),
        ("åsyns", "åsyn"),
    ],
)
def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):
    tokens = sv_tokenizer(string)
    assert tokens[0].lemma_ == lemma
Updates to Swedish Language (#3164) * Added the same punctuation rules as danish language. * Added abbreviations and also the possibility to have capitalized abbreviations on some. Added a few specific cases too * Added test for long texts in swedish * Added morph rules, infixes and suffixes to __init__.py for swedish * Added some tests for prefixes, infixes and suffixes * Added tests for lemma * Renamed files to follow convention * [sv] Removed ambigious abbreviations * Added more tests for tokenizer exceptions * Added test for problem with punctuation in issue #2578 * Contributor agreement * Removed faulty lemmatization of 'jag' ('I') as it was lemmatized to 'jaga' ('hunt') 2019-01-16 15:45:50 +03:00			`# coding: utf-8`
			`from __future__ import unicode_literals`

			`import pytest`


Tidy up and fix small bugs and typos 2019-02-08 16:14:49 +03:00			`@pytest.mark.parametrize(`
			`"string,lemma",`
			`[`
			`("DNA-profilernas", "DNA-profil"),`
			`("Elfenbenskustens", "Elfenbenskusten"),`
			`("abortmotståndarens", "abortmotståndare"),`
			`("kolesterols", "kolesterol"),`
			`("portionssnusernas", "portionssnus"),`
			`("åsyns", "åsyn"),`
			`],`
			`)`
Updates to Swedish Language (#3164) * Added the same punctuation rules as danish language. * Added abbreviations and also the possibility to have capitalized abbreviations on some. Added a few specific cases too * Added test for long texts in swedish * Added morph rules, infixes and suffixes to __init__.py for swedish * Added some tests for prefixes, infixes and suffixes * Added tests for lemma * Renamed files to follow convention * [sv] Removed ambigious abbreviations * Added more tests for tokenizer exceptions * Added test for problem with punctuation in issue #2578 * Contributor agreement * Removed faulty lemmatization of 'jag' ('I') as it was lemmatized to 'jaga' ('hunt') 2019-01-16 15:45:50 +03:00			`def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):`
			`tokens = sv_tokenizer(string)`
			`assert tokens[0].lemma_ == lemma`