spaCy/spacy/tests/lang/ko/test_lemmatization.py

# coding: utf-8
from __future__ import unicode_literals

import pytest


@pytest.mark.parametrize(
    "word,lemma",
    [("새로운", "새롭"), ("빨간", "빨갛"), ("클수록", "크"), ("뭡니까", "뭣"), ("됐다", "되")],
)
def test_ko_lemmatizer_assigns(ko_tokenizer, word, lemma):
    # Only the first token is checked: its lemma should match the expected base form.
    test_lemma = ko_tokenizer(word)[0].lemma_
    assert test_lemma == lemma

Korean support (#3901)
* start lang/ko
* add test codes
* using natto-py
* add test_ko_tokenizer_full_tags()
* spaCy contributor agreement
* external dependency for ko
* collections.namedtuple for python version < 3.5
* case fix
* tuple unpacking
* add jongseong (final consonant)
* apply mecab option
* Remove Pipfile for now
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 23:23:16 +03:00