spaCy/spacy/tests/lang/sl/test_text.py
Richard Hudson 7b134b8fbd
New tests for a number of alpha languages (#9703)
* Added Slovak

* Added Slovenian tests

* Added Estonian tests

* Added Croatian tests

* Added Latvian tests

* Added Icelandic tests

* Added Afrikaans tests

* Added language-independent tests

* Added Kannada tests

* Tidied up

* Added Albanian tests

* Formatted with black

* Added failing tests for anomalies

* Update spacy/tests/lang/af/test_text.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Estonian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Croatian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Icelandic tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Latvian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Slovak tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Slovenian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-28 21:59:23 +01:00

28 lines
1012 B
Python

import pytest
def test_long_text(sl_tokenizer):
# Excerpt: European Convention on Human Rights
text = """
upoštevajoč, da si ta deklaracija prizadeva zagotoviti splošno in
učinkovito priznavanje in spoštovanje v njej razglašenih pravic,
upoštevajoč, da je cilj Sveta Evrope doseči večjo enotnost med
njegovimi članicami, in da je eden izmed načinov za zagotavljanje
tega cilja varstvo in nadaljnji razvoj človekovih pravic in temeljnih
svoboščin,
ponovno potrjujoč svojo globoko vero v temeljne svoboščine, na
katerih temeljita pravičnost in mir v svetu, in ki jih je mogoče najbolje
zavarovati na eni strani z dejansko politično demokracijo in na drugi
strani s skupnim razumevanjem in spoštovanjem človekovih pravic,
od katerih so te svoboščine odvisne,
"""
tokens = sl_tokenizer(text)
assert len(tokens) == 116
@pytest.mark.xfail
def test_ordinal_number(sl_tokenizer):
text = "10. decembra 1948"
tokens = sl_tokenizer(text)
assert len(tokens) == 3