mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 10:16:27 +03:00
2a2654c756
The list of stop words for Spanish contained many inadequate words, see: https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100 Removed words: - verb forms of 'trabajar' (work) and 'intentar' (try) - words related to 'empleo' (employment) - incorrect words: ampleamos, arribaabajo, soyos, paìs - miscellaneous words that are either too significant or too infrequent: actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general, pais, principalmente, raras Added other stop words for completeness: - Spanish one-letter words - numbers up to twelve Some reformatting to 79 columns. When in doubt, the English and German lists were consulted as good examples. |
||
---|---|---|
__init__.py | ||
examples.py | ||
lemmatizer.py | ||
lex_attrs.py | ||
punctuation.py | ||
stop_words.py | ||
syntax_iterators.py | ||
tokenizer_exceptions.py |
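
The stop-word changes described in the commit message can be illustrated with a minimal sketch. The names `STOP_WORDS`, `REMOVED`, `ONE_LETTER`, `NUMBERS`, `UPDATED`, and `is_stop` are assumptions made for illustration, as are the tiny starting list and the exact one-letter words and numerals chosen; none of this is copied from `stop_words.py`.

```python
# Hypothetical miniature stop-word list; the real one is far longer.
STOP_WORDS = set(
    """
actualmente de el en la que
cosas ejemplo horas
""".split()
)

# A subset of the words the commit message lists as removed.
REMOVED = {"actualmente", "cosas", "ejemplo", "horas"}

# Additions named in the commit message; exact membership is assumed here.
ONE_LETTER = set("a e o u y".split())
NUMBERS = set(
    "uno dos tres cuatro cinco seis siete ocho nueve diez once doce".split()
)

# Drop the unsuitable words, then add the new groups.
UPDATED = (STOP_WORDS - REMOVED) | ONE_LETTER | NUMBERS


def is_stop(word: str) -> bool:
    """Return True if the lowercased word is in the updated list."""
    return word.lower() in UPDATED
```

With this sketch, `is_stop("ejemplo")` is false because the word was removed, while `is_stop("Doce")` is true because numerals up to twelve were added and the lookup is case-insensitive.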