spaCy/spacy/lang/es
mgr 2a2654c756 Remove significant or not very frequent words from stop word list [es]
The list of stop words for Spanish contained many inadequate words, see:

https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100

Removed words:
- verb forms of 'trabajar' (work) and 'intentar' (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words that are too significant or too infrequent:
  actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
  pais, principalmente, raras

Added other stop words for completeness:
- Spanish one-letter words
- numbers up to twelve

Some reformatting to 79 columns.

When in doubt, the English and German lists have been consulted as good
examples.
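
The cleanup described above amounts to a set difference followed by a set union. The following is a minimal sketch with plain Python sets, not the actual contents of stop_words.py: `old_stop_words` is a tiny hypothetical excerpt, and the exact spellings chosen for the one-letter words and the numbers are assumptions.

```python
# Hypothetical excerpt of the previous list; the real one lives in
# spacy/lang/es/stop_words.py and is much longer.
old_stop_words = set(
    """
    actualmente cosas de el empleo horas intentar la que trabajar
    """.split()
)

# Words named in the commit message as too significant or too infrequent.
removed = {"actualmente", "cosas", "empleo", "horas", "intentar", "trabajar"}

# Assumed spellings for the additions: Spanish one-letter words and the
# numbers up to twelve.
one_letter_words = {"a", "e", "o", "u", "y"}
numbers = {
    "uno", "dos", "tres", "cuatro", "cinco", "seis",
    "siete", "ocho", "nueve", "diez", "once", "doce",
}

# Drop the unwanted entries first, then add the new ones.
new_stop_words = (old_stop_words - removed) | one_letter_words | numbers

print(sorted(new_stop_words))
```

Working with sets keeps the result free of duplicates, so a word that was already present is not added twice.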
2022-04-18 22:04:02 +02:00
__init__.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
examples.py Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
lemmatizer.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
lex_attrs.py Update lex_attrs.py for Spanish with ordinals (#10038) 2022-01-20 15:44:13 +01:00
punctuation.py Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
stop_words.py Remove significant or not very frequent words from stop word list [es] 2022-04-18 22:04:02 +02:00
syntax_iterators.py Spanish noun chunks review (#9537) 2021-11-05 00:46:36 +01:00
tokenizer_exceptions.py Add full exceptions with spaces 2021-01-29 14:27:22 +01:00