Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-30 20:06:30 +03:00)
251119455d
The default stop words for Norwegian Bokmål (nb) in spaCy contain important entities (e.g. France, Germany, Russia, Sweden and USA), terms such as police district, important units of time (e.g. months and days of the week), and organisations. Users would not expect to find these among the default stop words. Users who follow the general recommendation to filter out stop words therefore risk unknowingly removing important entities from their data. See the explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and the comment in https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831 |
||
---|---|---|
.. | ||
__init__.py | ||
examples.py | ||
punctuation.py | ||
stop_words.py | ||
syntax_iterators.py | ||
tokenizer_exceptions.py |
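
The risk described in the commit message can be illustrated with a minimal, library-free sketch. `SAMPLE_STOP_WORDS` and `PROTECTED_TERMS` are hypothetical sample sets made up for this illustration, not the actual contents of `stop_words.py`, and the helper names `audit_stop_words` and `filter_tokens` are assumptions rather than spaCy API.

```python
# Hypothetical sample standing in for a stop-word list that mixes function
# words with entity-like terms. It is NOT the actual spaCy list.
SAMPLE_STOP_WORDS = {"og", "i", "det", "som", "frankrike", "usa", "mandag", "januar"}

# Terms the caller wants to keep even if a stop-word list contains them
# (an assumption made for this illustration).
PROTECTED_TERMS = {"frankrike", "tyskland", "russland", "sverige", "usa", "mandag", "januar"}


def audit_stop_words(stop_words, protected):
    """Return the protected terms that the stop-word list would silently drop."""
    lowered_stops = {word.lower() for word in stop_words}
    lowered_protected = {term.lower() for term in protected}
    return sorted(lowered_stops & lowered_protected)


def filter_tokens(tokens, stop_words, protected=frozenset()):
    """Remove stop words from tokens, but never remove a protected term."""
    effective = {word.lower() for word in stop_words} - {term.lower() for term in protected}
    return [token for token in tokens if token.lower() not in effective]


tokens = "USA og Frankrike møtes i januar".split()

# Naive filtering drops the entities and the month along with the function words.
naive = filter_tokens(tokens, SAMPLE_STOP_WORDS)

# Filtering with a protected set keeps them.
safe = filter_tokens(tokens, SAMPLE_STOP_WORDS, PROTECTED_TERMS)

print(audit_stop_words(SAMPLE_STOP_WORDS, PROTECTED_TERMS))
print(naive)
print(safe)
```

With spaCy installed, the same audit could presumably be run against the real list by passing the `STOP_WORDS` set from `spacy.lang.nb.stop_words` in place of the sample set.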