spaCy/spacy/lang/nb
Haakon Meland Eriksen 251119455d
Remove NER words from stop words in Norwegian (#9820)
The default stop words for Norwegian Bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and the USA, as well as "police district", important units of time such as months and days of the week, and organisations.

Users are unlikely to expect these words among the default stop words. Anyone who follows the general recommendation to filter out stop words risks unknowingly removing important entities from their data.

See the explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and the comment in https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
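
The risk described above can be illustrated with a minimal sketch in plain Python. It does not use spaCy itself. The helper names `find_protected_stop_words` and `filter_tokens`, the sample stop-word set, and the sample token list are all hypothetical and chosen only for illustration. In a real pipeline the stop-word set would presumably come from `spacy.lang.nb.stop_words.STOP_WORDS`.

```python
def find_protected_stop_words(stop_words, protected_terms):
    """Return the protected terms that stop-word filtering would drop."""
    lowered = {word.lower() for word in stop_words}
    return sorted(term for term in protected_terms if term.lower() in lowered)


def filter_tokens(tokens, stop_words, keep=()):
    """Drop stop words from tokens, but never drop anything listed in keep."""
    lowered = {word.lower() for word in stop_words}
    lowered -= {word.lower() for word in keep}
    return [token for token in tokens if token.lower() not in lowered]


# Hypothetical sample data, NOT the actual spaCy stop-word list.
stop_words = {"og", "i", "er", "frankrike", "mandag", "politidistrikt"}
protected = {"Frankrike", "mandag", "Oslo"}
tokens = ["Møtet", "i", "Frankrike", "er", "mandag"]

# Entities and time units that plain filtering would silently remove.
overlap = find_protected_stop_words(stop_words, protected)

# Naive filtering loses the country and the weekday.
naive = filter_tokens(tokens, stop_words)

# Filtering with an explicit keep-list preserves them.
safe = filter_tokens(tokens, stop_words, keep=overlap)

print(overlap)
print(naive)
print(safe)
```

Running an audit like this against a stop-word list before filtering makes the overlap with entity-like terms visible rather than silent.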
2021-12-07 09:45:10 +01:00
__init__.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
examples.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
punctuation.py Add / to nb infixes (#7991) 2021-05-04 11:00:10 +02:00
stop_words.py Remove NER words from stop words in Norwegian (#9820) 2021-12-07 09:45:10 +01:00
syntax_iterators.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
tokenizer_exceptions.py Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00