spaCy/spacy
Haakon Meland Eriksen 251119455d
Remove NER words from stop words in Norwegian (#9820)
Default stop words in Norwegian bokmål (nb) in Spacy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police district, important units of time, e.g. months and days of the week, and organisations.

Nobody expects their presence among the default stop words. There is a danger of users complying with the general recommendation of filtering out stop words, while being unaware of filtering out important entities from their data.

See explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
2021-12-07 09:45:10 +01:00
..
cli Fix Language-specific factory handling in package command (#9674) 2021-11-29 08:31:02 +01:00
displacy Displacy serve entity linking support without manual=True support. (#9748) 2021-11-29 17:13:26 +01:00
lang Remove NER words from stop words in Norwegian (#9820) 2021-12-07 09:45:10 +01:00
matcher Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
ml Fix spancat for empty docs and zero suggestions (#9654) 2021-11-15 12:40:55 +01:00
pipeline morphologizer: avoid recreating label tuple for each token (#9764) 2021-11-30 11:58:59 +01:00
tests Merge pull request #9777 from explosion/master 2021-11-30 14:01:23 +01:00
tokens Auto-format code with black (#9631) 2021-11-05 09:58:36 +01:00
training Exclude strings from v3.2+ source vector checks (#9697) 2021-11-19 08:51:19 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.2.0 (#9565) 2021-10-29 15:22:40 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
errors.py EntityRuler improve disk load error message (#9658) 2021-11-23 16:26:05 +01:00
glossary.py Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
language.py Exclude strings from v3.2+ source vector checks (#9697) 2021-11-19 08:51:19 +01:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
lexeme.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
lookups.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Allow Matcher to match on ENT_ID and ENT_KB_ID (#9688) 2021-11-24 10:37:10 +01:00
scorer.py Allow Scorer.score_spans to handle pred docs with missing annotation (#9701) 2021-11-23 15:17:19 +01:00
strings.pxd Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
strings.pyi 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
strings.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Remove two attributes marked for removal in 3.1 (#9150) 2021-09-15 23:07:21 +02:00
tokenizer.pyx Ignore prefix in suffix matches (#9155) 2021-10-27 13:02:25 +02:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Format (#9630) 2021-11-05 09:56:26 +01:00
vectors.pyx Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pxd Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pyi Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
vocab.pyx Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00