13450 Commits

Author SHA1 Message Date
Matthew Honnibal
16475528f7 Fix skipped documents in entity scorer (#6137)
* Fix skipped documents in entity scorer

* Add back the skipping of unannotated entities

* Update spacy/scorer.py

* Use more specific NER scorer

* Fix import

* Fix get_ner_prf

* Add scorer

* Fix scorer

Co-authored-by: Ines Montani <ines@ines.io>
2020-09-24 20:38:57 +02:00
Matthew Honnibal
2abb4ba9db Make a pre-check to speed up alignment cache (#6139)
* Dirty trick to fast-track alignment cache

* Improve alignment cache check

* Fix header

* Fix align cache

* Fix align logic
2020-09-24 18:13:39 +02:00
Ines Montani
26e28ed413 Fix combined scores if multiple components report it 2020-09-24 17:11:13 +02:00
Ines Montani
0b52b6904c Update entity_linker.py 2020-09-24 17:10:35 +02:00
Ines Montani
20b89a9717 Increment version [ci skip] 2020-09-24 16:57:02 +02:00
Adriane Boyd
3c062b3911 Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher

* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values

* Update test

* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00
Adriane Boyd
59340606b7 Add option to disable Matcher errors (#6125)
* Add option to disable Matcher errors

* Add option to disable Matcher errors when a doc doesn't contain a
particular type of annotation

Minor additional change:

* Update `AttributeRuler.load_from_morph_rules` to allow direct `MORPH`
values

* Rename suppress_errors to allow_missing

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Refactor annotation checks in Matcher and PhraseMatcher

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-24 16:54:39 +02:00
Sofie Van Landeghem
c7eedd3534 updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference

* fiddling with sent_start annotations

* add KB serialization test

* KB write additional file with strings.json

* score_links function to calculate NEL P/R/F

* formatting

* documentation
2020-09-24 16:53:59 +02:00
Ines Montani
d0ef4a4cf5 Prevent division by zero in score weights 2020-09-24 16:42:13 +02:00
Matthew Honnibal
74ee456374 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-24 16:11:47 +02:00
Matthew Honnibal
0bc214c102 Fix pull 2020-09-24 16:11:33 +02:00
Ines Montani
6bc5058d13 Update models directory [ci skip] 2020-09-24 14:53:34 +02:00
Ines Montani
3f751e68f5 Increment version [ci skip] 2020-09-24 14:45:41 +02:00
Ines Montani
58dde293ce Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
Ines Montani
74e1f192b4 Merge pull request #6134 from explosion/feature/training_before_to_disk 2020-09-24 14:44:11 +02:00
Ines Montani
24e7ac3f2b Fix download CLI [ci skip] 2020-09-24 14:43:56 +02:00
Ines Montani
3b58a8be2b Update docs 2020-09-24 14:32:42 +02:00
Ines Montani
88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani
92f8b6959a Fix typo 2020-09-24 13:48:41 +02:00
Ines Montani
b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
Adriane Boyd
5c13e0cf1b Remove unused error 2020-09-24 13:41:55 +02:00
Ines Montani
6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
walterhenry
3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
Adriane Boyd
1c63f02f99 Add API docs 2020-09-24 12:51:16 +02:00
Ines Montani
138c8d45db Update docs 2020-09-24 12:43:39 +02:00
Ines Montani
be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani
d7ab6a2ffe Update docs [ci skip] 2020-09-24 12:37:21 +02:00
Adriane Boyd
8eaacaae97 Refactor Doc.ents setter to use Doc.set_ents
Additional changes:

* Entity spans with missing labels are ignored
* Fix ent_kb_id setting in `Doc.set_ents`
2020-09-24 12:36:51 +02:00
Ines Montani
c6c67b606e Merge pull request #6133 from explosion/fix/score_weights 2020-09-24 12:00:57 +02:00
Ines Montani
f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Ines Montani
4eb39b5c43 Fix logging 2020-09-24 11:04:35 +02:00
Ines Montani
4bbe41f017 Fix combined scores and update test 2020-09-24 10:42:47 +02:00
Sofie Van Landeghem
c645c4e7ce fix micro PRF for textcat (#6130)
* fix micro PRF for textcat

* small fix
2020-09-24 10:31:17 +02:00
Matthew Honnibal
17a6b0a173 Make project pull order insensitive (#6131) 2020-09-24 10:30:42 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani
e2ffe51fb5 Update docs [ci skip] 2020-09-24 10:13:41 +02:00
Ines Montani
02008e9a55 Update docs [ci skip] 2020-09-23 22:02:31 +02:00
Ines Montani
c8bda92243 Update benchmarks [ci skip] 2020-09-23 20:05:02 +02:00
Ines Montani
f25f05c503 Adjust sort order [ci skip] 2020-09-23 20:03:04 +02:00
Ines Montani
3f77eb749c Increment version [ci skip] 2020-09-23 19:50:15 +02:00
Ines Montani
cea9431a04 Merge pull request #6128 from svlandeg/fix/nr_features 2020-09-23 19:38:19 +02:00
svlandeg
b816ace4bb format 2020-09-23 17:33:13 +02:00
svlandeg
5a9fdbc8ad state_type as Literal 2020-09-23 17:32:14 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
25b34bba94 throw custom error when state_type is invalid 2020-09-23 16:57:14 +02:00
Ines Montani
916050bf2f Merge pull request #6127 from explosion/feature/literal-nr_feature_tokens 2020-09-23 16:56:08 +02:00
Ines Montani
3c3863654e Increment version [ci skip] 2020-09-23 16:54:43 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
Ines Montani
50a4425cda Adjust docs 2020-09-23 16:03:32 +02:00
Ines Montani
76bbed3466 Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00