Commit Graph

367 Commits

Author SHA1 Message Date
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani
064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
Ines Montani
010956d493 Clear rule-based components on initialize 2020-10-08 09:51:31 +02:00
svlandeg
33c2d4af16 move kb_loader to initialize for NEL instead of constructor 2020-10-07 14:56:00 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
svlandeg
193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg
9eb813a35d Merge remote-tracking branch 'upstream/develop' into fix/patterns-init 2020-10-05 17:49:44 +02:00
svlandeg
4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
svlandeg
251b3eb4e5 add initialize method for entity_ruler 2020-10-05 14:59:13 +02:00
Sofie Van Landeghem
f4f49f5877
update blis (#6198)
* allow higher blis version

* fix typo

* bump to 3.0.0a34

* fix pins in other files
2020-10-05 14:58:56 +02:00
Ines Montani
11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
Matthew Honnibal
96b636c2d3 Update attribute ruler 2020-10-04 13:08:21 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
d3b3663942 Adjust error message and add test 2020-10-04 10:11:27 +02:00
Ines Montani
cc08c88a89
Merge pull request #6187 from svlandeg/fix/begin_training_pipe 2020-10-04 10:01:02 +02:00
svlandeg
3f657ed3a1 implement warning in __init_subclass__ instead 2020-10-03 22:34:10 +02:00
Matthew Honnibal
3b2a78720c Upd morphologizer 2020-10-03 19:35:19 +02:00
Matthew Honnibal
4fccd2ceaf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 19:13:55 +02:00
Matthew Honnibal
8ea8b7d940 Support loading labels in morphologizer 2020-10-03 19:13:42 +02:00
Ines Montani
80603f0fa5 Make SentenceRecognizer.label_data return None
Overwrite the method from the base class (Tagger) but don't export anything in "init labels"
2020-10-03 18:54:09 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Adriane Boyd
86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting

* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`

* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani
b799af16de Don't raise in Pipe.initialize if not implemented 2020-09-30 00:05:27 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Matthew Honnibal
a4da3120b4 Fix multitasks 2020-09-29 18:33:16 +02:00
Matthew Honnibal
0b5c72fce2 Fix incorrect docstrings 2020-09-29 18:30:38 +02:00
Matthew Honnibal
e4f535a964 Fix Pipe.labels 2020-09-29 16:55:07 +02:00
Matthew Honnibal
1fd002180e Allow more components to use labels 2020-09-29 16:48:56 +02:00
Matthew Honnibal
99bff78617 Use labels in tagger 2020-09-29 16:48:44 +02:00
Matthew Honnibal
58c8d4b414 Add label_data property to pipeline 2020-09-29 16:22:13 +02:00
Ines Montani
f171903139 Clean up sgd and pipeline -> nlp 2020-09-29 12:20:26 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Matthew Honnibal
9c8b2524fe Upd initialize args 2020-09-29 12:08:37 +02:00
Matthew Honnibal
f2d1b7feb5 Clean up sgd 2020-09-29 12:00:08 +02:00
Ines Montani
dec984a9c1 Update Language.initialize and support components/tokenizer settings 2020-09-29 11:52:45 +02:00
Matthew Honnibal
b3b6868639 Remove 'sgd' arg from component initialize 2020-09-29 11:42:35 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Adriane Boyd
6c25e60089 Simplify string match IDs for AttributeRuler 2020-09-26 11:12:39 +02:00
Matthew Honnibal
702edf52a0 Fix attributeruler 2020-09-26 00:30:48 +02:00
Matthew Honnibal
821f37254c Fix attributeruler 2020-09-26 00:19:53 +02:00
Matthew Honnibal
98327f66a9 Fix attributeruler key 2020-09-25 23:20:50 +02:00
Matthew Honnibal
16475528f7
Fix skipped documents in entity scorer (#6137)
* Fix skipped documents in entity scorer

* Add back the skipping of unannotated entities

* Update spacy/scorer.py

* Use more specific NER scorer

* Fix import

* Fix get_ner_prf

* Add scorer

* Fix scorer

Co-authored-by: Ines Montani <ines@ines.io>
2020-09-24 20:38:57 +02:00
Ines Montani
0b52b6904c Update entity_linker.py 2020-09-24 17:10:35 +02:00
Adriane Boyd
59340606b7
Add option to disable Matcher errors (#6125)
* Add option to disable Matcher errors

* Add option to disable Matcher errors when a doc doesn't contain a
particular type of annotation

Minor additional change:

* Update `AttributeRuler.load_from_morph_rules` to allow direct `MORPH`
values

* Rename suppress_errors to allow_missing

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Refactor annotation checks in Matcher and PhraseMatcher

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-24 16:54:39 +02:00
Sofie Van Landeghem
c7eedd3534
updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference

* fiddling with sent_start annotations

* add KB serialization test

* KB write additional file with strings.json

* score_links function to calculate NEL P/R/F

* formatting

* documentation
2020-09-24 16:53:59 +02:00