373 Commits

Author SHA1 Message Date
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
svlandeg
44e14ccae8 one more losses fix 2020-10-14 15:11:34 +02:00
svlandeg
0aa8851878 always return losses 2020-10-14 15:00:49 +02:00
svlandeg
68d79796c6 add test for vocab after serializing KB 2020-10-10 20:59:48 +02:00
Ines Montani
bfa3931c9d
Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
Adriane Boyd
39aabf50ab Also rename to include_static_vectors in CharEmbed 2020-10-09 11:54:48 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani
064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
Ines Montani
010956d493 Clear rule-based components on initialize 2020-10-08 09:51:31 +02:00
svlandeg
33c2d4af16 move kb_loader to initialize for NEL instead of constructor 2020-10-07 14:56:00 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
svlandeg
193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg
9eb813a35d Merge remote-tracking branch 'upstream/develop' into fix/patterns-init 2020-10-05 17:49:44 +02:00
svlandeg
4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
svlandeg
251b3eb4e5 add initialize method for entity_ruler 2020-10-05 14:59:13 +02:00
Sofie Van Landeghem
f4f49f5877
update blis (#6198)
* allow higher blis version

* fix typo

* bump to 3.0.0a34

* fix pins in other files
2020-10-05 14:58:56 +02:00
Ines Montani
11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
Matthew Honnibal
96b636c2d3 Update attribute ruler 2020-10-04 13:08:21 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
d3b3663942 Adjust error message and add test 2020-10-04 10:11:27 +02:00
Ines Montani
cc08c88a89
Merge pull request #6187 from svlandeg/fix/begin_training_pipe 2020-10-04 10:01:02 +02:00
svlandeg
3f657ed3a1 implement warning in __init_subclass__ instead 2020-10-03 22:34:10 +02:00
Matthew Honnibal
3b2a78720c Upd morphologizer 2020-10-03 19:35:19 +02:00
Matthew Honnibal
4fccd2ceaf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 19:13:55 +02:00
Matthew Honnibal
8ea8b7d940 Support loading labels in morphologizer 2020-10-03 19:13:42 +02:00
Ines Montani
80603f0fa5 Make SentenceRecognizer.label_data return None
Overwrite the method from the base class (Tagger) but don't export anything in "init labels"
2020-10-03 18:54:09 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Adriane Boyd
86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting

* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`

* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani
b799af16de Don't raise in Pipe.initialize if not implemented 2020-09-30 00:05:27 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Matthew Honnibal
a4da3120b4 Fix multitasks 2020-09-29 18:33:16 +02:00
Matthew Honnibal
0b5c72fce2 Fix incorrect docstrings 2020-09-29 18:30:38 +02:00
Matthew Honnibal
e4f535a964 Fix Pipe.labels 2020-09-29 16:55:07 +02:00
Matthew Honnibal
1fd002180e Allow more components to use labels 2020-09-29 16:48:56 +02:00
Matthew Honnibal
99bff78617 Use labels in tagger 2020-09-29 16:48:44 +02:00
Matthew Honnibal
58c8d4b414 Add label_data property to pipeline 2020-09-29 16:22:13 +02:00
Ines Montani
f171903139 Clean up sgd and pipeline -> nlp 2020-09-29 12:20:26 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Matthew Honnibal
9c8b2524fe Upd initialize args 2020-09-29 12:08:37 +02:00
Matthew Honnibal
f2d1b7feb5 Clean up sgd 2020-09-29 12:00:08 +02:00
Ines Montani
dec984a9c1 Update Language.initialize and support components/tokenizer settings 2020-09-29 11:52:45 +02:00
Matthew Honnibal
b3b6868639 Remove 'sgd' arg from component initialize 2020-09-29 11:42:35 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Adriane Boyd
6c25e60089 Simplify string match IDs for AttributeRuler 2020-09-26 11:12:39 +02:00
Matthew Honnibal
702edf52a0 Fix attributeruler 2020-09-26 00:30:48 +02:00