Commit Graph

545 Commits

Author SHA1 Message Date
Ines Montani
539b0c10da Tidy up and auto-format 2020-10-10 19:14:48 +02:00
svlandeg
8316bc7d4a bugfix DisabledPipes 2020-10-09 12:06:20 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
svlandeg
3e2e1fd323 cleanup 2020-10-08 10:37:32 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
svlandeg
6b8bdb2d39 add init_config to nlp.create_pipe 2020-10-07 14:58:16 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
svlandeg
4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Ines Montani
8f018e47f8 Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe 2020-10-04 14:43:45 +02:00
Ines Montani
ae15c9de79 Raise error from caught KeyError to preserve traceback 2020-10-03 11:43:56 +02:00
svlandeg
02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
svlandeg
6787e56315 print debugging warning before raising error if model not properly initialized 2020-10-01 09:21:00 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
798040bc1d Fix language detection 2020-09-29 21:08:13 +02:00
Matthew Honnibal
8ce9f44433 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:57:38 +02:00
Matthew Honnibal
ca72608059 Fix language 2020-09-29 16:48:33 +02:00
Ines Montani
fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani
63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani
adca08a12f Pass nlp forward 2020-09-29 12:21:52 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Matthew Honnibal
9c8b2524fe Upd initialize args 2020-09-29 12:08:37 +02:00
Matthew Honnibal
f2d1b7feb5 Clean up sgd 2020-09-29 12:00:08 +02:00
Ines Montani
78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00
Ines Montani
dec984a9c1 Update Language.initialize and support components/tokenizer settings 2020-09-29 11:52:45 +02:00
Matthew Honnibal
5276db6f3f Remove 'device' argument from Language, clean up 'sgd' arg 2020-09-29 11:42:19 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani
658fad428a Fix base schema integration 2020-09-27 22:50:36 +02:00
Ines Montani
7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Ines Montani
4bbe41f017 Fix combined scores and update test 2020-09-24 10:42:47 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani
1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Adriane Boyd
47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd
eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Matthew Honnibal
c776594ab1 Fix 2020-09-16 18:15:14 +02:00
Matthew Honnibal
4a573d18b3 Add comment 2020-09-16 17:51:29 +02:00
Matthew Honnibal
d31afc8334 Fix Language.link_components when model is None 2020-09-16 17:49:48 +02:00
Ines Montani
aaf01689a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-15 14:24:42 +02:00
Ines Montani
91a6637f74 Remove extra pipe config values before merging 2020-09-15 14:24:17 +02:00
Ines Montani
d3d7f92f05 Fix lang check and error handling in Language.from_config 2020-09-15 14:24:06 +02:00
Ines Montani
253ba5ef14 Raise for bad Vocab values 2020-09-15 13:25:34 +02:00
Ines Montani
7dfc4bc062 Allow overriding meta from spacy.blank 2020-09-15 11:12:12 +02:00
Matthew Honnibal
b693d2d224 Fix speed report in table 2020-09-13 17:39:31 +02:00
Ines Montani
febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Sofie Van Landeghem
e92e850c72
Raise if empty examples (#6052)
* raise error if no valid Example objects were found during initialization

* fix max_length parameter

* remove commit from other branch

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-12 21:01:53 +02:00
svlandeg
711166a75a prevent overwriting score_weights 2020-09-11 15:12:05 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem
60f22e1800
Pipe API (#6034)
* ensure Language passes on valid examples for initialization

* fix tagger model initialization

* check for valid get_examples across components

* assume labels were added before begin_training

* fix senter initialization

* fix morphologizer initialization

* use methods to check arguments

* test textcat init, requires thinc>=8.0.0a31

* fix tok2vec init

* fix entity linker init

* use islice

* fix simple NER

* cleanup debug model

* fix assert statements

* fix tests

* throw error when adding a label if the output layer can't be resized anymore

* fix test

* add failing test for simple_ner

* UX improvements

* morphologizer UX

* assume begin_training gets a representative set and processes the labels

* remove assumptions for output of untrained NER model

* restore test for original purpose
2020-09-08 22:44:25 +02:00
Ines Montani
f06eed800e
Merge pull request #6029 from explosion/master-tmp 2020-09-04 15:11:55 +02:00
Ines Montani
f9550b4493 Fix components in meta.json and website [ci skip] 2020-09-04 14:42:12 +02:00