36 Commits

Author SHA1 Message Date
Paul O'Leary McCann
40bc01e668 Proactively remove unused listeners
With this, the changes in initialize.py might be unnecessary.

Requires testing.
2021-03-17 22:41:41 +09:00
Paul O'Leary McCann
ef77c88638 Don't warn about components not in the pipeline
See here:

https://github.com/explosion/spaCy/discussions/7463

Still need to check if there are any side effects of listeners being
present but not in the pipeline, but this commit will silence the
warnings.
2021-03-17 14:56:04 +09:00
Sofie Van Landeghem
cd70c3cb79
Fixing pretrain (#7342)
* initialize NLP with train corpus

* add more pretraining tests

* more tests

* function to fetch tok2vec layer for pretraining

* clarify parameter name

* test different objectives

* formatting

* fix check for static vectors when using vectors objective

* clarify docs

* logger statement

* fix init_tok2vec and proc.initialize order

* test training after pretraining

* add init_config tests for pretraining

* pop pretraining block to avoid config validation errors

* custom errors
2021-03-09 14:01:13 +11:00
Sofie Van Landeghem
6ed423c16c
reduce memory load when reading all vectors from file (#6945)
* reduce memory load when reading all vectors from file

* one more small typo fix
2021-02-07 08:05:43 +08:00
Sofie Van Landeghem
f638306598
remove link_components flag again (#6883) 2021-02-02 10:08:40 +08:00
Sofie Van Landeghem
acabb284dd
Fix linking resumed components (#6859)
* link components across enabled, resumed and frozen

* revert renaming

* revert renaming, the sequel
2021-02-01 22:19:58 +11:00
Ines Montani
325f47500d Move replacement logic to Language.from_config 2021-01-29 19:37:04 +11:00
Ines Montani
911dfcccfc Add option to replace listeners for sourced components 2021-01-29 15:57:04 +11:00
Ines Montani
c0926c9088
WIP: Various small training changes (#6818)
* Allow output_path to be None during training

* Fix cat scoring (?)

* Improve error message for weighted None score

* Improve messages

So we can call this in other places etc.

* Fix output path check

* Use latest wasabi

* Revert "Improve error message for weighted None score"

This reverts commit 7059926763.

* Exclude None scores from final score by default

It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.

* Update warnings and use logger.warning
2021-01-26 14:51:52 +11:00
Sofie Van Landeghem
57640aa838
warn when frozen components break listener pattern (#6766)
* warn when frozen components break listener pattern

* few notes in the documentation

* update arg name

* formatting

* cleanup

* specify listeners return type
2021-01-20 11:12:35 +11:00
Adriane Boyd
681a6195f7 Validate seed and gpu_allocator manually 2021-01-14 16:57:57 +01:00
Adriane Boyd
5fb8b7037a Expand initialize/training config validation
Validate both `[initialize]` and `[training]` in `debug data` and
`nlp.initialize()` with separate config validation error blocks that
indicate which block of the config is being validated.
2021-01-12 17:17:00 +01:00
Ines Montani
991669c934 Tidy up and auto-format 2021-01-05 13:41:53 +11:00
Sofie Van Landeghem
de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450) 2020-12-09 06:16:07 +08:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
svlandeg
251b3eb4e5 add initialize method for entity_ruler 2020-10-05 14:59:13 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Update logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Ines Montani
ad6d40d028 Add logging 2020-09-29 22:53:14 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani
fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani
978ab54a84 Fix logging 2020-09-29 16:22:41 +02:00
Ines Montani
aa2a6882d0 Fix logging 2020-09-29 16:08:39 +02:00
Ines Montani
63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani
612bbf85ab Update initialize.py 2020-09-29 12:14:47 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Ines Montani
78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00
Ines Montani
4925ad760a Add init vectors 2020-09-29 10:58:50 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani
046f655d86 Fix error 2020-09-28 21:17:45 +02:00
Ines Montani
a139fe672b Fix typos and refactor CLI logging 2020-09-28 21:17:10 +02:00
Ines Montani
822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani
d5155376fd Update vocab init 2020-09-28 11:30:18 +02:00
Matthew Honnibal
13b1605ee6 Add init script 2020-09-28 01:08:49 +02:00