Sofie Van Landeghem
75a202ce65
TextCat updates and fixes ( #6263 )
...
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
2020-10-18 14:50:41 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging ( #6184 )
...
* Make logging and progress easier to control
* Update docs
* Cleanup errors
* Fix ConfigValidationError
* Pass stdout/stderr, not wasabi.Printer
* Fix type
* Upd logging example
* Fix logger example
* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
44160cd52f
Tidy up [ci skip]
2020-10-01 10:41:19 +02:00
Ines Montani
a5debb356d
Tidy up and adjust logging [ci skip]
2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4
Add logging [ci skip]
2020-09-30 01:08:55 +02:00
Ines Montani
0a1ee109db
Remove init from path
2020-09-29 22:53:18 +02:00
Ines Montani
0250bcf6a3
Show validation error during init
2020-09-29 22:29:09 +02:00
Matthew Honnibal
e70a00fa76
Remove unnecessary warning from train
2020-09-29 16:47:54 +02:00
Matthew Honnibal
3f0d61232d
Remove outdated arg from train
2020-09-29 16:47:44 +02:00
Ines Montani
a139fe672b
Fix typos and refactor CLI logging
2020-09-28 21:17:10 +02:00
Ines Montani
822ea4ef61
Refactor CLI
2020-09-28 15:09:59 +02:00
Ines Montani
c22ecc66bb
Don't support init path for now
2020-09-28 12:46:28 +02:00
Ines Montani
a5f2cc0509
Tidy up and remove raw text (rehearsal) for now
2020-09-28 12:30:13 +02:00
Ines Montani
e44a7519cd
Update CLI and add [initialize] block
2020-09-28 11:56:14 +02:00
Ines Montani
2fdb7285a0
Update CLI
2020-09-28 11:06:07 +02:00
Ines Montani
553bfea641
Fix commands
2020-09-28 10:53:17 +02:00
Matthew Honnibal
b886f53c31
init-pipeline runs (maybe doesn't work)
2020-09-28 03:42:47 +02:00
Matthew Honnibal
ed2aff2db3
Remove unused train code
2020-09-28 03:12:31 +02:00
Matthew Honnibal
3a0a3b8db6
Don't hard-code for 'corpora' name
2020-09-28 03:06:33 +02:00
Matthew Honnibal
a3e1791c9c
Upd train
2020-09-28 01:08:30 +02:00
Matthew Honnibal
b5556093e2
Start updating train script
2020-09-27 23:59:44 +02:00
Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00
Matthew Honnibal
39b178999c
Tmp notes
2020-09-27 20:13:38 +02:00
Ines Montani
b4486d747d
Merge branch 'develop' into fix/train-config-interpolation
2020-09-26 15:32:14 +02:00
Ines Montani
b2d07de786
Construct nlp from uninterpolated config before training
2020-09-26 15:16:59 +02:00
Ines Montani
ca3c997062
Improve CLI config validation with latest Thinc
2020-09-26 13:13:57 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script ( #6143 )
...
* update pretraining API in train CLI
* bump thinc to 8.0.0a35
* bump to 3.0.0a26
* doc fixes
* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
be56c0994b
Add [training.before_to_disk] callback
2020-09-24 12:40:25 +02:00
Ines Montani
f69fea8b25
Improve error handling around non-number scores
2020-09-24 11:29:07 +02:00
Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00
svlandeg
6435458d51
simplify expression
2020-09-23 12:12:38 +02:00
svlandeg
20b0ec5dcf
avoid logging performance of frozen components
2020-09-23 10:37:12 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator ( #6091 )
...
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
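
The "strict" loading described above can be illustrated with a minimal sketch. It is not spaCy's implementation: `AVAILABLE_TABLES` and `load_lookups_strict` are hypothetical names, and plain dictionaries stand in for the tables that would normally come from `spacy-lookups-data`. The sketch only shows the behaviour: every requested table must exist, and nothing beyond the requested tables is loaded.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for the tables a lookups package might ship per language.
AVAILABLE_TABLES: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {
        "lexeme_norm": {"cos": "because"},
        "lexeme_prob": {},
    }
}


def load_lookups_strict(lang: str, tables: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Return exactly the requested tables, failing if any of them is missing."""
    requested = list(tables)
    available = AVAILABLE_TABLES.get(lang, {})
    missing = [name for name in requested if name not in available]
    if missing:
        # Strict mode: a missing table is an error, not a silent skip.
        raise ValueError(f"Tables not available for '{lang}': {missing}")
    # Only the requested tables are returned, so larger tables stay unloaded.
    return {name: available[name] for name in requested}


lookups = load_lookups_strict("en", ["lexeme_norm"])
```

Because the caller lists the tables explicitly, leaving out a large table such as a probabilities table is just a matter of not naming it in the config.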
2020-09-18 15:59:16 +02:00
svlandeg
427dbecdd6
cleanup and formatting
2020-09-17 11:48:04 +02:00
svlandeg
0c35885751
generalize corpora, dot notation for dev and train corpus
2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47
rewrite train_corpus to corpus.train in config
2020-09-15 21:58:04 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat ( #6062 )
...
* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidental commit
2020-09-14 17:08:00 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter ( #6042 )
...
* version bump to 3.0.0a16
* rename "gold" folder to "training"
* rename 'annotation_setter' to 'set_extra_annotations'
* formatting
2020-09-09 10:31:03 +02:00
Matthew Honnibal
ba5f4c9b32
Add words and seconds to train info
2020-09-08 15:24:47 +02:00
Ines Montani
ab1bb421ed
Update docs links in codebase
2020-09-04 12:58:50 +02:00
Ines Montani
b5a0657fd6
"model" terminology consistency in docs
2020-09-03 13:13:03 +02:00
Matthew Honnibal
122cb02001
Fix averages
2020-09-02 19:37:43 +02:00
Matthew Honnibal
ec660e3131
Fix use_pytorch_for_gpu_memory
2020-09-01 00:41:38 +02:00
Matthw Honnibal
c38298b8fa
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-08-31 19:55:55 +02:00
Matthw Honnibal
fe298fa50a
Shuffle on first epoch of train
2020-08-31 19:55:22 +02:00
svlandeg
13ee742fb4
example of custom logger
2020-08-31 14:24:41 +02:00
svlandeg
5230529de2
add loggers registry & logger docs sections
2020-08-28 21:44:04 +02:00
Ines Montani
a5fff1df51
Remove outdated non-empty output dir warning [ci skip]
2020-08-26 15:45:51 +02:00