Ines Montani
d5155376fd
Update vocab init
2020-09-28 11:30:18 +02:00
Matthew Honnibal
65448b2e34
Remove schema=None until Optional
2020-09-28 03:42:58 +02:00
Matthew Honnibal
a023cf3ecc
Add (untested) resolve_dot_names util
2020-09-28 03:06:12 +02:00
Ines Montani
9016d23cc5
Fix exclude and add test
2020-09-27 23:34:03 +02:00
Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00
Ines Montani
26e28ed413
Fix combined scores if multiple components report it
2020-09-24 17:11:13 +02:00
Ines Montani
d0ef4a4cf5
Prevent division by zero in score weights
2020-09-24 16:42:13 +02:00
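A minimal sketch of the kind of guard d0ef4a4cf5 describes, assuming score weights are normalized by dividing each one by their sum. The function name `normalize_weights` and the choice to return the weights unchanged when the total is zero are assumptions for illustration, not the code from the commit.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale weights so they sum to 1, leaving them unchanged if the total is 0."""
    total = sum(weights.values())
    if total == 0:
        # Guard: dividing by a zero total would raise ZeroDivisionError.
        return dict(weights)
    return {name: value / total for name, value in weights.items()}
```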
Ines Montani
4bbe41f017
Fix combined scores and update test
2020-09-24 10:42:47 +02:00
Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00
Ines Montani
f25f05c503
Adjust sort order [ci skip]
2020-09-23 20:03:04 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
889128e5c5
Improve error handling in run_command
2020-09-20 16:20:57 +02:00
Adriane Boyd
47080fba98
Minor renaming / refactoring
...
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
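The strict loading described in eed4b785f5 can be sketched roughly as follows. `AVAILABLE_TABLES` and `load_lookups_strict` are hypothetical stand-ins for a lookups data package and its loader; they are not spaCy's implementation and only illustrate the "request exact tables, fail if one is missing" behavior.

```python
from typing import Dict, List

# Hypothetical stand-in for the tables a lookups data package might ship.
AVAILABLE_TABLES: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {"lexeme_norm": {"cos": "because"}, "lemma_lookup": {"was": "be"}},
}


def load_lookups_strict(lang: str, tables: List[str]) -> Dict[str, Dict[str, str]]:
    """Return exactly the requested tables, raising if any of them is missing."""
    lang_tables = AVAILABLE_TABLES.get(lang, {})
    missing = [name for name in tables if name not in lang_tables]
    if missing:
        # Strict mode: a missing table is an error, not a silent skip.
        raise ValueError(f"Missing tables for '{lang}': {missing}")
    # Only the listed tables are returned, so larger tables stay out unless requested.
    return {name: lang_tables[name] for name in tables}
```

Listing the tables explicitly is what makes it easy to leave out larger ones: anything not named in `tables` is simply never returned.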
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Ines Montani
416deb412f
Prevent duplicate traceback on CalledProcessError [ci skip]
2020-09-13 19:28:54 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Ines Montani
3e83a509bb
WIP: fix project clone compatibility
2020-09-10 15:49:13 +02:00
Matthew Honnibal
b470062153
Add CLI registry ( #6037 )
2020-09-08 15:23:34 +02:00
Ines Montani
5afe6447cd
registry.assets -> registry.misc
2020-09-03 17:31:14 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components
2020-08-29 15:58:41 +02:00
Ines Montani
34146750d4
Use frozen list with custom errors
...
We don't want to break backwards compatibility too much, but we also want to provide the best possible UX.
2020-08-29 15:20:11 +02:00
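One way to read the idea in 34146750d4 is a list subclass that still behaves like a list for reading, but raises a custom error on modification. The sketch below is an assumption about the general technique; the class name `FrozenList` and its `error` argument are hypothetical, not taken from the commit.

```python
class FrozenList(list):
    """Hypothetical sketch: a list that rejects in-place changes with a custom message."""

    def __init__(self, *args, error: str = "This list is frozen and can't be modified.") -> None:
        super().__init__(*args)
        self.error = error

    def _fail(self, *args, **kwargs):
        # Every mutating method is routed here so the caller sees the custom message.
        raise NotImplementedError(self.error)

    append = extend = insert = pop = remove = clear = _fail
    sort = reverse = __setitem__ = __delitem__ = __iadd__ = __imul__ = _fail
```

Because the object is still a `list`, code that only reads it (indexing, iteration, `len`, `in`) keeps working, which is the backwards-compatible part; only writes hit the custom error.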
Ines Montani
5de3f8604d
Update spacy/util.py
...
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani
cad988da7f
Allow component decorators to re-run with same function
2020-08-28 16:27:22 +02:00
Ines Montani
3ce5be4b76
Allow loaded but disabled components
2020-08-28 15:20:14 +02:00
Sofie Van Landeghem
79d460e3a2
Weights & Biases logger for train CLI ( #5971 )
...
* quick test as part of train script
* train_logger in config, default ConsoleLogger in loggers catalogue
* entitiy typo
* add wandb_logger
* cleanup
* Update spacy/cli/train_logger.py
Co-authored-by: Ines Montani <ines@ines.io>
* move loggers to gold.loggers
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Matthew Honnibal
77852d2428
Fix run_command for python 3.6
2020-08-26 05:02:43 +02:00
Matthew Honnibal
884cac5fb5
Make run_command backwards compatible
2020-08-26 04:33:42 +02:00
Matthew Honnibal
2771e4f2b3
Fix the git "sparse checkout" functionality ( #5973 )
...
* Fix the git sparse checkout functionality
* Format
2020-08-26 04:00:14 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage ( #5949 )
...
* Add utils for working with remote storage
* WIP add remote_cache for project
* WIP add push and pull commands
* Use pathy in remote_cache
* Update util
* Update remote_cache
* Update util
* Update project assets
* Update pull script
* Update push script
* Fix type annotation in util
* Work on remote storage
* Remove site and env hash
* Fix imports
* Fix type annotation
* Require pathy
* Require pathy
* Fix import
* Add a util to handle project variable substitution
* Import push and pull commands
* Fix pull command
* Fix push command
* Fix tarfile in remote_storage
* Improve printing
* Fiddle with status messages
* Set version to v3.0.0a9
* Draft docs for spacy project remote storages
* Update docs [ci skip]
* Use Thinc config to simplify and unify template variables
* Auto-format
* Don't import Pathy globally for now (causes slow and annoying Google Cloud warning)
* Tidy up test
* Tidy up and update tests
* Update to latest Thinc
* Update docs
* variables -> vars
* Update docs [ci skip]
* Update docs [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Ines Montani
1c3bcfb488
Update docs and util consistency
2020-08-18 01:22:59 +02:00
Ines Montani
3ae5e02f4f
Update docs, types and API consistency
2020-08-17 16:45:24 +02:00
Ines Montani
45f13cbf64
Merge pull request #5916 from explosion/feature/new-thinc-config
2020-08-16 15:24:12 +02:00
Ines Montani
8128e5eb35
Replace lexeme_norm warning with logging
2020-08-14 15:00:52 +02:00
Ines Montani
37814b608d
Remove env_opt and simplify default Optimizer
2020-08-14 14:59:54 +02:00
Ines Montani
67cc39af7f
Update Thinc and include section order
2020-08-14 14:06:22 +02:00
Ines Montani
88b0a96801
Update for new Thinc and adjust config
2020-08-13 17:38:30 +02:00
Ines Montani
913d21f0a3
Merge pull request #5882 from explosion/feature/raise-from
...
Use "raise ... from" in custom errors for better tracebacks
2020-08-06 00:35:26 +02:00
Ines Montani
d92954ac1d
Merge pull request #5881 from explosion/feature/better-error-model-shortcuts
2020-08-06 00:13:35 +02:00
Ines Montani
56c17973aa
Use "raise ... from" in custom errors for better tracebacks
2020-08-05 23:53:21 +02:00
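For reference, the `raise ... from` pattern named in 56c17973aa looks like this in plain Python. `ConfigLoadError` and `parse_port` are hypothetical names used only for illustration; they do not come from the commit.

```python
class ConfigLoadError(ValueError):
    """Hypothetical custom error, for illustration only."""


def parse_port(value: str) -> int:
    try:
        return int(value)
    except ValueError as err:
        # "from err" stores the original exception as __cause__, so the
        # traceback shows the custom error explicitly chained to its cause.
        raise ConfigLoadError(f"Invalid port: {value!r}") from err
```

Without `from err`, Python would still show both exceptions, but labeled as an error that occurred "during handling" of another one, which reads like a bug in the error handler rather than a deliberate re-raise.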
Ines Montani
5cc0d89fad
Simplify config overrides in CLI and deserialization ( #5880 )
2020-08-05 23:35:09 +02:00
Ines Montani
2a1fa86a0d
Add better error for failed model shortcut loading
2020-08-05 23:10:29 +02:00
Ines Montani
823e533dc1
Add config callbacks for modifying nlp object before and after init ( #5866 )
...
* WIP: Concept for modifying nlp object before and after init
* Make callbacks return nlp object
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* Raise if callbacks don't return correct type
* Rename, update types, add after_pipeline_creation
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-05 19:47:54 +02:00
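The callback contract described in 823e533dc1 (callbacks return the object, and a wrong return type raises) can be sketched as below. `run_callback`, `Pipeline` and `add_note` are hypothetical names for illustration and are not the functions added in the commit.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_callback(obj: T, callback: Optional[Callable[[T], T]], name: str) -> T:
    """Apply an optional callback and check that it returns the same kind of object."""
    if callback is None:
        return obj
    result = callback(obj)
    if not isinstance(result, type(obj)):
        # A callback that forgets its return statement yields None and ends up here.
        raise TypeError(
            f"Callback '{name}' must return {type(obj).__name__}, "
            f"got {type(result).__name__}"
        )
    return result


class Pipeline:
    """Hypothetical stand-in for the object under construction."""

    def __init__(self) -> None:
        self.meta = {}


def add_note(pipeline: Pipeline) -> Pipeline:
    pipeline.meta["note"] = "customized"
    return pipeline  # returning the object lets the caller validate it
```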
Ines Montani
e68459296d
Tidy up and auto-format
2020-08-05 16:00:59 +02:00
Ines Montani
b795f02fbd
Allow adding pipeline components from source model ( #5857 )
...
* Allow adding pipeline components from source model
* Config: name -> component
* Improve error messages
* Fix error and test
* Add frozen components and exclude logic
* Remove exclude from Language.evaluate
* Init sourced components with current vocab
* Fix error codes
2020-08-04 23:39:19 +02:00
Matthew Honnibal
ecb3c4e8f4
Create corpus iterator and batcher from registry during training ( #5865 )
...
* Move batchers into their own module (and registry)
* Update CLI
* Update Corpus and batcher
* Update tests
* Update one config
* Merge 'evaluation' block back under [training]
* Import batchers in gold __init__
* Fix batchers
* Update config
* Update schema
* Update util
* Don't assume train and dev are actually paths
* Update onto-joint config
* Fix missing import
* Format
* Format
* Update spacy/gold/corpus.py
Co-authored-by: Ines Montani <ines@ines.io>
* Fix name
* Update default config
* Fix get_length option in batchers
* Update test
* Add comment
* Pass path into Corpus
* Update docstring
* Update schema and configs
* Update config
* Fix test
* Fix paths
* Fix print
* Fix create_train_batches
* [training.read_train] -> [training.train_corpus]
* Update onto-joint config
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-04 15:09:37 +02:00
Ines Montani
e9e8fa2466
Update docs and types
2020-07-31 17:02:54 +02:00
Matthew Honnibal
1784c95827
Clean up link_vectors_to_models unused stuff
2020-07-29 14:01:11 +02:00
Matthew Honnibal
0c17ea4c85
Format
2020-07-29 14:00:13 +02:00
Matthew Honnibal
7852a68a75
Fix load_vectors_into_model function
2020-07-29 14:00:13 +02:00