Ines Montani
01ecfbcc45
Merge branch 'develop' into feature/replace-listeners
2021-01-29 15:57:32 +11:00
Ines Montani
911dfcccfc
Add option to replace listeners for sourced components
2021-01-29 15:57:04 +11:00
Sofie Van Landeghem
837a4f53c2
Error handling in nlp.pipe ( #6817 )
...
* add error handler for pipe methods
* add unit tests
* remove pipe methods that are the same as their base class
* have Language keep track of a default error handler
* cleanup
* formatting
* small refactor
* add documentation
2021-01-29 08:51:21 +08:00
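The error-handler idea in the commit above can be illustrated with a minimal sketch. This is not the library's code: the `pipe` function, the handler signature, and the `shout` component are all hypothetical names chosen for illustration, assuming a handler receives the component name, the component, the failed inputs, and the exception.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical handler signature: (component name, component, failed inputs, exception).
ErrorHandler = Callable[[str, Callable[[str], str], List[str], Exception], None]


def raise_error(name, component, inputs, exc):
    """Default behaviour in this sketch: re-raise the original exception."""
    raise exc


def make_collecting_handler(failures: list) -> ErrorHandler:
    """Return a handler that records failures instead of raising."""
    def handler(name, component, inputs, exc):
        failures.append((name, inputs, repr(exc)))
    return handler


def pipe(texts: Iterable[str], name: str, component: Callable[[str], str],
         error_handler: ErrorHandler = raise_error) -> Iterator[str]:
    """Apply `component` to each text, delegating failures to `error_handler`."""
    for text in texts:
        try:
            yield component(text)
        except Exception as exc:
            error_handler(name, component, [text], exc)


def shout(text: str) -> str:
    """Toy component that fails on empty input."""
    if not text:
        raise ValueError("empty text")
    return text.upper()


failures: list = []
results = list(pipe(["a", "", "b"], "shout", shout, make_collecting_handler(failures)))
```

With the collecting handler, the failing input is skipped and recorded; with the default handler, the first failure stops the iteration.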
Ines Montani
f4d547b73c
Fix error code
2021-01-18 11:43:45 +11:00
Ines Montani
a552db2819
Include available registry names in error
2021-01-16 14:35:03 +11:00
Ines Montani
d12be459f6
Raise RegistryError
2021-01-16 12:57:13 +11:00
Ines Montani
a203e3dbb8
Support spacy-legacy via the registry
2021-01-15 21:42:40 +11:00
Ines Montani
991669c934
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
Thomas Bird
cbb8c66da3
prevent the root logger from initialising
2020-12-15 19:50:34 +00:00
Ines Montani
1980203229
Merge branch 'master' into pr/6444
2020-12-09 11:09:40 +11:00
Koichi Yasuoka
0afb54ac93
JapaneseTokenizer.pipe added ( #6515 )
...
* JapaneseTokenizer.pipe added
For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.
* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00
Ines Montani
d25b1606d6
Allow reading config from stdin in spacy train
2020-12-08 18:01:40 +11:00
svlandeg
1f465bea18
if-else
2020-10-13 09:27:19 +02:00
Ines Montani
bfa3931c9d
Revert added_strings change ( #6236 )
2020-10-10 18:55:07 +02:00
svlandeg
040c7c0541
fix get_dim calls in build_simple_cnn_text_classifier
2020-10-09 15:40:58 +02:00
Florijan Stamenković
18f5c309dc
Fix Issue 6207 ( #6208 )
...
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe ( #6213 )
...
* rename Pipe to TrainablePipe
* split functionality between Pipe and TrainablePipe
* remove unnecessary methods from certain components
* cleanup
* hasattr(component, "pipe") should be sufficient again
* remove serialization and vocab/cfg from Pipe
* unify _ensure_examples and validate_examples
* small fixes
* hasattr checks for self.cfg and self.vocab
* make is_resizable and is_trainable properties
* serialize strings.json instead of vocab
* fix KB IO + tests
* fix typos
* more typos
* _added_strings as a set
* few more tests specifically for _added_strings field
* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Florijan Stamenković
9db670b996
Fix Issue 6207 ( #6208 )
...
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-06 11:17:37 +02:00
Ines Montani
0135f6ed95
Enable commit check via env var
2020-10-05 20:51:15 +02:00
Ines Montani
6958510bda
Include spaCy version check in project CLI
2020-10-05 13:53:07 +02:00
Ines Montani
f758804401
Save one line of code
2020-10-03 11:41:28 +02:00
svlandeg
02247cccaf
Merge remote-tracking branch 'upstream/develop' into feature/small-fixes
2020-10-02 20:48:11 +02:00
svlandeg
acc391c2a8
remove redundant str() call
2020-10-02 11:05:59 +02:00
Ines Montani
01c1538c72
Integrate file readers
2020-10-02 01:36:06 +02:00
Ines Montani
23c63eefaf
Tidy up env vars [ci skip]
2020-09-30 15:15:11 +02:00
Ines Montani
a5debb356d
Tidy up and adjust logging [ci skip]
2020-09-30 01:22:08 +02:00
Ines Montani
da30bae8a6
Use __pyx_vtable__ instead of __reduce_cython__
2020-09-29 22:04:17 +02:00
Ines Montani
fa47f87924
Tidy up and auto-format
2020-09-29 21:39:28 +02:00
Ines Montani
d3c63b7965
Merge branch 'develop' into feature/prepare
2020-09-29 20:53:05 +02:00
Ines Montani
2be80379ec
Fix small issues, resolve_dot_names and debug model
2020-09-29 20:38:35 +02:00
Ines Montani
dba26186ef
Handle None default args in Cython methods
2020-09-29 18:08:02 +02:00
Ines Montani
9353a82076
Auto-format
2020-09-29 18:07:48 +02:00
Matthew Honnibal
4ad26f4a2f
Move reader
2020-09-29 16:54:53 +02:00
Ines Montani
2e9c9e74af
Fix config resolution and interpolation
...
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
02838a1d47
Fix resolve_dot_names
2020-09-28 15:27:10 +02:00
Ines Montani
822ea4ef61
Refactor CLI
2020-09-28 15:09:59 +02:00
Ines Montani
e44a7519cd
Update CLI and add [initialize] block
2020-09-28 11:56:14 +02:00
Ines Montani
d5155376fd
Update vocab init
2020-09-28 11:30:18 +02:00
Matthew Honnibal
65448b2e34
Remove schema=None until Optional
2020-09-28 03:42:58 +02:00
Matthew Honnibal
a023cf3ecc
Add (untested) resolve_dot_names util
2020-09-28 03:06:12 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus ( #6155 )
...
* Support data augmentation in Corpus
* Note initial docs for data augmentation
* Add augmenter to quickstart
* Fix flake8
* Format
* Fix test
* Update spacy/tests/training/test_training.py
* Improve data augmentation arguments
* Update templates
* Move randomization out into caller
* Refactor
* Update spacy/training/augment.py
* Update spacy/tests/training/test_training.py
* Fix augment
* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani
9016d23cc5
Fix exclude and add test
2020-09-27 23:34:03 +02:00
Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00
Ines Montani
26e28ed413
Fix combined scores if multiple components report it
2020-09-24 17:11:13 +02:00
Ines Montani
d0ef4a4cf5
Prevent division by zero in score weights
2020-09-24 16:42:13 +02:00
Ines Montani
4bbe41f017
Fix combined scores and update test
2020-09-24 10:42:47 +02:00
Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00
Ines Montani
f25f05c503
Adjust sort order [ci skip]
2020-09-23 20:03:04 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
889128e5c5
Improve error handling in run_command
2020-09-20 16:20:57 +02:00
Adriane Boyd
47080fba98
Minor renaming / refactoring
...
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
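A rough sketch of what "strict" loading means in the commit above: only the tables named in the config are loaded, and a missing table is an error rather than being silently skipped. `AVAILABLE_TABLES` and `load_lookups` are hypothetical stand-ins, not the `spacy-lookups-data` API.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the tables a data package might ship,
# keyed by language code and then by table name.
AVAILABLE_TABLES: Dict[str, Dict[str, dict]] = {
    "en": {"lexeme_norm": {"cos": "because"}, "lexeme_prob": {"the": -3.5}},
}


def load_lookups(lang: str, tables: List[str], strict: bool = True) -> Dict[str, dict]:
    """Return only the requested tables; in strict mode a missing table is an error."""
    available = AVAILABLE_TABLES.get(lang, {})
    loaded = {}
    for name in tables:
        if name in available:
            loaded[name] = available[name]
        elif strict:
            raise ValueError(f"Table '{name}' not found for language '{lang}'")
    return loaded


# Mirrors the config fragment: lang = "en", tables = ["lexeme_norm"].
lookups = load_lookups("en", ["lexeme_norm"])
```

Because the larger tables are never requested here, they are never loaded, which is the control the commit message describes.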
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Ines Montani
416deb412f
Prevent duplicate traceback on CalledProcessError [ci skip]
2020-09-13 19:28:54 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Ines Montani
3e83a509bb
WIP: fix project clone compatibility
2020-09-10 15:49:13 +02:00
Matthew Honnibal
b470062153
Add CLI registry ( #6037 )
2020-09-08 15:23:34 +02:00
Ines Montani
5afe6447cd
registry.assets -> registry.misc
2020-09-03 17:31:14 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components
2020-08-29 15:58:41 +02:00
Ines Montani
34146750d4
Use frozen list with custom errors
...
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
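One way to read "frozen list with custom errors": a list subclass that still compares and iterates like a list, but whose mutating methods raise an error carrying a caller-supplied message. The sketch below is an illustration under that assumption; `FrozenList` and the message text are hypothetical.

```python
class FrozenList(list):
    """Hypothetical list subclass that reads like a list but rejects mutation."""

    def __init__(self, *args, error: str = "This list is read-only.") -> None:
        super().__init__(*args)
        self.error = error

    def _blocked(self, *args, **kwargs):
        # Every mutating operation funnels into the same custom error.
        raise NotImplementedError(self.error)

    append = clear = extend = insert = pop = remove = reverse = sort = _blocked
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _blocked


pipe_names = FrozenList(
    ["tagger", "parser"], error="Use add_pipe() to change the pipeline."
)
```

Existing code that only reads the list keeps working, while code that tries to modify it gets a message pointing at the supported alternative.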
Ines Montani
5de3f8604d
Update spacy/util.py
...
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani
cad988da7f
Allow component decorators to re-run with same function
2020-08-28 16:27:22 +02:00
Ines Montani
3ce5be4b76
Allow loaded but disabled components
2020-08-28 15:20:14 +02:00
Sofie Van Landeghem
79d460e3a2
Weights & Biases logger for train CLI ( #5971 )
...
* quick test as part of train script
* train_logger in config, default ConsoleLogger in loggers catalogue
* entitiy typo
* add wandb_logger
* cleanup
* Update spacy/cli/train_logger.py
Co-authored-by: Ines Montani <ines@ines.io>
* move loggers to gold.loggers
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Matthew Honnibal
77852d2428
Fix run_command for python 3.6
2020-08-26 05:02:43 +02:00
Matthew Honnibal
884cac5fb5
Make run_command backwards compatible
2020-08-26 04:33:42 +02:00
Matthew Honnibal
2771e4f2b3
Fix the git "sparse checkout" functionality ( #5973 )
...
* Fix the git sparse checkout functionality
* Format
2020-08-26 04:00:14 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage ( #5949 )
...
* Add utils for working with remote storage
* WIP add remote_cache for project
* WIP add push and pull commands
* Use pathy in remote_cache
* Update util
* Update remote_cache
* Update util
* Update project assets
* Update pull script
* Update push script
* Fix type annotation in util
* Work on remote storage
* Remove site and env hash
* Fix imports
* Fix type annotation
* Require pathy
* Require pathy
* Fix import
* Add a util to handle project variable substitution
* Import push and pull commands
* Fix pull command
* Fix push command
* Fix tarfile in remote_storage
* Improve printing
* Fiddle with status messages
* Set version to v3.0.0a9
* Draft docs for spacy project remote storages
* Update docs [ci skip]
* Use Thinc config to simplify and unify template variables
* Auto-format
* Don't import Pathy globally for now
Causes slow and annoying Google Cloud warning
* Tidy up test
* Tidy up and update tests
* Update to latest Thinc
* Update docs
* variables -> vars
* Update docs [ci skip]
* Update docs [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Ines Montani
1c3bcfb488
Update docs and util consistency
2020-08-18 01:22:59 +02:00
Ines Montani
3ae5e02f4f
Update docs, types and API consistency
2020-08-17 16:45:24 +02:00
Ines Montani
45f13cbf64
Merge pull request #5916 from explosion/feature/new-thinc-config
2020-08-16 15:24:12 +02:00
Ines Montani
8128e5eb35
Replace lexeme_norm warning with logging
2020-08-14 15:00:52 +02:00
Ines Montani
37814b608d
Remove env_opt and simplify default Optimizer
2020-08-14 14:59:54 +02:00
Ines Montani
67cc39af7f
Update Thinc and include section order
2020-08-14 14:06:22 +02:00
Ines Montani
88b0a96801
Update for new Thinc and adjust config
2020-08-13 17:38:30 +02:00
Ines Montani
913d21f0a3
Merge pull request #5882 from explosion/feature/raise-from
...
Use "raise ... from" in custom errors for better tracebacks
2020-08-06 00:35:26 +02:00
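For reference, a small example of the `raise ... from` pattern named above. `ConfigError` and `parse_batch_size` are hypothetical names used only for this illustration.

```python
class ConfigError(ValueError):
    """Hypothetical custom error type used only for this illustration."""


def parse_batch_size(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as err:
        # "from err" stores the original exception on __cause__, so the
        # traceback shows it as the direct cause of the custom error.
        raise ConfigError(f"Invalid batch size: {raw!r}") from err


try:
    parse_batch_size("many")
except ConfigError as e:
    caught = e
```

The chained traceback reports the original `ValueError` as the direct cause instead of the less precise "During handling of the above exception, another exception occurred".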
Ines Montani
d92954ac1d
Merge pull request #5881 from explosion/feature/better-error-model-shortcuts
2020-08-06 00:13:35 +02:00
Ines Montani
56c17973aa
Use "raise ... from" in custom errors for better tracebacks
2020-08-05 23:53:21 +02:00
Ines Montani
5cc0d89fad
Simplify config overrides in CLI and deserialization ( #5880 )
2020-08-05 23:35:09 +02:00
Ines Montani
2a1fa86a0d
Add better error for failed model shortcut loading
2020-08-05 23:10:29 +02:00
Ines Montani
823e533dc1
Add config callbacks for modifying nlp object before and after init ( #5866 )
...
* WIP: Concept for modifying nlp object before and after init
* Make callbacks return nlp object
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* Raise if callbacks don't return correct type
* Rename, update types, add after_pipeline_creation
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-05 19:47:54 +02:00
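The callback contract described above (callbacks receive the object, must return it, and a wrong return type raises) can be sketched as follows. `Pipeline`, `build`, and the `after_init` hook name are assumptions made for this example; only `after_pipeline_creation` appears in the commit message.

```python
from typing import Callable, List, Optional


class Pipeline:
    """Hypothetical stand-in for the nlp object."""

    def __init__(self) -> None:
        self.components: List[str] = []
        self.meta: dict = {}


Callback = Callable[[Pipeline], Pipeline]


def run_callback(nlp: Pipeline, callback: Optional[Callback], name: str) -> Pipeline:
    """Run an optional callback and check that it returned the pipeline object."""
    if callback is None:
        return nlp
    result = callback(nlp)
    if not isinstance(result, Pipeline):
        raise TypeError(
            f"Callback '{name}' must return the pipeline object, "
            f"got {type(result).__name__}"
        )
    return result


def build(component_names: List[str],
          after_init: Optional[Callback] = None,
          after_pipeline_creation: Optional[Callback] = None) -> Pipeline:
    nlp = Pipeline()
    nlp = run_callback(nlp, after_init, "after_init")
    nlp.components.extend(component_names)
    nlp = run_callback(nlp, after_pipeline_creation, "after_pipeline_creation")
    return nlp


def tag_project(nlp: Pipeline) -> Pipeline:
    nlp.meta["project"] = "demo"
    return nlp


nlp = build(["tagger"], after_init=tag_project)
```

Requiring the callback to return the object lets a callback either modify it in place or swap in a replacement, and the type check catches a callback that forgets its `return`.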
Ines Montani
e68459296d
Tidy up and auto-format
2020-08-05 16:00:59 +02:00
Ines Montani
b795f02fbd
Allow adding pipeline components from source model ( #5857 )
...
* Allow adding pipeline components from source model
* Config: name -> component
* Improve error messages
* Fix error and test
* Add frozen components and exclude logic
* Remove exclude from Language.evaluate
* Init sourced components with current vocab
* Fix error codes
2020-08-04 23:39:19 +02:00
Matthew Honnibal
ecb3c4e8f4
Create corpus iterator and batcher from registry during training ( #5865 )
...
* Move batchers into their own module (and registry)
* Update CLI
* Update Corpus and batcher
* Update tests
* Update one config
* Merge 'evaluation' block back under [training]
* Import batchers in gold __init__
* Fix batchers
* Update config
* Update schema
* Update util
* Don't assume train and dev are actually paths
* Update onto-joint config
* Fix missing import
* Format
* Format
* Update spacy/gold/corpus.py
Co-authored-by: Ines Montani <ines@ines.io>
* Fix name
* Update default config
* Fix get_length option in batchers
* Update test
* Add comment
* Pass path into Corpus
* Update docstring
* Update schema and configs
* Update config
* Fix test
* Fix paths
* Fix print
* Fix create_train_batches
* [training.read_train] -> [training.train_corpus]
* Update onto-joint config
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-04 15:09:37 +02:00
Ines Montani
e9e8fa2466
Update docs and types
2020-07-31 17:02:54 +02:00
Matthew Honnibal
1784c95827
Clean up link_vectors_to_models unused stuff
2020-07-29 14:01:11 +02:00
Matthew Honnibal
0c17ea4c85
Format
2020-07-29 14:00:13 +02:00
Matthew Honnibal
7852a68a75
Fix load_vectors_into_model function
2020-07-29 14:00:13 +02:00
Matthew Honnibal
df95e2af64
Add load_vectors_into_model util
2020-07-29 14:00:12 +02:00
Matthew Honnibal
acc64e138a
Add import
2020-07-29 14:00:11 +02:00
Matthew Honnibal
cb9654e98c
WIP on new StaticVectors
2020-07-29 14:00:09 +02:00
Ines Montani
ba22111ff4
Move error to Errors
2020-07-28 16:24:14 +02:00
Ines Montani
b83ead5bf5
Merge pull request #5824 from svlandeg/fix/textcat-v3
2020-07-28 15:04:25 +02:00
Ines Montani
ae4d8a6ffd
Update docstrings, docs and pipe consistency
2020-07-28 13:37:31 +02:00
svlandeg
61068e0fb1
util function dot_to_object and corresponding unit test
2020-07-27 17:50:12 +02:00
Adriane Boyd
8bb0507777
Add and update score methods and score weights
...
Add and update `score` methods, provided `scores`, and default weights
`default_score_weights` for pipeline components.
* `scores` provides all top-level keys returned by `score` (merely informative, similar to `assigns`).
* `default_score_weights` provides the default weights for a default config.
* The keys from `default_score_weights` determine which values will be
shown in the `spacy train` output, so keys with weight `0.0` will be
displayed but not counted toward the overall score.
2020-07-27 14:44:53 +02:00
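A simplified sketch of how per-component score weights might be combined and applied, including the point that a key with weight `0.0` is still listed but adds nothing to the overall score. The function names and the normalization scheme are assumptions for illustration; the library's actual arithmetic may differ.

```python
from typing import Dict, List


def combine_score_weights(weights: List[Dict[str, float]]) -> Dict[str, float]:
    """Merge per-component weights and rescale them so they sum to 1.0."""
    combined: Dict[str, float] = {}
    for component_weights in weights:
        for key, value in component_weights.items():
            combined[key] = combined.get(key, 0.0) + value
    total = sum(combined.values())
    if total == 0:
        # Avoid dividing by zero when every weight is 0.0.
        return combined
    return {key: value / total for key, value in combined.items()}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Overall score: each reported score times its weight, summed."""
    return sum(scores.get(key, 0.0) * weight for key, weight in weights.items())


weights = combine_score_weights([
    {"tag_acc": 1.0},                  # hypothetical tagger defaults
    {"dep_uas": 0.0, "dep_las": 1.0},  # hypothetical parser defaults
])
final = weighted_score({"tag_acc": 0.9, "dep_uas": 0.8, "dep_las": 0.7}, weights)
```

Here `dep_uas` stays in `weights`, so it would still appear as a column in the training output, but its weight of `0.0` keeps it out of `final`.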
Adriane Boyd
f8cf378be9
Combine weights from multiple components
...
Combine weights from multiple components for the same score.
2020-07-27 10:21:31 +02:00
Ines Montani
2470486543
Allow pipeline components to set default scores and weights
2020-07-26 13:18:43 +02:00
Ines Montani
e92df281ce
Tidy up, autoformat, add types
2020-07-25 15:01:15 +02:00
Ines Montani
c003d26b94
Tidy up
2020-07-25 12:21:37 +02:00