Ines Montani
2e9c9e74af
Fix config resolution and interpolation
...
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
822ea4ef61
Refactor CLI
2020-09-28 15:09:59 +02:00
Ines Montani
a89e0ff7cb
Fix typo
2020-09-28 12:55:21 +02:00
Ines Montani
a62337b3f3
Tidy up vocab init
2020-09-28 12:53:06 +02:00
Ines Montani
c22ecc66bb
Don't support init path for now
2020-09-28 12:46:28 +02:00
Ines Montani
a5f2cc0509
Tidy up and remove raw text (rehearsal) for now
2020-09-28 12:30:13 +02:00
Ines Montani
1590de11b1
Update config
2020-09-28 12:05:23 +02:00
Ines Montani
e44a7519cd
Update CLI and add [initialize] block
2020-09-28 11:56:14 +02:00
Ines Montani
d5155376fd
Update vocab init
2020-09-28 11:30:18 +02:00
Ines Montani
8b74fd19df
init pipeline -> init nlp
2020-09-28 11:13:38 +02:00
Ines Montani
2fdb7285a0
Update CLI
2020-09-28 11:06:07 +02:00
Ines Montani
553bfea641
Fix commands
2020-09-28 10:53:17 +02:00
Matthew Honnibal
44bad1474c
Add init_pipeline file
2020-09-28 09:47:34 +02:00
Matthew Honnibal
b886f53c31
init-pipeline runs (maybe doesn't work)
2020-09-28 03:42:47 +02:00
Matthew Honnibal
ed2aff2db3
Remove unused train code
2020-09-28 03:12:31 +02:00
Matthew Honnibal
3a0a3b8db6
Don't hard-code 'corpora' name
2020-09-28 03:06:33 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
...
* Support data augmentation in Corpus
* Note initial docs for data augmentation
* Add augmenter to quickstart
* Fix flake8
* Format
* Fix test
* Update spacy/tests/training/test_training.py
* Improve data augmentation arguments
* Update templates
* Move randomization out into caller
* Refactor
* Update spacy/training/augment.py
* Update spacy/tests/training/test_training.py
* Fix augment
* Fix test
2020-09-28 03:03:27 +02:00
Matthew Honnibal
a3e1791c9c
Upd train
2020-09-28 01:08:30 +02:00
Matthew Honnibal
b5556093e2
Start updating train script
2020-09-27 23:59:44 +02:00
Ines Montani
e04bd16f7f
Merge branch 'develop' into feature/new-thinc-config-resolution
2020-09-27 22:34:46 +02:00
Ines Montani
d7ad65a9bb
Fix handling of error description [ci skip]
2020-09-27 22:31:57 +02:00
Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00
Matthew Honnibal
39b178999c
Tmp notes
2020-09-27 20:13:38 +02:00
Ines Montani
b4486d747d
Merge branch 'develop' into fix/train-config-interpolation
2020-09-26 15:32:14 +02:00
Ines Montani
b2d07de786
Construct nlp from uninterpolated config before training
2020-09-26 15:16:59 +02:00
Ines Montani
ca3c997062
Improve CLI config validation with latest Thinc
2020-09-26 13:13:57 +02:00
Matthew Honnibal
3d8388969e
Sort paths for cache consistency
2020-09-25 19:07:26 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
...
* update pretraining API in train CLI
* bump thinc to 8.0.0a35
* bump to 3.0.0a26
* doc fixes
* small doc fix
2020-09-25 15:47:10 +02:00
Matthew Honnibal
74ee456374
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-24 16:11:47 +02:00
Matthew Honnibal
0bc214c102
Fix pull
2020-09-24 16:11:33 +02:00
Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk
2020-09-24 14:44:11 +02:00
Ines Montani
24e7ac3f2b
Fix download CLI [ci skip]
2020-09-24 14:43:56 +02:00
Ines Montani
88e54caa12
accuracy -> performance
2020-09-24 14:32:35 +02:00
Ines Montani
be56c0994b
Add [training.before_to_disk] callback
2020-09-24 12:40:25 +02:00
Ines Montani
c6c67b606e
Merge pull request #6133 from explosion/fix/score_weights
2020-09-24 12:00:57 +02:00
Ines Montani
f69fea8b25
Improve error handling around non-number scores
2020-09-24 11:29:07 +02:00
Matthew Honnibal
17a6b0a173
Make project pull order insensitive (#6131)
2020-09-24 10:30:42 +02:00
Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00
svlandeg
35dbc63578
Merge remote-tracking branch 'upstream/develop' into fix/nr_features
...
# Conflicts:
# spacy/ml/models/parser.py
# spacy/tests/serialize/test_serialize_config.py
# website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f
'parser' instead of 'deps' for state_type
2020-09-23 16:53:49 +02:00
svlandeg
6c85fab316
state_type and extra_state_tokens instead of nr_feature_tokens
2020-09-23 13:35:09 +02:00
Ines Montani
7745d77a38
Fix whitespace in template [ci skip]
2020-09-23 13:21:42 +02:00
svlandeg
6435458d51
simplify expression
2020-09-23 12:12:38 +02:00
svlandeg
20b0ec5dcf
avoid logging performance of frozen components
2020-09-23 10:37:12 +02:00
Ines Montani
6ca06cb62c
Update docs and formatting [ci skip]
2020-09-23 10:14:27 +02:00
Ines Montani
888f936a73
Merge pull request #6106 from svlandeg/feature/textcat-quickstart
2020-09-23 10:11:45 +02:00
Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename
2020-09-23 09:47:12 +02:00
svlandeg
556f3e4652
add pooling to NEL's TransformerListener
2020-09-23 09:24:28 +02:00
Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict (#6113)
2020-09-22 21:54:52 +02:00
Ines Montani
5e3b796b12
Validate section refs in debug config
2020-09-22 12:24:39 +02:00
svlandeg
085a1c8e2b
add no_output_layer to TextCatBOW config
2020-09-22 12:06:40 +02:00
svlandeg
b556a10808
rename converts in_to_out
2020-09-22 11:50:19 +02:00
svlandeg
e931f4d757
add textcat score
2020-09-22 10:56:43 +02:00
svlandeg
396b33257f
add entity_linker to jinja template
2020-09-22 10:40:05 +02:00
svlandeg
135de82a2d
add textcat to quickstart
2020-09-22 10:22:06 +02:00
Ines Montani
6316d5f398
Improve messages in project CLI [ci skip]
2020-09-22 09:45:34 +02:00
Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip]
2020-09-22 09:31:23 +02:00
svlandeg
45b29c4a5b
cleanup
2020-09-21 23:17:23 +02:00
svlandeg
fa5c416db6
initialize through nlp object and with train_corpus
2020-09-21 23:09:22 +02:00
svlandeg
447b3e5787
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
...
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani
e8bcaa44f1
Don't auto-decompress archives with smart_open [ci skip]
2020-09-21 16:01:46 +02:00
svlandeg
eb9b447960
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
...
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani
758ead8a47
Sync overrides with CLI overrides
2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a
Support config overrides via environment variables
2020-09-21 11:25:10 +02:00
Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00
Ines Montani
b2302c0a1c
Improve error for missing dependency
2020-09-20 17:44:51 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87
Fix sparse checkout
2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db
Use simple git clone call if not sparse
2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0
Use updated run_command
2020-09-20 16:21:43 +02:00
Ines Montani
554c9a2497
Update docs [ci skip]
2020-09-20 12:30:53 +02:00
svlandeg
6db1d5dc0d
trying some stuff
2020-09-19 19:11:30 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
...
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
73ff52b9ec
hack for tok2vec listener
2020-09-18 16:43:15 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
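As a rough illustration of how the `${nlp.lang}` reference in the block above is filled in, here is a minimal stdlib-only sketch. It is not Thinc's actual resolver: the `[nlp]` section with `lang = "en"`, the `resolve_section` helper and the regex are assumptions made up for this example.

```python
import json
import re
from configparser import ConfigParser

# Assumption: an [nlp] section with lang = "en" exists elsewhere in the config.
CONFIG = """
[nlp]
lang = "en"

[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
"""

# Matches ${section.key}; the section part may itself contain dots.
VAR = re.compile(r"\$\{([\w.]+)\.(\w+)\}")


def resolve_section(text, section):
    """Hypothetical helper: substitute ${...} references, then JSON-parse values."""
    parser = ConfigParser(interpolation=None)  # keep ${...} as raw text
    parser.read_string(text)

    def substitute(match):
        ref_section, ref_key = match.group(1), match.group(2)
        return parser[ref_section][ref_key]

    resolved = {}
    for key, raw in parser[section].items():
        resolved[key] = json.loads(VAR.sub(substitute, raw))
    return resolved


lookups_block = resolve_section(CONFIG, "training.lookups")
print(lookups_block)
```

The resolved block carries the language code and the exact list of tables, which is what makes the strict loading described above possible.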
2020-09-18 15:59:16 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus
2020-09-18 14:44:21 +02:00
svlandeg
e4fc7e0222
fixing output sample to proper 2D array
2020-09-17 22:34:36 +02:00
Ines Montani
3865214343
Use consistent shortcut
2020-09-17 16:57:02 +02:00
svlandeg
35a3931064
fix typo
2020-09-17 16:36:27 +02:00
svlandeg
ddfc1fc146
add pretraining option to init config
2020-09-17 16:05:40 +02:00
svlandeg
427dbecdd6
cleanup and formatting
2020-09-17 11:48:04 +02:00
svlandeg
0c35885751
generalize corpora, dot notation for dev and train corpus
2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47
rewrite train_corpus to corpus.train in config
2020-09-15 21:58:04 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
...
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
...
* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command (#6040)
...
* Starts to run
* Update pretrain script
* Update corpus
* Update pretrain schema
* Remove outdated test
* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
febb99916d
Tidy up and auto-format [ci skip]
2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f
Fix handling of errors around git [ci skip]
2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Matthew Honnibal
37347830d4
Fix reading in GloVe vectors
2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config
2020-09-12 17:12:35 +02:00
Ines Montani
eedaaaec75
Fix handling of existing asset without checksum [ci skip]
2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6
Merge remote-tracking branch 'upstream/develop' into feature/cli-config
2020-09-12 14:44:40 +02:00
svlandeg
115147804a
string_to_list to parse comma-separated string into a list
2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat
2020-09-12 10:30:49 +02:00
Ines Montani
0b2e07215d
Support overwriting name on spacy package
2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9
support pipeline as "list in string"
2020-09-11 11:08:46 +02:00
Ines Montani
1bce432b4a
Adjust message [ci skip]
2020-09-11 10:00:49 +02:00