Ines Montani
e8bcaa44f1
Don't auto-decompress archives with smart_open [ci skip]
2020-09-21 16:01:46 +02:00
Ines Montani
758ead8a47
Sync overrides with CLI overrides
2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a
Support config overrides via environment variables
2020-09-21 11:25:10 +02:00
Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00
Ines Montani
b2302c0a1c
Improve error for missing dependency
2020-09-20 17:44:51 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87
Fix sparse checkout
2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db
Use simple git clone call if not sparse
2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0
Use updated run_command
2020-09-20 16:21:43 +02:00
Ines Montani
554c9a2497
Update docs [ci skip]
2020-09-20 12:30:53 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator ( #6091 )
...
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus
2020-09-18 14:44:21 +02:00
Ines Montani
3865214343
Use consistent shortcut
2020-09-17 16:57:02 +02:00
svlandeg
ddfc1fc146
add pretraining option to init config
2020-09-17 16:05:40 +02:00
svlandeg
427dbecdd6
cleanup and formatting
2020-09-17 11:48:04 +02:00
svlandeg
0c35885751
generalize corpora, dot notation for dev and train corpus
2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47
rewrite train_corpus to corpus.train in config
2020-09-15 21:58:04 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
...
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat ( #6062 )
...
* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidential commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command ( #6040 )
...
* Starts to run
* Update pretrain script
* Update corpus
* Update pretrain schema
* Remove outdated test
* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
febb99916d
Tidy up and auto-format [ci skip]
2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f
Fix handling of errors around git [ci skip]
2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Matthew Honnibal
37347830d4
Fix reading in GloVe vectors
2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config
2020-09-12 17:12:35 +02:00
Ines Montani
eedaaaec75
Fix handling of existing asset without checksum [ci skip]
2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6
Merge remote-tracking branch 'upstream/develop' into feature/cli-config
2020-09-12 14:44:40 +02:00
svlandeg
115147804a
string_to_list to parse comma-separated string into a list
2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat
2020-09-12 10:30:49 +02:00
Ines Montani
0b2e07215d
Support overwriting name on spacy package
2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9
support pipeline as "list in string"
2020-09-11 11:08:46 +02:00
Ines Montani
1bce432b4a
Adjust message [ci skip]
2020-09-11 10:00:49 +02:00
Ines Montani
5acd4fbcd8
Merge branch 'develop' into fix/clone-compat
2020-09-11 09:58:30 +02:00
Ines Montani
761bd60d43
Adjust info message
2020-09-11 09:57:00 +02:00
Ines Montani
6831161bfa
Resolve path to be extra sure
2020-09-11 09:56:49 +02:00
svlandeg
1723fb73c4
remove brol
2020-09-10 17:44:59 +02:00
svlandeg
08a831ce83
process trailing slash if any
2020-09-10 17:39:52 +02:00
Ines Montani
3e83a509bb
WIP: fix project clone compatibility
2020-09-10 15:49:13 +02:00
svlandeg
f1bc09c1e9
restore partly
2020-09-10 14:53:02 +02:00
svlandeg
3889747119
asset fix & UX
2020-09-10 14:36:53 +02:00
svlandeg
a36766d153
hookup branch
2020-09-10 12:00:34 +02:00
svlandeg
97d99f7efa
Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes
2020-09-10 11:51:34 +02:00
Ines Montani
908f3a4494
Update default projects repo [ci skip]
2020-09-10 11:42:14 +02:00
svlandeg
92f9d2f406
small UX fixes
2020-09-10 11:35:50 +02:00
svlandeg
1fc5486792
more fine-grained errors for git_sparse_checkout
2020-09-10 11:31:32 +02:00
Ines Montani
15bc3a37b4
Add --branch to project clone
2020-09-10 11:08:15 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter ( #6042 )
...
* version bump to 3.0.0a16
* rename "gold" folder to "training"
* rename 'annotation_setter' to 'set_extra_annotations'
* formatting
2020-09-09 10:31:03 +02:00