Commit Graph

902 Commits

Author SHA1 Message Date
Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict (#6113) 2020-09-22 21:54:52 +02:00
Ines Montani
5e3b796b12 Validate section refs in debug config 2020-09-22 12:24:39 +02:00
Ines Montani
6316d5f398 Improve messages in project CLI [ci skip] 2020-09-22 09:45:34 +02:00
Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip] 2020-09-22 09:31:23 +02:00
svlandeg
45b29c4a5b cleanup 2020-09-21 23:17:23 +02:00
svlandeg
fa5c416db6 initialize through nlp object and with train_corpus 2020-09-21 23:09:22 +02:00
svlandeg
447b3e5787 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani
e8bcaa44f1 Don't auto-decompress archives with smart_open [ci skip] 2020-09-21 16:01:46 +02:00
svlandeg
eb9b447960 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani
758ead8a47 Sync overrides with CLI overrides 2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
Ines Montani
1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani
b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Matthew Honnibal
8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
svlandeg
6db1d5dc0d trying some stuff 2020-09-19 19:11:30 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
73ff52b9ec hack for tok2vec listener 2020-09-18 16:43:15 +02:00
Adriane Boyd
eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
svlandeg
e4fc7e0222 fixing output sample to proper 2D array 2020-09-17 22:34:36 +02:00
Ines Montani
3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00
svlandeg
35a3931064 fix typo 2020-09-17 16:36:27 +02:00
svlandeg
ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg
427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg
0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidential commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command (#6040)
* Starts to run

* Update pretrain script

* Update corpus

* Update pretrain schema

* Remove outdated test

* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f Fix handling of errors around git [ci skip] 2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d Update types and docstrings 2020-09-13 10:52:02 +02:00
Matthew Honnibal
37347830d4 Fix reading in GloVe vectors 2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config 2020-09-12 17:12:35 +02:00
Ines Montani
eedaaaec75 Fix handling of existing asset without checksum [ci skip] 2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6 Merge remote-tracking branch 'upstream/develop' into feature/cli-config 2020-09-12 14:44:40 +02:00
svlandeg
115147804a string_to_list to parse comma-separated string into a list 2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat 2020-09-12 10:30:49 +02:00
Ines Montani
0b2e07215d Support overwriting name on spacy package 2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9 support pipeline as "list in string" 2020-09-11 11:08:46 +02:00
Ines Montani
1bce432b4a Adjust message [ci skip] 2020-09-11 10:00:49 +02:00
Ines Montani
5acd4fbcd8 Merge branch 'develop' into fix/clone-compat 2020-09-11 09:58:30 +02:00
Ines Montani
761bd60d43 Adjust info message 2020-09-11 09:57:00 +02:00
Ines Montani
6831161bfa Resolve path to be extra sure 2020-09-11 09:56:49 +02:00