13143 Commits

Author SHA1 Message Date
Adriane Boyd
5fbb8dfcbc Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2 2020-09-22 09:22:58 +02:00
Ines Montani
a5f6ab4943 Merge pull request #6098 from adrianeboyd/feature/doc-init 2020-09-21 18:35:20 +02:00
Adriane Boyd
f212303729 Add sent_starts to Doc.__init__
Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start`
values but also convert to and accept `sent_start` internally.
2020-09-21 17:59:09 +02:00
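
The relationship between the two representations named in the commit above can be sketched as follows. This is only an illustration, not the code from the commit: it assumes that `is_sent_start` uses `True`/`False`/`None` and that the internal `sent_start` uses `1`/`-1`/`0` (start, not a start, unknown). The helper names `normalize_sent_start` and `normalize_sent_starts` are hypothetical.

```python
from typing import List, Optional, Union

SentStart = Union[bool, int, None]


def normalize_sent_start(value: SentStart) -> int:
    """Map one sentence-start value to the assumed internal encoding.

    Assumed encodings (not taken from the commit itself):
      is_sent_start: True / False / None
      sent_start:    1    / -1    / 0
    """
    if value is None:
        return 0
    # bool must be checked before int, because False == 0 in Python.
    if isinstance(value, bool):
        return 1 if value else -1
    if value in (1, 0, -1):
        return value
    raise ValueError(f"Unsupported sentence-start value: {value!r}")


def normalize_sent_starts(values: List[SentStart]) -> List[int]:
    """Normalize a per-token list that may mix both representations."""
    return [normalize_sent_start(v) for v in values]


if __name__ == "__main__":
    print(normalize_sent_starts([True, False, None, 1, -1, 0]))
```

Checking `bool` first matters here: without it, `False` would be read as the integer `0` and silently become "unknown" rather than "not a sentence start".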
Ines Montani
b3327c1e45 Increment version [ci skip] 2020-09-21 16:04:30 +02:00
Ines Montani
e8bcaa44f1 Don't auto-decompress archives with smart_open [ci skip] 2020-09-21 16:01:46 +02:00
Adriane Boyd
6aa91c7ca0 Make user_data keyword-only 2020-09-21 16:00:06 +02:00
Ines Montani
e548654aca Update docs [ci skip] 2020-09-21 14:46:55 +02:00
Ines Montani
4b79d697ee Merge pull request #6096 from explosion/feature/config-overrides-env-vars 2020-09-21 14:46:19 +02:00
Ines Montani
626cfd7155 Merge pull request #6099 from adrianeboyd/docs/alphabetize-api-sidebar [ci skip]
Alphabetize API sidebars
2020-09-21 14:44:43 +02:00
Adriane Boyd
ce455f30ca Fix formatting 2020-09-21 13:53:29 +02:00
Adriane Boyd
9b8d0b7f90 Alphabetize API sidebars 2020-09-21 13:46:21 +02:00
Adriane Boyd
bc02e86494 Extend Doc.__init__ with additional annotation
Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to
`Doc.__init__` to initialize the most common doc/token values.
2020-09-21 13:36:24 +02:00
Ines Montani
758ead8a47 Sync overrides with CLI overrides 2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
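
One way config overrides could be read from an environment variable is sketched below. This is a hedged illustration rather than the implementation in the commit above: the variable name `SPACY_CONFIG_OVERRIDES`, the `--section.key value` format, and the helper names `parse_overrides` and `overrides_from_env` are all assumptions made for the example.

```python
import json
import os
import shlex
from typing import Any, Dict, List, Mapping

# Assumed variable name; the commit message does not state it.
ENV_VAR = "SPACY_CONFIG_OVERRIDES"


def parse_overrides(tokens: List[str]) -> Dict[str, Any]:
    """Turn ['--section.key', 'value', ...] into {'section.key': value}."""
    if len(tokens) % 2 != 0:
        raise ValueError("Overrides must come in '--section.key value' pairs")
    overrides: Dict[str, Any] = {}
    for flag, raw in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("--") or "." not in flag:
            raise ValueError(f"Expected '--section.key', got {flag!r}")
        try:
            # Interpret numbers, booleans, null and lists as JSON.
            value = json.loads(raw)
        except json.JSONDecodeError:
            # Anything else is kept as a plain string.
            value = raw
        overrides[flag[2:]] = value
    return overrides


def overrides_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read overrides from the (assumed) environment variable, if set."""
    return parse_overrides(shlex.split(environ.get(ENV_VAR, "")))


if __name__ == "__main__":
    fake_env = {ENV_VAR: "--training.batch_size 128 --system.gpu_allocator pytorch"}
    print(overrides_from_env(fake_env))
```

Passing the environment in as a mapping keeps the sketch easy to test without touching the real process environment.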
Ines Montani
1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani
9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Adriane Boyd
cc71ec901f Fix typo in saving and loading usage docs 2020-09-21 09:08:55 +02:00
Adriane Boyd
3aa57ce6c9 Update alignment mode in Doc.char_span docs 2020-09-21 09:07:20 +02:00
Ines Montani
b9d2b29684 Update docs [ci skip] 2020-09-20 17:49:09 +02:00
Ines Montani
012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani
b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Ines Montani
6898b35028 Merge pull request #6094 from explosion/bugfix/run_process 2020-09-20 16:49:30 +02:00
Ines Montani
744f259b9c Update landing [ci skip] 2020-09-20 16:37:23 +02:00
Matthew Honnibal
8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Matthew Honnibal
889128e5c5 Improve error handling in run_command 2020-09-20 16:20:57 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Ines Montani
e863b3dc14 Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6 Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Adriane Boyd
47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
Adriane Boyd
eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani
0406200a1e Update docs [ci skip] 2020-09-18 15:13:13 +02:00
Ines Montani
a127fa475e Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Matthew Honnibal
bbdb5f62b7 Temporary work-around for scoring a subset of components (#6090)
* Try hacking the scorer to work around sentence boundaries

* Upd scorer

* Set dev version

* Upd scorer hack

* Fix version

* Improve comment on hack
2020-09-18 14:26:42 +02:00
Ines Montani
d32ce121be Fix docs [ci skip] 2020-09-18 13:41:12 +02:00
Adriane Boyd
a88106e852 Remove W106: HEAD and SENT_START in doc.from_array (#6086)
* Remove W106: HEAD and SENT_START in doc.from_array

This warning was hacky and being triggered too often.

* Fix test
2020-09-18 03:01:29 +02:00
Ines Montani
9062585a13 Merge pull request #6087 from explosion/docs/pretrain-usage [ci skip] 2020-09-17 19:25:24 +02:00
Ines Montani
a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal
6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Sofie Van Landeghem
ed0fb034cb ml_datasets v0.2.0a0 2020-09-17 18:11:10 +02:00
Ines Montani
1bb8b4f824 Merge branch 'master' into develop 2020-09-17 17:46:20 +02:00
Ines Montani
6bd0d25fb9 Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip] 2020-09-17 17:14:45 +02:00
Ines Montani
a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Ines Montani
2c80f41852 Merge pull request #6084 from svlandeg/feature/init-config-pretrain [ci skip] 2020-09-17 16:59:14 +02:00
Ines Montani
2e3ce9f42f Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084 2020-09-17 16:58:49 +02:00
Ines Montani
3d8e010655 Change order 2020-09-17 16:58:46 +02:00
Ines Montani
c4b414b282 Update website/docs/api/cli.md 2020-09-17 16:58:09 +02:00
Ines Montani
3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00