26 Commits

Author SHA1 Message Date
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
7745d77a38 Fix whitespace in template [ci skip] 2020-09-23 13:21:42 +02:00
Ines Montani
6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
svlandeg
556f3e4652 add pooling to NEL's TransformerListener 2020-09-23 09:24:28 +02:00
svlandeg
085a1c8e2b add no_output_layer to TextCatBOW config 2020-09-22 12:06:40 +02:00
svlandeg
e931f4d757 add textcat score 2020-09-22 10:56:43 +02:00
svlandeg
396b33257f add entity_linker to jinja template 2020-09-22 10:40:05 +02:00
svlandeg
135de82a2d add textcat to quickstart 2020-09-22 10:22:06 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Matthew Honnibal
4b7abaafdb Fix learn rate for non-transformer 2020-09-04 21:22:50 +02:00
Ines Montani
23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani
c063e55eb7 Add prefix to batchers 2020-09-03 17:30:41 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Matthew Honnibal
c356e62908 Minor adjustments to quickstart template 2020-08-21 00:10:21 +02:00
Ines Montani
6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
svlandeg
b96cd9fa5e fix typo 2020-08-19 18:46:08 +02:00
Ines Montani
e2f2ef3a5a Update init config and recommendations
- As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations
- Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date.
- Simplify jinja_to_js script and use fewer dependencies
2020-08-19 13:33:15 +02:00
Ines Montani
a570c304df Update quickstart, template and docs 2020-08-15 14:50:29 +02:00
Ines Montani
88b0a96801 Update for new Thinc and adjust config 2020-08-13 17:38:30 +02:00