Commit Graph

11788 Commits

Author SHA1 Message Date
Ines Montani
810fce3bb1 Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b Remove debugging 2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd Fix problems with lower and whitespace in variants
Port relevant changes from #5361:

* Initialize lower flag explicitly

* Handle whitespace words from GoldParse correctly when creating raw
text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221 Update default cfg dir in train CLI 2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417 Port train CLI updates
Updates from #5362 and fix from #5387:

* `train`:

  * if training on GPU, only run evaluation/timing on CPU in the first
    iteration

  * if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
Adriane Boyd
8c758ed1eb Fix meta path 2020-06-03 12:11:57 +02:00
Adriane Boyd
a57bdeecac Test util.get_model_meta instead of util.load_model 2020-06-03 12:10:12 +02:00
svlandeg
109bbdab98 update config files with separate dropout for Tok2Vec layer 2020-06-03 11:53:59 +02:00
svlandeg
eac12cbb77 make dropout in embed layers configurable 2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4 add discard_oversize parameter, move optimizer to training subsection 2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c prevent infinite loop, custom warning 2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161 Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config 2020-06-03 08:30:16 +02:00
Matthew Honnibal
f74784575c
Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
svlandeg
c5ac382f0a fix name clash 2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf additional test with discard_oversize=False 2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c extending algorithm to deal better with edge cases 2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60 it's only oversized if the tolerance level is also exceeded 2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7 fix comments 2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3 slightly more challenging unit test 2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c using overflow buffer for examples within the tolerance margin 2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5 add test for minibatch util 2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d Remove unnecessary check 2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581 Add rudimentary version checks on model load 2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Leo
925e938570
Spanish tokenizer exception and examples improvement (#5531)
* Spanish tokenizer exception additions. Added Spanish question examples

* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' (#5526) 2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7 Preserve _SP when filtering tag map in Tagger
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani
368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
svlandeg
15134ef611 fix deserialization order 2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202 Revert "Remove peeking from Parser.begin_training (#5456)"
This reverts commit 9393253b66.

The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0
Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9
Merge pull request #5518 from svlandeg/fix/pretrain-docs
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d prevent loading a pretrained Tok2Vec layer AND pretrained components 2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197 Remove MorphAnalysis __str__ and __repr__ 2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667 fix description 2020-05-29 13:52:39 +02:00