Ines Montani
810fce3bb1
Merge branch 'develop' into master-tmp
2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b
Remove debugging
2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd
Fix problems with lower and whitespace in variants
...
Port relevant changes from #5361:
* Initialize lower flag explicitly
* Handle whitespace words from GoldParse correctly when creating raw
  text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221
Update default cfg dir in train CLI
2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417
Port train CLI updates
...
Updates from #5362 and fix from #5387:
* `train`:
  * if training on GPU, only run evaluation/timing on CPU in the first
    iteration
  * if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
Adriane Boyd
8c758ed1eb
Fix meta path
2020-06-03 12:11:57 +02:00
Adriane Boyd
a57bdeecac
Test util.get_model_meta instead of util.load_model
2020-06-03 12:10:12 +02:00
svlandeg
109bbdab98
update config files with separate dropout for Tok2Vec layer
2020-06-03 11:53:59 +02:00
svlandeg
eac12cbb77
make dropout in embed layers configurable
2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4
add discard_oversize parameter, move optimizer to training subsection
2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c
prevent infinite loop, custom warning
2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161
Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config
2020-06-03 08:30:16 +02:00
Matthew Honnibal
f74784575c
Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
...
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
svlandeg
c5ac382f0a
fix name clash
2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf
additional test with discard_oversize=False
2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c
extending algorithm to deal better with edge cases
2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60
it's only oversized if the tolerance level is also exceeded
2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7
fix comments
2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3
slightly more challenging unit test
2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c
using overflow buffer for examples within the tolerance margin
2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5
add test for minibatch util
2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99
bugfix of the bugfix
2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d
Remove unnecessary check
2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581
Add rudimentary version checks on model load
2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936
rewrite minibatch_by_words function
2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886
add oversize examples before StopIteration returns
2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1
remove Tensorizer
2020-06-01 23:38:48 +02:00
Leo
925e938570
Spanish tokenizer exception and examples improvement (#5531)
...
* Spanish tokenizer exception additions. Added Spanish question examples
* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
...
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> 'LATIN SMALL LIGATURE OE' (#5526)
2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525)
2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7
Preserve _SP when filtering tag map in Tagger
...
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps
2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09
Add onto-joint experiment file
2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
...
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5
Add warning
2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417
Merge branch 'develop' into feature/improve-model-version-deps
2020-05-30 15:20:20 +02:00
Ines Montani
368182776e
Tidy up dependencies
2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c
Make functions more general purpose and update docstrings and tests
2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf
Don't override spaCy version
2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10
Use more sophisticated version parsing logic
2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad
Tidy up requirements
2020-05-30 14:59:55 +02:00
svlandeg
15134ef611
fix deserialization order
2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202
Revert "Remove peeking from Parser.begin_training (#5456)"
...
This reverts commit 9393253b66.
The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0
Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
...
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9
Merge pull request #5518 from svlandeg/fix/pretrain-docs
...
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d
prevent loading a pretrained Tok2Vec layer AND pretrained components
2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197
Remove MorphAnalysis __str__ and __repr__
2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667
fix description
2020-05-29 13:52:39 +02:00