svlandeg
3ade455fd3
formatting
2020-06-04 16:09:55 +02:00
svlandeg
776d4f1190
cleanup
2020-06-04 16:07:30 +02:00
svlandeg
6b027d7689
remove duplicate model definition of tok2vec layer
2020-06-04 15:49:23 +02:00
svlandeg
1775f54a26
small little fixes
2020-06-03 22:17:02 +02:00
svlandeg
07886a3de3
rename init_tok2vec to resume
2020-06-03 22:00:25 +02:00
svlandeg
4ed6278663
small fixes to pretrain config, init_tok2vec TODO
2020-06-03 19:32:40 +02:00
Ines Montani
d79964bcb1
Merge pull request #5535 from adrianeboyd/feature/model-spacy-version-check
2020-06-03 15:35:20 +02:00
Ines Montani
56a9d1b78c
Merge pull request #5479 from explosion/master-tmp
2020-06-03 15:31:27 +02:00
svlandeg
ddf8244df9
add normalize option to distance metric
2020-06-03 14:52:54 +02:00
svlandeg
ffe0451d09
pretrain from config
2020-06-03 14:45:00 +02:00
Ines Montani
a8875d4a4b
Fix typo
2020-06-03 14:42:39 +02:00
Ines Montani
4e0610d0d4
Update warning codes
2020-06-03 14:37:09 +02:00
Ines Montani
810fce3bb1
Merge branch 'develop' into master-tmp
2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b
Remove debugging
2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd
Fix problems with lower and whitespace in variants
...
Port relevant changes from #5361 :
* Initialize lower flag explicitly
* Handle whitespace words from GoldParse correctly when creating raw
text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221
Update default cfg dir in train CLI
2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417
Port train CLI updates
...
Updates from #5362 and fix from #5387 :
* `train`:
* if training on GPU, only run evaluation/timing on CPU in the first
iteration
* if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
Adriane Boyd
8c758ed1eb
Fix meta path
2020-06-03 12:11:57 +02:00
Adriane Boyd
a57bdeecac
Test util.get_model_meta instead of util.load_model
2020-06-03 12:10:12 +02:00
svlandeg
109bbdab98
update config files with separate dropout for Tok2Vec layer
2020-06-03 11:53:59 +02:00
svlandeg
eac12cbb77
make dropout in embed layers configurable
2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4
add discard_oversize parameter, move optimizer to training subsection
2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c
prevent infinite loop, custom warning
2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161
Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config
2020-06-03 08:30:16 +02:00
Matthew Honnibal
f74784575c
Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
...
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
svlandeg
c5ac382f0a
fix name clash
2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf
additional test with discard_oversize=False
2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c
extending algorithm to deal better with edge cases
2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60
it's only oversized if the tolerance level is also exceeded
2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7
fix comments
2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3
slightly more challenging unit test
2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c
using overflow buffer for examples within the tolerance margin
2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5
add test for minibatch util
2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99
bugfix of the bugfix
2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d
Remove unnecessary check
2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581
Add rudimentary version checks on model load
2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936
rewrite minibatch_by_words function
2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886
add oversize examples before StopIteration returns
2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1
remove Tensorizer
2020-06-01 23:38:48 +02:00
Leo
925e938570
Spanish tokenizer exception and examples improvement ( #5531 )
...
* Spanish tokenizer exception additions. Added Spanish question examples
* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
...
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' ( #5526 )
2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed ( #5525 )
2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7
Preserve _SP when filtering tag map in Tagger
...
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps
2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09
Add onto-joint experiment file
2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
...
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5
Add warning
2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417
Merge branch 'develop' into feature/improve-model-version-deps
2020-05-30 15:20:20 +02:00