Matthew Honnibal
f74784575c
Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
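The minibatch commits in this PR mention a word-count target, a tolerance margin, oversized examples, and a `discard_oversize` flag. Below is a minimal, hypothetical sketch of that idea, assuming docs are plain token lists. It is not spaCy's `minibatch_by_words`. The function name is made up, and the sketch does not keep a separate overflow buffer. Instead, it closes a batch as soon as an example lands inside the tolerance margin.

```python
from typing import Iterable, Iterator, List

Doc = List[str]  # a tokenized document, for illustration only


def minibatch_by_words_sketch(
    docs: Iterable[Doc],
    size: int,
    tolerance: float = 0.2,
    discard_oversize: bool = False,
) -> Iterator[List[Doc]]:
    """Hypothetical sketch: group docs into batches of about `size` words.

    A batch may overshoot `size` by at most `size * tolerance` words. A single
    doc only counts as oversized if it exceeds that tolerated limit too.
    """
    limit = size + size * tolerance
    batch: List[Doc] = []
    batch_words = 0
    for doc in docs:
        n_words = len(doc)
        if n_words > limit:
            # Oversized: emit it on its own, or drop it if discard_oversize=True.
            if not discard_oversize:
                yield [doc]
            continue
        if batch_words + n_words <= size:
            # Still under the target size: keep filling the batch.
            batch.append(doc)
            batch_words += n_words
        elif batch_words + n_words <= limit:
            # Within the tolerance margin: accept the doc, then close the batch.
            batch.append(doc)
            yield batch
            batch, batch_words = [], 0
        else:
            # Would exceed the tolerated limit: close the batch, start a new one.
            yield batch
            batch, batch_words = [doc], n_words
    # Flush the last partial batch before the generator stops.
    if batch:
        yield batch
```

For example, with `size=10` and `tolerance=0.5`, docs of 4, 5, 3, 20, 6, 6 and 2 words come out as batches of `[4, 5, 3]`, `[20]`, `[6, 6]` and `[2]` words. The 20-word doc disappears when `discard_oversize=True`.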
svlandeg
c5ac382f0a
fix name clash
2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf
additional test with discard_oversize=False
2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c
extending algorithm to deal better with edge cases
2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60
it's only oversized if the tolerance level is also exceeded
2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7
fix comments
2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3
slightly more challenging unit test
2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c
using overflow buffer for examples within the tolerance margin
2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5
add test for minibatch util
2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99
bugfix of the bugfix
2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d
Remove unnecessary check
2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581
Add rudimentary version checks on model load
2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936
rewrite minibatch_by_words function
2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886
add oversize examples before StopIteration returns
2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1
remove Tensorizer
2020-06-01 23:38:48 +02:00
Leo
925e938570
Spanish tokenizer exception and examples improvement (#5531)
* Spanish tokenizer exception additions. Added Spanish question examples
* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> 'LATIN SMALL LIGATURE OE' (#5526)
2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525)
2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7
Preserve _SP when filtering tag map in Tagger
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
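The commit above keeps `_SP` when the tagger filters its tag map. This is a small, hypothetical sketch of that rule. It assumes the filter keeps only the tags seen in the training data and that the tag map is a plain dict from tag to attributes. The helper name and the `"pos"` key are illustrative and are not spaCy's internals.

```python
from typing import Dict, Set

TagMap = Dict[str, Dict[str, str]]  # tag -> attributes (illustrative shape)


def filter_tag_map_sketch(tag_map: TagMap, seen_tags: Set[str]) -> TagMap:
    """Hypothetical helper: keep only tags seen in training, plus "_SP"."""
    filtered = {tag: attrs for tag, attrs in tag_map.items() if tag in seen_tags}
    # Always preserve "_SP" as the reference SPACE entry, so that "SP" is free
    # to be an ordinary tag (as in Chinese OntoNotes).
    if "_SP" in tag_map:
        filtered["_SP"] = tag_map["_SP"]
    return filtered
```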
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps
2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09
Add onto-joint experiment file
2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5
Add warning
2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417
Merge branch 'develop' into feature/improve-model-version-deps
2020-05-30 15:20:20 +02:00
Ines Montani
368182776e
Tidy up dependencies
2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c
Make functions more general purpose and update docstrings and tests
2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf
Don't override spaCy version
2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10
Use more sophisticated version parsing logic
2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad
Tidy up requirements
2020-05-30 14:59:55 +02:00
svlandeg
15134ef611
fix deserialization order
2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202
Revert "Remove peeking from Parser.begin_training (#5456)"
This reverts commit 9393253b66.
The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0
Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9
Merge pull request #5518 from svlandeg/fix/pretrain-docs
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d
prevent loading a pretrained Tok2Vec layer AND pretrained components
2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197
Remove MorphAnalysis __str__ and __repr__
2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667
fix description
2020-05-29 13:52:39 +02:00
svlandeg
5f0a91cf37
fix conv-depth parameter
2020-05-29 09:56:29 +02:00
Ines Montani
4fd087572a
WIP: improve model version deps
2020-05-28 12:51:37 +02:00
Matthw Honnibal
58750b06f8
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-27 22:18:36 +02:00
Matthew Honnibal
aecd1437cc
Merge pull request #5508 from adrianeboyd/bugfix/tag-map-sp-tag
Prefer _SP over SP for default tag map space attrs
2020-05-27 20:39:40 +02:00
Matthew Honnibal
e7ac12b598
Merge pull request #5514 from adrianeboyd/bugfix/load-vector-name
Improve vector name loading from model meta
2020-05-27 20:39:23 +02:00
Adriane Boyd
25de2a2191
Improve vector name loading from model meta
2020-05-27 14:48:54 +02:00
adrianeboyd
aad0610a85
Map NR to PROPN (#5512)
2020-05-26 22:30:53 +02:00
Sofie Van Landeghem
f00488ab30
Update train_intent_parser.py
2020-05-26 16:41:39 +02:00
Adriane Boyd
b6b5908f5e
Prefer _SP over SP for default tag map space attrs
If `_SP` is already in the tag map, use the mapping from `_SP` instead
of `SP` so that `SP` can be a valid non-space tag. (Chinese has a
non-space tag `SP` which was overriding the mapping of `_SP` to
`SPACE`.)
2020-05-26 14:57:13 +02:00
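The preference described in this commit can be shown with a hypothetical lookup helper. It assumes a plain dict tag map, and the helper name and `"pos"` values are made up for the example. If `_SP` is present, its entry supplies the attributes for whitespace tokens. `SP` is only used as a fallback, so a language can keep `SP` as a real non-space tag.

```python
from typing import Dict, Optional

TagMap = Dict[str, Dict[str, str]]  # tag -> attributes (illustrative shape)


def space_tag_attrs_sketch(tag_map: TagMap) -> Optional[Dict[str, str]]:
    """Hypothetical helper: choose the attributes used for space tokens."""
    if "_SP" in tag_map:
        # Prefer "_SP" so that "SP" does not override the SPACE mapping.
        return tag_map["_SP"]
    if "SP" in tag_map:
        # Fall back to "SP" only when no "_SP" entry exists.
        return tag_map["SP"]
    return None
```

With a Chinese-style map such as `{"SP": {"pos": "X"}, "_SP": {"pos": "SPACE"}}` (the `"X"` value is a placeholder), the helper returns the `_SP` entry.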
Matthew Honnibal
b0c0271a48
Merge pull request #5506 from adrianeboyd/bugfix/pl-lemmatizer-lookup-loading
Fix Polish lemmatizer for deserialized models
2020-05-26 12:31:25 +02:00
Matthew Honnibal
a44d51a3d8
Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Adriane Boyd
1eed101be9
Fix Polish lemmatizer for deserialized models
Restructure Polish lemmatizer not to depend on lookups data in
`__init__` since the lemmatizer is initialized before the lookups data
is loaded from a saved model. The lookups tables are accessed first in
`__call__` instead once the data is available.
2020-05-26 09:56:12 +02:00
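The Polish lemmatizer fix follows a general pattern: store a reference to the lookups container in `__init__`, and read the tables only in `__call__`, after deserialization has filled them in. Below is a minimal, hypothetical sketch of that pattern. `LookupsStoreSketch`, `LazyLemmatizerSketch` and the `lemma_lookup_<pos>` table names are illustrative assumptions, not spaCy's classes or table names.

```python
from typing import Dict


class LookupsStoreSketch:
    """Hypothetical stand-in for a lookups container filled after construction."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}


class LazyLemmatizerSketch:
    def __init__(self, lookups: LookupsStoreSketch) -> None:
        # Keep only a reference: the tables may still be empty at this point.
        self.lookups = lookups

    def __call__(self, word: str, pos: str) -> str:
        # Access the table at call time, once the data has been loaded.
        table = self.lookups.tables.get(f"lemma_lookup_{pos.lower()}", {})
        return table.get(word, word)
```

A lemmatizer created before its data arrives still works once the tables are loaded. For example, after adding `{"koty": "kot"}` to the noun table, `lemmatizer("koty", "NOUN")` returns `"kot"`.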