Commit Graph

12118 Commits

Author SHA1 Message Date
svlandeg
c5ac382f0a fix name clash 2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf additional test with discard_oversize=False 2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c extending algorithm to deal better with edge cases 2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60 it's only oversized if the tolerance level is also exceeded 2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7 fix comments 2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3 slightly more challenging unit test 2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c using overflow buffer for examples within the tolerance margin 2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5 add test for minibatch util 2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d Remove unnecessary check 2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581 Add rudimentary version checks on model load 2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Leo
925e938570 Spanish tokenizer exception and examples improvement (#5531)
* Spanish tokenizer exception additions. Added Spanish question examples

* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0 Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' (#5526) 2020-05-31 22:08:12 +02:00
Leo
7d5a89661e contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7 Preserve _SP when filtering tag map in Tagger
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Ines Montani
b5ae2edcba Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani
368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
svlandeg
15134ef611 fix deserialization order 2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202 Revert "Remove peeking from Parser.begin_training (#5456)"
This reverts commit 9393253b66.

The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0 Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9 Merge pull request #5518 from svlandeg/fix/pretrain-docs
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d prevent loading a pretrained Tok2Vec layer AND pretrained components 2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197 Remove MorphAnalysis __str__ and __repr__ 2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667 fix description 2020-05-29 13:52:39 +02:00
svlandeg
5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Ines Montani
4fd087572a WIP: improve model version deps 2020-05-28 12:51:37 +02:00
Matthw Honnibal
58750b06f8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-27 22:18:36 +02:00
Matthew Honnibal
aecd1437cc Merge pull request #5508 from adrianeboyd/bugfix/tag-map-sp-tag
Prefer _SP over SP for default tag map space attrs
2020-05-27 20:39:40 +02:00
Matthew Honnibal
e7ac12b598 Merge pull request #5514 from adrianeboyd/bugfix/load-vector-name
Improve vector name loading from model meta
2020-05-27 20:39:23 +02:00
Adriane Boyd
25de2a2191 Improve vector name loading from model meta 2020-05-27 14:48:54 +02:00
adrianeboyd
aad0610a85 Map NR to PROPN (#5512) 2020-05-26 22:30:53 +02:00
Sofie Van Landeghem
f00488ab30 Update train_intent_parser.py 2020-05-26 16:41:39 +02:00
Adriane Boyd
b6b5908f5e Prefer _SP over SP for default tag map space attrs
If `_SP` is already in the tag map, use the mapping from `_SP` instead
of `SP` so that `SP` can be a valid non-space tag. (Chinese has a
non-space tag `SP` which was overriding the mapping of `_SP` to
`SPACE`.)
2020-05-26 14:57:13 +02:00
Matthew Honnibal
b0c0271a48 Merge pull request #5506 from adrianeboyd/bugfix/pl-lemmatizer-lookup-loading
Fix Polish lemmatizer for deserialized models
2020-05-26 12:31:25 +02:00
Matthew Honnibal
a44d51a3d8 Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Adriane Boyd
1eed101be9 Fix Polish lemmatizer for deserialized models
Restructure Polish lemmatizer not to depend on lookups data in
`__init__` since the lemmatizer is initialized before the lookups data
is loaded from a saved model. The lookups tables are accessed first in
`__call__` instead once the data is available.
2020-05-26 09:56:12 +02:00
adrianeboyd
69897b45d8 Handle spacy.pex renaming in Makefile (#5503) 2020-05-25 16:39:22 +02:00