Commit Graph

11767 Commits

Author SHA1 Message Date
svlandeg
5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
Adriane Boyd
75f08ad62d Remove unnecessary check 2020-06-02 17:41:25 +02:00
Adriane Boyd
bbc1836581 Add rudimentary version checks on model load 2020-06-02 17:33:48 +02:00
svlandeg
fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Leo
925e938570
Spanish tokenizer exception and examples improvement (#5531)
* Spanish tokenizer exception additions. Added Spanish question examples

* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' (#5526) 2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7 Preserve _SP when filtering tag map in Tagger
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
fix deserialization order
2020-05-30 18:38:23 +02:00
Ines Montani
dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani
368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
svlandeg
15134ef611 fix deserialization order 2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202 Revert "Remove peeking from Parser.begin_training (#5456)"
This reverts commit 9393253b66.

The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0
Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9
Merge pull request #5518 from svlandeg/fix/pretrain-docs
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d prevent loading a pretrained Tok2Vec layer AND pretrained components 2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197 Remove MorphAnalysis __str__ and __repr__ 2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667 fix description 2020-05-29 13:52:39 +02:00
svlandeg
5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Ines Montani
4fd087572a WIP: improve model version deps 2020-05-28 12:51:37 +02:00
Matthw Honnibal
58750b06f8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-27 22:18:36 +02:00
Matthew Honnibal
aecd1437cc
Merge pull request #5508 from adrianeboyd/bugfix/tag-map-sp-tag
Prefer _SP over SP for default tag map space attrs
2020-05-27 20:39:40 +02:00
Matthew Honnibal
e7ac12b598
Merge pull request #5514 from adrianeboyd/bugfix/load-vector-name
Improve vector name loading from model meta
2020-05-27 20:39:23 +02:00
Adriane Boyd
25de2a2191 Improve vector name loading from model meta 2020-05-27 14:48:54 +02:00
adrianeboyd
aad0610a85
Map NR to PROPN (#5512) 2020-05-26 22:30:53 +02:00
Sofie Van Landeghem
f00488ab30
Update train_intent_parser.py 2020-05-26 16:41:39 +02:00
Adriane Boyd
b6b5908f5e Prefer _SP over SP for default tag map space attrs
If `_SP` is already in the tag map, use the mapping from `_SP` instead
of `SP` so that `SP` can be a valid non-space tag. (Chinese has a
non-space tag `SP` which was overriding the mapping of `_SP` to
`SPACE`.)
2020-05-26 14:57:13 +02:00
Matthew Honnibal
b0c0271a48
Merge pull request #5506 from adrianeboyd/bugfix/pl-lemmatizer-lookup-loading
Fix Polish lemmatizer for deserialized models
2020-05-26 12:31:25 +02:00
Matthew Honnibal
a44d51a3d8
Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Adriane Boyd
1eed101be9 Fix Polish lemmatizer for deserialized models
Restructure Polish lemmatizer not to depend on lookups data in
`__init__` since the lemmatizer is initialized before the lookups data
is loaded from a saved model. The lookups tables are accessed first in
`__call__` instead once the data is available.
2020-05-26 09:56:12 +02:00
adrianeboyd
69897b45d8
Handle spacy.pex renaming in Makefile (#5503) 2020-05-25 16:39:22 +02:00
adrianeboyd
c9c7b135c0
Update Makefile for v2.3.0 (#5502) 2020-05-25 15:24:24 +02:00
Ines Montani
24ef6680fa
Merge pull request #5499 from adrianeboyd/chore/bump-version-deps-v2.3.0 2020-05-25 13:25:45 +02:00
Ines Montani
ade4767e06
Merge pull request #5498 from adrianeboyd/bugfix/phrasematcher-unpickle-new-api 2020-05-25 13:25:07 +02:00
Adriane Boyd
3f727bc539 Switch to v2.3.0.dev0 2020-05-25 12:57:20 +02:00
Adriane Boyd
736f3cb5af Bump version and deps for v2.3.0
* spacy to v2.3.0
* thinc to v7.4.1
* spacy-lookups-data to v0.3.2
2020-05-25 12:03:49 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Adriane Boyd
e06ca7ea24 Switch to new add API in PhraseMatcher unpickle 2020-05-25 11:22:47 +02:00
Ines Montani
1a15896ba9 unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00