Commit Graph

11626 Commits

Author SHA1 Message Date
Leo
925e938570
Spanish tokenizer exception and examples improvement (#5531)
* Spanish tokenizer exception additions. Added Spanish question examples

* erased slang tokenization examples
2020-06-01 18:18:34 +02:00
Matthew Honnibal
67af3a32b0
Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map
Preserve _SP when filtering tag map in Tagger
2020-06-01 12:00:21 +02:00
Leo
c21c308ecb
corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' (#5526) 2020-05-31 22:08:12 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Adriane Boyd
a005ccd6d7 Preserve _SP when filtering tag map in Tagger
To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if
present as the reference `SPACE` POS in the tag map in
`Tagger.begin_training()`.
2020-05-31 19:57:54 +02:00
Matthew Honnibal
758a4b154d
Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk
fix deserialization order
2020-05-30 18:38:23 +02:00
svlandeg
15134ef611 fix deserialization order 2020-05-30 12:53:32 +02:00
Matthew Honnibal
64adda3202 Revert "Remove peeking from Parser.begin_training (#5456)"
This reverts commit 9393253b66.

The model shouldn't need to see all examples, and actually in v3 there's
no equivalent step. All examples are provided to the component, for the
component to do stuff like figuring out the labels. The model just needs
to do stuff like shape inference.
2020-05-29 23:21:55 +02:00
Matthew Honnibal
85f1acfaa0
Merge pull request #5517 from adrianeboyd/bugfix/morph-repr
Remove MorphAnalysis __str__ and __repr__
2020-05-29 19:20:56 +02:00
Matthew Honnibal
2a8137aba9
Merge pull request #5518 from svlandeg/fix/pretrain-docs
Pretrain fixes
2020-05-29 19:20:20 +02:00
svlandeg
291483157d prevent loading a pretrained Tok2Vec layer AND pretrained components 2020-05-29 17:38:33 +02:00
Adriane Boyd
e1b7cbd197 Remove MorphAnalysis __str__ and __repr__ 2020-05-29 14:33:47 +02:00
svlandeg
04ba37b667 fix description 2020-05-29 13:52:39 +02:00
svlandeg
5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Matthew Honnibal
aecd1437cc
Merge pull request #5508 from adrianeboyd/bugfix/tag-map-sp-tag
Prefer _SP over SP for default tag map space attrs
2020-05-27 20:39:40 +02:00
Matthew Honnibal
e7ac12b598
Merge pull request #5514 from adrianeboyd/bugfix/load-vector-name
Improve vector name loading from model meta
2020-05-27 20:39:23 +02:00
Adriane Boyd
25de2a2191 Improve vector name loading from model meta 2020-05-27 14:48:54 +02:00
adrianeboyd
aad0610a85
Map NR to PROPN (#5512) 2020-05-26 22:30:53 +02:00
Sofie Van Landeghem
f00488ab30
Update train_intent_parser.py 2020-05-26 16:41:39 +02:00
Adriane Boyd
b6b5908f5e Prefer _SP over SP for default tag map space attrs
If `_SP` is already in the tag map, use the mapping from `_SP` instead
of `SP` so that `SP` can be a valid non-space tag. (Chinese has a
non-space tag `SP` which was overriding the mapping of `_SP` to
`SPACE`.)
2020-05-26 14:57:13 +02:00
Matthew Honnibal
b0c0271a48
Merge pull request #5506 from adrianeboyd/bugfix/pl-lemmatizer-lookup-loading
Fix Polish lemmatizer for deserialized models
2020-05-26 12:31:25 +02:00
Adriane Boyd
1eed101be9 Fix Polish lemmatizer for deserialized models
Restructure Polish lemmatizer not to depend on lookups data in
`__init__` since the lemmatizer is initialized before the lookups data
is loaded from a saved model. The lookups tables are accessed first in
`__call__` instead once the data is available.
2020-05-26 09:56:12 +02:00
adrianeboyd
69897b45d8
Handle spacy.pex renaming in Makefile (#5503) 2020-05-25 16:39:22 +02:00
adrianeboyd
c9c7b135c0
Update Makefile for v2.3.0 (#5502) 2020-05-25 15:24:24 +02:00
Ines Montani
24ef6680fa
Merge pull request #5499 from adrianeboyd/chore/bump-version-deps-v2.3.0 2020-05-25 13:25:45 +02:00
Ines Montani
ade4767e06
Merge pull request #5498 from adrianeboyd/bugfix/phrasematcher-unpickle-new-api 2020-05-25 13:25:07 +02:00
Adriane Boyd
3f727bc539 Switch to v2.3.0.dev0 2020-05-25 12:57:20 +02:00
Adriane Boyd
736f3cb5af Bump version and deps for v2.3.0
* spacy to v2.3.0
* thinc to v7.4.1
* spacy-lookups-data to v0.3.2
2020-05-25 12:03:49 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Adriane Boyd
e06ca7ea24 Switch to new add API in PhraseMatcher unpickle 2020-05-25 11:22:47 +02:00
Sofie Van Landeghem
ae1c179f3a
Remove the nested quote 2020-05-23 17:58:19 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributer agreement
2020-05-22 19:50:26 +02:00
Ines Montani
6728747f71
Merge pull request #5486 from explosion/fix/compat-py2 2020-05-22 15:47:21 +02:00
Matthew Honnibal
f6078d866a
Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match
Revert token_match priority changes from #4374 and extend token match options
2020-05-22 14:42:51 +02:00
Ines Montani
c685ee734a Fix compat for v2.x branch 2020-05-22 14:22:36 +02:00
Ines Montani
65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00
Matthew Honnibal
8cb16c7120
Merge pull request #5485 from adrianeboyd/bugfix/retokenizer-merge-0-length-5450
Disallow merging 0-length spans
2020-05-22 13:28:35 +02:00
Adriane Boyd
e4a1b5dab1 Rename to url_match
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd
730fa493a4 Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-22 12:18:00 +02:00
Adriane Boyd
71fe61fdcd Disallow merging 0-length spans 2020-05-22 10:14:34 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Matthew Honnibal
e1cb7e838b
Merge pull request #5481 from explosion/feature/blank-shortcut-v2
Add blank:{lang} shortcut support to util.load_model
2020-05-22 00:08:23 +02:00
Ines Montani
ee027de032 Update universe and display of videos [ci skip] 2020-05-21 21:54:23 +02:00
Ines Montani
2250380816
Merge pull request #5482 from explosion/fix/backwards-compat-super 2020-05-21 21:51:46 +02:00
Ines Montani
891fa59009 Use backwards-compatible super() 2020-05-21 20:52:48 +02:00
Matthew Honnibal
5ce02c1b17
Merge pull request #5470 from svlandeg/bugfix/noun-chunks
Bugfix in noun chunks
2020-05-21 20:51:31 +02:00
Ines Montani
53da6bd672 Add course to landing [ci skip] 2020-05-21 20:45:33 +02:00
Ines Montani
cb02bff0eb Add blank:{lang} shortcut to util.load_mode 2020-05-21 20:24:07 +02:00
Ines Montani
0f1beb5ff2 Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
svlandeg
51715b9f72 span / noun chunk has +1 because end is exclusive 2020-05-21 19:56:56 +02:00