Commit Graph

12044 Commits

Author SHA1 Message Date
Matthew Honnibal
7a73a9dcf6
Merge pull request #5488 from explosion/feature/better-model-compat
Better model compatibility and validation
2020-05-22 16:44:29 +02:00
Matthew Honnibal
f7f6df7275 Move to spacy.analysis 2020-05-22 16:43:18 +02:00
Matthew Honnibal
78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean `set_annotations`
argument to indicate whether they should assign annotations to the `Doc`.
This needs to be set if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass `set_annotations=True`.

We could also always pass `set_annotations=True`, or even make this the
only behaviour. The downside is that it would require the `Doc` objects
to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
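The dependency check described in the commit message above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's implementation: the `Component` class, its `assigns`/`requires` lists, the attribute strings such as `"token.tag"`, and the `guess_set_annotations` function are all hypothetical names chosen for the example. A component gets `True` only if some component later in the pipeline requires an attribute it assigns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    # Hypothetical stand-in for a pipeline component's declared metadata.
    name: str
    assigns: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def guess_set_annotations(pipeline: List[Component]) -> Dict[str, bool]:
    """Hypothetical helper: decide per component whether to write annotations.

    A component is flagged True if any component *after* it in the
    pipeline requires an attribute that it assigns.
    """
    result: Dict[str, bool] = {}
    for i, comp in enumerate(pipeline):
        # Collect everything the downstream components declare they need.
        downstream_needs = set()
        for later in pipeline[i + 1:]:
            downstream_needs.update(later.requires)
        # Overlap between what we assign and what is needed downstream.
        result[comp.name] = bool(set(comp.assigns) & downstream_needs)
    return result


pipeline = [
    Component("tagger", assigns=["token.tag"]),
    Component("parser", assigns=["token.dep", "token.head"], requires=["token.tag"]),
    Component("ner", assigns=["doc.ents"]),
]
print(guess_set_annotations(pipeline))
# {'tagger': True, 'parser': False, 'ner': False}
```

Only the tagger is flagged here, because the parser (downstream) declares that it requires the tag the tagger assigns; the alternative discussed in the commit, copying the `Doc` objects and always writing annotations, would make this check unnecessary.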
Ines Montani
6728747f71
Merge pull request #5486 from explosion/fix/compat-py2 2020-05-22 15:47:21 +02:00
Ines Montani
6e6db6afb6 Better model compatibility and validation 2020-05-22 15:42:46 +02:00
Matthew Honnibal
f6078d866a
Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match
Revert token_match priority changes from #4374 and extend token match options
2020-05-22 14:42:51 +02:00
Ines Montani
c685ee734a Fix compat for v2.x branch 2020-05-22 14:22:36 +02:00
Ines Montani
f30b9d3038 Merge branch 'master' into spacy.io 2020-05-22 13:50:37 +02:00
Ines Montani
65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00
Matthew Honnibal
8cb16c7120
Merge pull request #5485 from adrianeboyd/bugfix/retokenizer-merge-0-length-5450
Disallow merging 0-length spans
2020-05-22 13:28:35 +02:00
Adriane Boyd
e4a1b5dab1 Rename to url_match
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd
730fa493a4 Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-22 12:18:00 +02:00
Adriane Boyd
71fe61fdcd Disallow merging 0-length spans 2020-05-22 10:14:34 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Matthew Honnibal
e1cb7e838b
Merge pull request #5481 from explosion/feature/blank-shortcut-v2
Add blank:{lang} shortcut support to util.load_model
2020-05-22 00:08:23 +02:00
Ines Montani
85064b5c22 Merge branch 'master' into spacy.io 2020-05-21 21:55:04 +02:00
Ines Montani
ee027de032 Update universe and display of videos [ci skip] 2020-05-21 21:54:23 +02:00
Ines Montani
2250380816
Merge pull request #5482 from explosion/fix/backwards-compat-super 2020-05-21 21:51:46 +02:00
Ines Montani
dc94052d6e Merge branch 'master' into spacy.io 2020-05-21 21:01:32 +02:00
Ines Montani
5753b43e60 Tidy up and fix alignment of landing cards (#5317) 2020-05-21 20:56:04 +02:00
Ines Montani
891fa59009 Use backwards-compatible super() 2020-05-21 20:52:48 +02:00
Matthew Honnibal
5ce02c1b17
Merge pull request #5470 from svlandeg/bugfix/noun-chunks
Bugfix in noun chunks
2020-05-21 20:51:31 +02:00
Ines Montani
32c2bb3d99 Add course to landing [ci skip] 2020-05-21 20:50:17 +02:00
Matthw Honnibal
25b51f4fc8 Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
Matthw Honnibal
bc94fdabd0 Fix begin_training 2020-05-21 20:46:21 +02:00
Matthw Honnibal
d507ac28d8 Fix shape inference 2020-05-21 20:46:10 +02:00
Ines Montani
53da6bd672 Add course to landing [ci skip] 2020-05-21 20:45:33 +02:00
Ines Montani
cb02bff0eb Add blank:{lang} shortcut to util.load_model 2020-05-21 20:24:07 +02:00
Matthw Honnibal
df87c32a40 Pass smaller doc sample into model initialize 2020-05-21 20:17:24 +02:00
Ines Montani
581bda9f98 Update senter test and auto-format 2020-05-21 20:17:14 +02:00
Ines Montani
0f1beb5ff2 Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
svlandeg
51715b9f72 span / noun chunk has +1 because end is exclusive 2020-05-21 19:56:56 +02:00
Adriane Boyd
132b2a6898 Merge remote-tracking branch 'upstream/master-tmp' into HEAD 2020-05-21 19:50:30 +02:00
Adriane Boyd
17ee9ab53a Fix _SP/POS=SPACE in strings serialization tests 2020-05-21 19:49:08 +02:00
Ines Montani
245f91df78 Fix merge issues 2020-05-21 19:42:13 +02:00
Matthw Honnibal
3b5cfec1fc Tweak memory management in train_from_config 2020-05-21 19:32:04 +02:00
Matthw Honnibal
f075655deb Fix shape inference in begin_training 2020-05-21 19:26:29 +02:00
svlandeg
84d5b7ad0a Merge remote-tracking branch 'upstream/master' into bugfix/noun-chunks
# Conflicts:
#	spacy/lang/el/syntax_iterators.py
#	spacy/lang/en/syntax_iterators.py
#	spacy/lang/fa/syntax_iterators.py
#	spacy/lang/fr/syntax_iterators.py
#	spacy/lang/id/syntax_iterators.py
#	spacy/lang/nb/syntax_iterators.py
#	spacy/lang/sv/syntax_iterators.py
2020-05-21 19:19:50 +02:00
svlandeg
f7d10da555 avoid unnecessary loop to check overlapping noun chunks 2020-05-21 19:15:57 +02:00
Matthw Honnibal
1729165e90 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-21 19:11:08 +02:00
Ines Montani
631e20d0c6 Fix test and schemas 2020-05-21 19:01:02 +02:00
Ines Montani
d34fc0915e Remove serialization getter 2020-05-21 18:48:21 +02:00
Ines Montani
f44897e4c6 Update warning IDs 2020-05-21 18:39:11 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Ines Montani
c6ec19c844 Add missing declaration 2020-05-21 17:30:05 +02:00
Matthew Honnibal
884d9b060d
Merge pull request #5466 from adrianeboyd/feature/omit-extra-lexeme-info
Add option to omit extra lexeme tables in CLI
2020-05-21 16:40:02 +02:00
Matthew Honnibal
e6c4c1a507
Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner
Improve handling of NER in CoNLL-U MISC
2020-05-21 16:39:46 +02:00
Matthew Honnibal
26cd6a0229
Merge pull request #5462 from adrianeboyd/feature/lemmatizer-all-upos
Extend lemmatizer rules for all UPOS tags
2020-05-21 16:05:31 +02:00
Matthew Honnibal
cad9b290a2
Merge branch 'master' into feature/omit-extra-lexeme-info 2020-05-21 16:04:24 +02:00
Matthew Honnibal
1f572ce89b
Merge pull request #5473 from explosion/fix/travis-tests
Fix Python 2.7 compat
2020-05-21 15:56:16 +02:00