spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-25 17:36:30 +03:00

Author	SHA1	Message	Date
svlandeg	6504b7f161	Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config	2020-06-03 08:30:16 +02:00
svlandeg	c5ac382f0a	fix name clash	2020-06-02 22:24:57 +02:00
svlandeg	2bf5111ecf	additional test with discard_oversize=False	2020-06-02 22:09:37 +02:00
svlandeg	aa6271b16c	extending algorithm to deal better with edge cases	2020-06-02 22:05:08 +02:00
svlandeg	f2e162fc60	it's only oversized if the tolerance level is also exceeded	2020-06-02 19:59:04 +02:00
svlandeg	ef834b4cd7	fix comments	2020-06-02 19:50:44 +02:00
svlandeg	6208d322d3	slightly more challenging unit test	2020-06-02 19:47:30 +02:00
svlandeg	6651fafd5c	using overflow buffer for examples within the tolerance margin	2020-06-02 19:43:39 +02:00
svlandeg	85b0597ed5	add test for minibatch util	2020-06-02 18:26:21 +02:00
svlandeg	5b350a6c99	bugfix of the bugfix	2020-06-02 17:49:33 +02:00
Adriane Boyd	75f08ad62d	Remove unnecessary check	2020-06-02 17:41:25 +02:00
Adriane Boyd	bbc1836581	Add rudimentary version checks on model load	2020-06-02 17:33:48 +02:00
svlandeg	fdfd822936	rewrite minibatch_by_words function	2020-06-02 15:22:54 +02:00
svlandeg	ec52e7f886	add oversize examples before StopIteration returns	2020-06-02 13:21:55 +02:00
svlandeg	e0f9f448f1	remove Tensorizer	2020-06-01 23:38:48 +02:00
Leo	925e938570	Spanish tokenizer exception and examples improvement (#5531) * Spanish tokenizer exception additions. Added Spanish question examples * erased slang tokenization examples	2020-06-01 18:18:34 +02:00
Matthew Honnibal	67af3a32b0	Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map Preserve _SP when filtering tag map in Tagger	2020-06-01 12:00:21 +02:00
Leo	c21c308ecb	corrected issue #5524 changed <U+009C> 'STRING TERMINATOR' for <U+0153> LATIN SMALL LIGATURE OE' (#5526)	2020-05-31 22:08:12 +02:00
Adriane Boyd	a005ccd6d7	Preserve _SP when filtering tag map in Tagger To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if present as the reference `SPACE` POS in the tag map in `Tagger.begin_training()`.	2020-05-31 19:57:54 +02:00
Ines Montani	b5ae2edcba	Merge pull request #5516 from explosion/feature/improve-model-version-deps	2020-05-31 12:54:01 +02:00
Ines Montani	dc186afdc5	Add warning	2020-05-30 15:34:54 +02:00
Ines Montani	b7aff6020c	Make functions more general purpose and update docstrings and tests	2020-05-30 15:18:53 +02:00
Ines Montani	a7e370bcbf	Don't override spaCy version	2020-05-30 15:03:18 +02:00
Ines Montani	e47e5a4b10	Use more sophisticated version parsing logic	2020-05-30 15:01:58 +02:00
svlandeg	15134ef611	fix deserialization order	2020-05-30 12:53:32 +02:00
Matthew Honnibal	64adda3202	Revert "Remove peeking from Parser.begin_training (#5456)" This reverts commit `9393253b66`. The model shouldn't need to see all examples, and actually in v3 there's no equivalent step. All examples are provided to the component, for the component to do stuff like figuring out the labels. The model just needs to do stuff like shape inference.	2020-05-29 23:21:55 +02:00
Matthew Honnibal	85f1acfaa0	Merge pull request #5517 from adrianeboyd/bugfix/morph-repr Remove MorphAnalysis __str__ and __repr__	2020-05-29 19:20:56 +02:00
svlandeg	291483157d	prevent loading a pretrained Tok2Vec layer AND pretrained components	2020-05-29 17:38:33 +02:00
Adriane Boyd	e1b7cbd197	Remove MorphAnalysis __str__ and __repr__	2020-05-29 14:33:47 +02:00
Ines Montani	4fd087572a	WIP: improve model version deps	2020-05-28 12:51:37 +02:00
Matthw Honnibal	58750b06f8	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-05-27 22:18:36 +02:00
Matthew Honnibal	aecd1437cc	Merge pull request #5508 from adrianeboyd/bugfix/tag-map-sp-tag Prefer _SP over SP for default tag map space attrs	2020-05-27 20:39:40 +02:00
Adriane Boyd	25de2a2191	Improve vector name loading from model meta	2020-05-27 14:48:54 +02:00
adrianeboyd	aad0610a85	Map NR to PROPN (#5512)	2020-05-26 22:30:53 +02:00
Adriane Boyd	b6b5908f5e	Prefer _SP over SP for default tag map space attrs If `_SP` is already in the tag map, use the mapping from `_SP` instead of `SP` so that `SP` can be a valid non-space tag. (Chinese has a non-space tag `SP` which was overriding the mapping of `_SP` to `SPACE`.)	2020-05-26 14:57:13 +02:00
Adriane Boyd	1eed101be9	Fix Polish lemmatizer for deserialized models Restructure Polish lemmatizer not to depend on lookups data in `__init__` since the lemmatizer is initialized before the lookups data is loaded from a saved model. The lookups tables are accessed first in `__call__` instead once the data is available.	2020-05-26 09:56:12 +02:00
Ines Montani	24ef6680fa	Merge pull request #5499 from adrianeboyd/chore/bump-version-deps-v2.3.0	2020-05-25 13:25:45 +02:00
Adriane Boyd	3f727bc539	Switch to v2.3.0.dev0	2020-05-25 12:57:20 +02:00
Adriane Boyd	736f3cb5af	Bump version and deps for v2.3.0 * spacy to v2.3.0 * thinc to v7.4.1 * spacy-lookups-data to v0.3.2	2020-05-25 12:03:49 +02:00
Adriane Boyd	e06ca7ea24	Switch to new add API in PhraseMatcher unpickle	2020-05-25 11:22:47 +02:00
Ines Montani	1a15896ba9	unicode -> str consistency [ci skip]	2020-05-24 18:51:10 +02:00
Ines Montani	5d3806e059	unicode -> str consistency	2020-05-24 17:20:58 +02:00
Ines Montani	387c7aba15	Update test	2020-05-24 14:55:16 +02:00
Ines Montani	f9786d765e	Simplify is_package check	2020-05-24 14:48:56 +02:00
Matthw Honnibal	2d9de8684d	Support use_pytorch_for_gpu_memory config	2020-05-22 23:10:40 +02:00
Ines Montani	4465cad6c5	Rename spacy.analysis to spacy.pipe_analysis	2020-05-22 17:42:06 +02:00
Ines Montani	25d6ed3fb8	Merge pull request #5489 from explosion/feature/connected-components	2020-05-22 17:40:11 +02:00
Ines Montani	841c05b47b	Merge pull request #5490 from explosion/fix/remove-jsonschema	2020-05-22 17:39:54 +02:00
Ines Montani	569a65b60e	Auto-format	2020-05-22 16:55:42 +02:00
Ines Montani	d844528c5f	Add test for is_compatible_model	2020-05-22 16:55:15 +02:00
Ines Montani	12b7be1d98	Remove jsonschema from dependencies	2020-05-22 16:49:26 +02:00
Matthew Honnibal	f7f6df7275	Move to spacy.analysis	2020-05-22 16:43:18 +02:00
Matthew Honnibal	78d79d94ce	Guess set_annotations=True in nlp.update During `nlp.update`, components can be passed a boolean set_annotations to indicate whether they should assign annotations to the `Doc`. This needs to be called if downstream components expect to use the annotations during training, e.g. if we wanted to use tagger features in the parser. Components can specify their assignments and requirements, so we can figure out which components have these inter-dependencies. After figuring this out, we can guess whether to pass set_annotations=True. We could also call set_annotations=True always, or even just have this as the only behaviour. The downside of this is that it would require the `Doc` objects to be created afresh to avoid problematic modifications. One approach would be to make a fresh copy of the `Doc` objects within `nlp.update()`, so that we can write to the objects without any problems. If we do that, we can drop this logic and also drop the `set_annotations` mechanism. I would be fine with that approach, although it runs the risk of introducing some performance overhead, and we'll have to take care to copy all extension attributes etc.	2020-05-22 15:55:45 +02:00
Ines Montani	6728747f71	Merge pull request #5486 from explosion/fix/compat-py2	2020-05-22 15:47:21 +02:00
Ines Montani	6e6db6afb6	Better model compatibility and validation	2020-05-22 15:42:46 +02:00
Matthew Honnibal	f6078d866a	Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match Revert token_match priority changes from #4374 and extend token match options	2020-05-22 14:42:51 +02:00
Ines Montani	c685ee734a	Fix compat for v2.x branch	2020-05-22 14:22:36 +02:00
Adriane Boyd	e4a1b5dab1	Rename to url_match Rename to `url_match` and update docs.	2020-05-22 12:41:03 +02:00
Adriane Boyd	730fa493a4	Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match	2020-05-22 12:18:00 +02:00
Adriane Boyd	71fe61fdcd	Disallow merging 0-length spans	2020-05-22 10:14:34 +02:00
Matthew Honnibal	93c4d13588	Merge pull request #5264 from lfiedler/issue-5230 Fix ResourceWarnings during unittest	2020-05-22 00:31:07 +02:00
Matthew Honnibal	e1cb7e838b	Merge pull request #5481 from explosion/feature/blank-shortcut-v2 Add blank:{lang} shortcut support to util.load_model	2020-05-22 00:08:23 +02:00
Ines Montani	2250380816	Merge pull request #5482 from explosion/fix/backwards-compat-super	2020-05-21 21:51:46 +02:00
Ines Montani	891fa59009	Use backwards-compatible super()	2020-05-21 20:52:48 +02:00
Matthew Honnibal	5ce02c1b17	Merge pull request #5470 from svlandeg/bugfix/noun-chunks Bugfix in noun chunks	2020-05-21 20:51:31 +02:00
Matthw Honnibal	25b51f4fc8	Set version to v3.0.0.dev9	2020-05-21 20:47:52 +02:00
Matthw Honnibal	bc94fdabd0	Fix begin_training	2020-05-21 20:46:21 +02:00
Matthw Honnibal	d507ac28d8	Fix shape inference	2020-05-21 20:46:10 +02:00
Ines Montani	cb02bff0eb	Add blank:{lang} shortcut to util.load_model	2020-05-21 20:24:07 +02:00
Matthw Honnibal	df87c32a40	Pass smaller doc sample into model initialize	2020-05-21 20:17:24 +02:00
Ines Montani	581bda9f98	Update senter test and auto-format	2020-05-21 20:17:14 +02:00
Ines Montani	0f1beb5ff2	Tidy up and avoid absolute spacy imports in core	2020-05-21 20:05:03 +02:00
svlandeg	51715b9f72	span / noun chunk has +1 because end is exclusive	2020-05-21 19:56:56 +02:00
Adriane Boyd	132b2a6898	Merge remote-tracking branch 'upstream/master-tmp' into HEAD	2020-05-21 19:50:30 +02:00
Adriane Boyd	17ee9ab53a	Fix _SP/POS=SPACE in strings serialization tests	2020-05-21 19:49:08 +02:00
Ines Montani	245f91df78	Fix merge issues	2020-05-21 19:42:13 +02:00
Matthw Honnibal	3b5cfec1fc	Tweak memory management in train_from_config	2020-05-21 19:32:04 +02:00
Matthw Honnibal	f075655deb	Fix shape inference in begin_training	2020-05-21 19:26:29 +02:00
svlandeg	84d5b7ad0a	Merge remote-tracking branch 'upstream/master' into bugfix/noun-chunks # Conflicts: # spacy/lang/el/syntax_iterators.py # spacy/lang/en/syntax_iterators.py # spacy/lang/fa/syntax_iterators.py # spacy/lang/fr/syntax_iterators.py # spacy/lang/id/syntax_iterators.py # spacy/lang/nb/syntax_iterators.py # spacy/lang/sv/syntax_iterators.py	2020-05-21 19:19:50 +02:00
svlandeg	f7d10da555	avoid unnecessary loop to check overlapping noun chunks	2020-05-21 19:15:57 +02:00
Ines Montani	631e20d0c6	Fix test and schemas	2020-05-21 19:01:02 +02:00
Ines Montani	d34fc0915e	Remove serialization getter	2020-05-21 18:48:21 +02:00
Ines Montani	f44897e4c6	Update warning IDs	2020-05-21 18:39:11 +02:00
Ines Montani	24f72c669c	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
Ines Montani	c6ec19c844	Add missing declaration	2020-05-21 17:30:05 +02:00
Matthew Honnibal	884d9b060d	Merge pull request #5466 from adrianeboyd/feature/omit-extra-lexeme-info Add option to omit extra lexeme tables in CLI	2020-05-21 16:40:02 +02:00
Matthew Honnibal	e6c4c1a507	Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner Improve handling of NER in CoNLL-U MISC	2020-05-21 16:39:46 +02:00
Matthew Honnibal	26cd6a0229	Merge pull request #5462 from adrianeboyd/feature/lemmatizer-all-upos Extend lemmatizer rules for all UPOS tags	2020-05-21 16:05:31 +02:00
Matthew Honnibal	cad9b290a2	Merge branch 'master' into feature/omit-extra-lexeme-info	2020-05-21 16:04:24 +02:00
Matthew Honnibal	1f572ce89b	Merge pull request #5473 from explosion/fix/travis-tests Fix Python 2.7 compat	2020-05-21 15:56:16 +02:00
Ines Montani	a9cb2882cb	Rename argument: doc_or_span/obj -> doclike (#5463) * doc_or_span -> obj * Revert "doc_or_span -> obj" This reverts commit `78bb9ff5e0`. * obj -> doclike * Refer to correct object	2020-05-21 15:17:39 +02:00
Ines Montani	bea863acd2	Fix naming conflict and formatting	2020-05-21 14:24:38 +02:00
Ines Montani	bd6353715a	Merge branch 'master' into fix/travis-tests	2020-05-21 14:23:04 +02:00
Ines Montani	d8f3190c0a	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
Ines Montani	56de520afd	Try to fix tests on Travis (2.7)	2020-05-21 14:04:57 +02:00
adrianeboyd	d45602bc11	Merge branch 'master' into feature/omit-extra-lexeme-info	2020-05-21 10:26:01 +02:00
svlandeg	b221bcf1ba	fixing all languages	2020-05-21 00:17:28 +02:00
svlandeg	b509a3e7fc	fix: use actual range in 'seen' instead of subtree	2020-05-20 23:06:39 +02:00
svlandeg	36a94c409a	failing test to reproduce overlapping spans problem	2020-05-20 23:06:03 +02:00
adrianeboyd	49ef06d793	Add option for base model in init-model CLI (#5467) Intended for languages like Chinese with a custom tokenizer.	2020-05-20 18:49:11 +02:00

7025 Commits