spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-07 12:54:56 +03:00

Author	SHA1	Message	Date
Ines Montani	56a9d1b78c	Merge pull request #5479 from explosion/master-tmp	2020-06-03 15:31:27 +02:00
svlandeg	ddf8244df9	add normalize option to distance metric	2020-06-03 14:52:54 +02:00
svlandeg	ffe0451d09	pretrain from config	2020-06-03 14:45:00 +02:00
Ines Montani	a8875d4a4b	Fix typo	2020-06-03 14:42:39 +02:00
Ines Montani	4e0610d0d4	Update warning codes	2020-06-03 14:37:09 +02:00
Ines Montani	810fce3bb1	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
Adriane Boyd	b0ee76264b	Remove debugging	2020-06-03 14:20:42 +02:00
Adriane Boyd	1d8168d1fd	Fix problems with lower and whitespace in variants. Port relevant changes from #5361: initialize lower flag explicitly; handle whitespace words from GoldParse correctly when creating raw text with orth variants.	2020-06-03 14:15:58 +02:00
Adriane Boyd	10d938f221	Update default cfg dir in train CLI	2020-06-03 14:15:50 +02:00
Adriane Boyd	f1f9c8b417	Port train CLI updates. Updates from #5362 and fix from #5387 for `train`: if training on GPU, only run evaluation/timing on CPU in the first iteration; if training is aborted, exit with a non-0 exit status.	2020-06-03 14:03:43 +02:00
Adriane Boyd	8c758ed1eb	Fix meta path	2020-06-03 12:11:57 +02:00
Adriane Boyd	a57bdeecac	Test util.get_model_meta instead of util.load_model	2020-06-03 12:10:12 +02:00
svlandeg	109bbdab98	update config files with separate dropout for Tok2Vec layer	2020-06-03 11:53:59 +02:00
svlandeg	eac12cbb77	make dropout in embed layers configurable	2020-06-03 11:50:16 +02:00
svlandeg	e91485dfc4	add discard_oversize parameter, move optimizer to training subsection	2020-06-03 10:04:16 +02:00
svlandeg	03c58b488c	prevent infinite loop, custom warning	2020-06-03 10:00:21 +02:00
svlandeg	6504b7f161	Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config	2020-06-03 08:30:16 +02:00
Matthew Honnibal	f74784575c	Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize: add oversize examples before StopIteration returns	2020-06-02 22:54:38 +02:00
svlandeg	c5ac382f0a	fix name clash	2020-06-02 22:24:57 +02:00
svlandeg	2bf5111ecf	additional test with discard_oversize=False	2020-06-02 22:09:37 +02:00
svlandeg	aa6271b16c	extending algorithm to deal better with edge cases	2020-06-02 22:05:08 +02:00
svlandeg	f2e162fc60	it's only oversized if the tolerance level is also exceeded	2020-06-02 19:59:04 +02:00
svlandeg	ef834b4cd7	fix comments	2020-06-02 19:50:44 +02:00
svlandeg	6208d322d3	slightly more challenging unit test	2020-06-02 19:47:30 +02:00
svlandeg	6651fafd5c	using overflow buffer for examples within the tolerance margin	2020-06-02 19:43:39 +02:00
svlandeg	85b0597ed5	add test for minibatch util	2020-06-02 18:26:21 +02:00
svlandeg	5b350a6c99	bugfix of the bugfix	2020-06-02 17:49:33 +02:00
Adriane Boyd	75f08ad62d	Remove unnecessary check	2020-06-02 17:41:25 +02:00
Adriane Boyd	bbc1836581	Add rudimentary version checks on model load	2020-06-02 17:33:48 +02:00
svlandeg	fdfd822936	rewrite minibatch_by_words function	2020-06-02 15:22:54 +02:00
svlandeg	ec52e7f886	add oversize examples before StopIteration returns	2020-06-02 13:21:55 +02:00
svlandeg	e0f9f448f1	remove Tensorizer	2020-06-01 23:38:48 +02:00
Leo	925e938570	Spanish tokenizer exception and examples improvement (#5531). Spanish tokenizer exception additions; added Spanish question examples; erased slang tokenization examples.	2020-06-01 18:18:34 +02:00
Matthew Honnibal	67af3a32b0	Merge pull request #5527 from adrianeboyd/bugfix/tagger-sp-tag-map: Preserve _SP when filtering tag map in Tagger	2020-06-01 12:00:21 +02:00
Leo	c21c308ecb	Corrected issue #5524: changed <U+009C> 'STRING TERMINATOR' for <U+0153> 'LATIN SMALL LIGATURE OE' (#5526)	2020-05-31 22:08:12 +02:00
Leo	7d5a89661e	contributor agreement signed (#5525)	2020-05-31 20:13:39 +02:00
Adriane Boyd	a005ccd6d7	Preserve _SP when filtering tag map in Tagger. To allow "SP" as a tag (for Chinese OntoNotes), preserve "_SP" if present as the reference `SPACE` POS in the tag map in `Tagger.begin_training()`.	2020-05-31 19:57:54 +02:00
Ines Montani	b5ae2edcba	Merge pull request #5516 from explosion/feature/improve-model-version-deps	2020-05-31 12:54:01 +02:00
Matthw Honnibal	cd5f748e09	Add onto-joint experiment file	2020-05-30 20:27:47 +02:00
Matthw Honnibal	d1c2e88d0f	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-05-30 19:23:12 +02:00
Matthew Honnibal	758a4b154d	Merge pull request #5521 from svlandeg/bugfix/vectors-from-disk: fix deserialization order	2020-05-30 18:38:23 +02:00
Ines Montani	dc186afdc5	Add warning	2020-05-30 15:34:54 +02:00
Ines Montani	2bdf787417	Merge branch 'develop' into feature/improve-model-version-deps	2020-05-30 15:20:20 +02:00
Ines Montani	368182776e	Tidy up dependencies	2020-05-30 15:19:53 +02:00
Ines Montani	b7aff6020c	Make functions more general purpose and update docstrings and tests	2020-05-30 15:18:53 +02:00
Ines Montani	a7e370bcbf	Don't override spaCy version	2020-05-30 15:03:18 +02:00
Ines Montani	e47e5a4b10	Use more sophisticated version parsing logic	2020-05-30 15:01:58 +02:00
Ines Montani	bed62991ad	Tidy up requirements	2020-05-30 14:59:55 +02:00
svlandeg	15134ef611	fix deserialization order	2020-05-30 12:53:32 +02:00
Matthew Honnibal	64adda3202	Revert "Remove peeking from Parser.begin_training (#5456)". This reverts commit `9393253b66`. The model shouldn't need to see all examples, and actually in v3 there's no equivalent step. All examples are provided to the component, for the component to do stuff like figuring out the labels. The model just needs to do stuff like shape inference.	2020-05-29 23:21:55 +02:00
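
Several of the commits above (fdfd822936, f2e162fc60, e91485dfc4) concern batching examples by word count, with a tolerance margin and a `discard_oversize` option. The following is a minimal sketch of that idea only, not spaCy's implementation: the function name `minibatch_by_words_sketch`, its signature, and the simplified handling of oversized examples are assumptions made for illustration, and the overflow buffer mentioned in 6651fafd5c is not modelled.

```python
from typing import Iterable, Iterator, List, Sequence


def minibatch_by_words_sketch(
    docs: Iterable[Sequence[str]],
    size: int,
    tolerance: float = 0.2,
    discard_oversize: bool = False,
) -> Iterator[List[Sequence[str]]]:
    """Group tokenized docs into batches of roughly `size` words.

    Hypothetical helper, not part of spaCy. A doc counts as oversized
    only if it exceeds the target size plus the tolerance margin.
    """
    max_size = size + int(size * tolerance)
    batch: List[Sequence[str]] = []
    batch_words = 0
    for doc in docs:
        n_words = len(doc)
        if n_words > max_size:
            # Oversized even with tolerance: drop it, or emit it alone.
            if not discard_oversize:
                yield [doc]
            continue
        if batch and batch_words + n_words > max_size:
            # Adding this doc would exceed the margin: close the batch.
            yield batch
            batch, batch_words = [], 0
        batch.append(doc)
        batch_words += n_words
        if batch_words >= size:
            # Target reached (possibly within the tolerance margin).
            yield batch
            batch, batch_words = [], 0
    if batch:
        # Emit whatever is left once the input is exhausted.
        yield batch


# Docs are represented as lists of tokens; only their lengths matter here.
docs = [["w"] * n for n in (6, 6, 4, 20, 5)]
kept = [[len(d) for d in b] for b in minibatch_by_words_sketch(docs, size=10)]
dropped = [
    [len(d) for d in b]
    for b in minibatch_by_words_sketch(docs, size=10, discard_oversize=True)
]
```

With `size=10` and the default tolerance of 0.2, the margin allows up to 12 words per batch, so the 20-word doc is either yielded as its own batch or dropped, depending on `discard_oversize`.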


15687 Commits