spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 02:46:40 +03:00

Author	SHA1	Message	Date
svlandeg	1775f54a26	small little fixes	2020-06-03 22:17:02 +02:00
svlandeg	07886a3de3	rename init_tok2vec to resume	2020-06-03 22:00:25 +02:00
svlandeg	4ed6278663	small fixes to pretrain config, init_tok2vec TODO	2020-06-03 19:32:40 +02:00
svlandeg	ffe0451d09	pretrain from config	2020-06-03 14:45:00 +02:00
svlandeg	eac12cbb77	make dropout in embed layers configurable	2020-06-03 11:50:16 +02:00
svlandeg	e91485dfc4	add discard_oversize parameter, move optimizer to training subsection	2020-06-03 10:04:16 +02:00
svlandeg	03c58b488c	prevent infinite loop, custom warning	2020-06-03 10:00:21 +02:00
svlandeg	6504b7f161	Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config	2020-06-03 08:30:16 +02:00
svlandeg	c5ac382f0a	fix name clash	2020-06-02 22:24:57 +02:00
svlandeg	2bf5111ecf	additional test with discard_oversize=False	2020-06-02 22:09:37 +02:00
svlandeg	aa6271b16c	extending algorithm to deal better with edge cases	2020-06-02 22:05:08 +02:00
svlandeg	f2e162fc60	it's only oversized if the tolerance level is also exceeded	2020-06-02 19:59:04 +02:00
svlandeg	ef834b4cd7	fix comments	2020-06-02 19:50:44 +02:00
svlandeg	6208d322d3	slightly more challenging unit test	2020-06-02 19:47:30 +02:00
svlandeg	6651fafd5c	using overflow buffer for examples within the tolerance margin	2020-06-02 19:43:39 +02:00
svlandeg	85b0597ed5	add test for minibatch util	2020-06-02 18:26:21 +02:00
svlandeg	5b350a6c99	bugfix of the bugfix	2020-06-02 17:49:33 +02:00
svlandeg	fdfd822936	rewrite minibatch_by_words function	2020-06-02 15:22:54 +02:00
svlandeg	ec52e7f886	add oversize examples before StopIteration returns	2020-06-02 13:21:55 +02:00
svlandeg	e0f9f448f1	remove Tensorizer	2020-06-01 23:38:48 +02:00
Ines Montani	b5ae2edcba	Merge pull request #5516 from explosion/feature/improve-model-version-deps	2020-05-31 12:54:01 +02:00
Ines Montani	dc186afdc5	Add warning	2020-05-30 15:34:54 +02:00
Ines Montani	b7aff6020c	Make functions more general purpose and update docstrings and tests	2020-05-30 15:18:53 +02:00
Ines Montani	a7e370bcbf	Don't override spaCy version	2020-05-30 15:03:18 +02:00
Ines Montani	e47e5a4b10	Use more sophisticated version parsing logic	2020-05-30 15:01:58 +02:00
Ines Montani	4fd087572a	WIP: improve model version deps	2020-05-28 12:51:37 +02:00
Matthw Honnibal	58750b06f8	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-05-27 22:18:36 +02:00
Ines Montani	1a15896ba9	unicode -> str consistency [ci skip]	2020-05-24 18:51:10 +02:00
Ines Montani	5d3806e059	unicode -> str consistency	2020-05-24 17:20:58 +02:00
Ines Montani	387c7aba15	Update test	2020-05-24 14:55:16 +02:00
Ines Montani	f9786d765e	Simplify is_package check	2020-05-24 14:48:56 +02:00
Matthw Honnibal	2d9de8684d	Support use_pytorch_for_gpu_memory config	2020-05-22 23:10:40 +02:00
Ines Montani	4465cad6c5	Rename spacy.analysis to spacy.pipe_analysis	2020-05-22 17:42:06 +02:00
Ines Montani	25d6ed3fb8	Merge pull request #5489 from explosion/feature/connected-components	2020-05-22 17:40:11 +02:00
Ines Montani	841c05b47b	Merge pull request #5490 from explosion/fix/remove-jsonschema	2020-05-22 17:39:54 +02:00
Ines Montani	569a65b60e	Auto-format	2020-05-22 16:55:42 +02:00
Ines Montani	d844528c5f	Add test for is_compatible_model	2020-05-22 16:55:15 +02:00
Ines Montani	12b7be1d98	Remove jsonschema from dependencies	2020-05-22 16:49:26 +02:00
Matthew Honnibal	f7f6df7275	Move to spacy.analysis	2020-05-22 16:43:18 +02:00
Matthew Honnibal	78d79d94ce	Guess set_annotations=True in nlp.update During `nlp.update`, components can be passed a boolean set_annotations to indicate whether they should assign annotations to the `Doc`. This needs to be set if downstream components expect to use the annotations during training, e.g. if we wanted to use tagger features in the parser. Components can specify their assignments and requirements, so we can figure out which components have these inter-dependencies. After figuring this out, we can guess whether to pass set_annotations=True. We could also always pass set_annotations=True, or even just have this as the only behaviour. The downside of this is that it would require the `Doc` objects to be created afresh to avoid problematic modifications. One approach would be to make a fresh copy of the `Doc` objects within `nlp.update()`, so that we can write to the objects without any problems. If we do that, we can drop this logic and also drop the `set_annotations` mechanism. I would be fine with that approach, although it runs the risk of introducing some performance overhead, and we'll have to take care to copy all extension attributes etc.	2020-05-22 15:55:45 +02:00
Ines Montani	6e6db6afb6	Better model compatibility and validation	2020-05-22 15:42:46 +02:00
Matthw Honnibal	25b51f4fc8	Set version to v3.0.0.dev9	2020-05-21 20:47:52 +02:00
Matthw Honnibal	bc94fdabd0	Fix begin_training	2020-05-21 20:46:21 +02:00
Matthw Honnibal	d507ac28d8	Fix shape inference	2020-05-21 20:46:10 +02:00
Matthw Honnibal	df87c32a40	Pass smaller doc sample into model initialize	2020-05-21 20:17:24 +02:00
Matthw Honnibal	3b5cfec1fc	Tweak memory management in train_from_config	2020-05-21 19:32:04 +02:00
Matthw Honnibal	f075655deb	Fix shape inference in begin_training	2020-05-21 19:26:29 +02:00
Matthew Honnibal	e6c4c1a507	Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner Improve handling of NER in CoNLL-U MISC	2020-05-21 16:39:46 +02:00
Adriane Boyd	4b229bfc22	Improve handling of NER in CoNLL-U MISC	2020-05-20 18:48:51 +02:00
Matthew Honnibal	609c0ba557	Fix accidentally quadratic runtime in Example.split_sents (#5464) * Tidy up train-from-config a bit * Fix accidentally quadratic perf in TokenAnnotation.brackets When we're reading in the gold data, we had a nested loop where we looped over the brackets for each token, looking for brackets that start on that word. This is accidentally quadratic, because we have one bracket per word (for the POS tags). So we had an O(N^2) behaviour here that ended up being pretty slow. To solve this I'm indexing the brackets by their starting word on the TokenAnnotations object, and having a property to provide the previous view. Fixes	2020-05-20 18:48:18 +02:00


6838 Commits