spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-08 00:09:45 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	c1bf3a5602	Fix significant performance bug in parser training (#6010) The parser training makes use of a trick for long documents, where we use the oracle to cut up the document into sections, so that we can have batch items in the middle of a document. For instance, if we have one document of 600 words, we might make 6 states, starting at words 0, 100, 200, 300, 400 and 500. The problem is that for v3, I screwed this up and didn't stop parsing! So instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch of [600, 500, 400, 300, 200, 100]. Oops. The implementation here could probably be improved; it's annoying to have this extra variable in the state. But this'll do. This makes the v3 parser training 5-10 times faster, depending on document lengths. This problem wasn't in v2.	2020-09-02 12:57:13 +02:00
Sofie Van Landeghem	6bfb1b3a29	Fix sparse checkout for 'spacy project' (#6008) * exit if cloning fails * UX * rewrite http link to git protocol, don't use stdin * fixes to sparse checkout * formatting	2020-09-01 19:49:01 +02:00
Matthew Honnibal	4cce32f090	Fix tagger initialization	2020-09-01 16:38:34 +02:00
Matthew Honnibal	046c38bd26	Remove 'cleanup' of strings (#6007) A long time ago we went to some trouble to try to clean up "unused" strings, to avoid the `StringStore` growing in long-running processes. This never really worked reliably, and I think it was a really wrong approach. It's much better to let the user reload the `nlp` object as necessary, now that the string encoding is stable (in v1, the string IDs were sequential integers, making reloading the NLP object really annoying). The extra book-keeping does make some performance difference, and the feature is unused, so it's past time we killed it.	2020-09-01 16:12:15 +02:00
Ines Montani	70b226f69d	Support ignore marker in project document [ci skip]	2020-09-01 12:49:04 +02:00
Ines Montani	a4c51f0f18	Add v3 info to project docs [ci skip]	2020-09-01 12:36:21 +02:00
Ines Montani	ef9005273b	Update fill-config command and add silent mode [ci skip]	2020-09-01 12:07:04 +02:00
Matthew Honnibal	ec660e3131	Fix use_pytorch_for_gpu_memory	2020-09-01 00:41:38 +02:00
Adriane Boyd	9130094199	Prevent Tagger model init with 0 labels (#5984) * Prevent Tagger model init with 0 labels Raise an error before trying to initialize a tagger model with 0 labels. * Add dummy tagger label for test * Remove tagless tagger model initialization * Fix error number after merge * Add dummy tagger label to test * Fix formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-31 21:24:33 +02:00
Matthew Honnibal	c38298b8fa	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-08-31 19:55:55 +02:00
Matthew Honnibal	fe298fa50a	Shuffle on first epoch of train	2020-08-31 19:55:22 +02:00
Ines Montani	9af82f3f11	Merge pull request #6003 from explosion/feature/matcher-as-spans	2020-08-31 17:50:56 +02:00
Ines Montani	add9de5487	Deprecate (Phrase)Matcher.pipe	2020-08-31 17:01:24 +02:00
Ines Montani	83aff38c59	Make argument keyword-only Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-31 15:39:03 +02:00
Ines Montani	6340d1c63d	Add as_spans to Matcher/PhraseMatcher	2020-08-31 14:53:22 +02:00
svlandeg	13ee742fb4	example of custom logger	2020-08-31 14:24:41 +02:00
svlandeg	c18eb63483	Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs # Conflicts: # website/docs/usage/embeddings-transformers.md	2020-08-31 13:21:36 +02:00
Sofie Van Landeghem	ec14744ee4	Rename Transformer listener (#6001) * rename to spacy-transformers.TransformerListener * add some more tok2vec tests * use select_pipes * fix docs - annotation setter was not changed in the end	2020-08-31 12:41:39 +02:00
Adriane Boyd	216efaf5f5	Restrict tokenizer exceptions to ORTH and NORM	2020-08-31 09:55:01 +02:00
Matthew Honnibal	9341cbc013	Set version to v3.0.0a13	2020-08-30 23:10:43 +02:00
Ines Montani	45f46a5c85	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00
Ines Montani	34146750d4	Use frozen list with custom errors We don't want to break backwards compatibility too much but we also want to provide the best possible UX	2020-08-29 15:20:11 +02:00
Ines Montani	744f432420	Merge pull request #5994 from explosion/feature/idempotent-component-decorator	2020-08-29 13:17:13 +02:00
Ines Montani	5de3f8604d	Update spacy/util.py Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-29 13:17:06 +02:00
Ines Montani	091a9b522a	Remove unused variable [ci skip]	2020-08-29 13:11:26 +02:00
Ines Montani	2bc31e15c9	Tidy up and auto-format [ci skip]	2020-08-29 13:01:10 +02:00
Ines Montani	6520d1a1df	Work around set order in Language.disabled	2020-08-29 12:58:22 +02:00
Ines Montani	f45095a666	Merge pull request #5995 from adrianeboyd/bugfix/attribute-ruler-bugfixes	2020-08-29 12:38:30 +02:00
Ines Montani	e0b4984aa4	Make deprecated disable_pipes call into select_pipes	2020-08-29 12:08:46 +02:00
Ines Montani	15d73f4dc3	Make user-facing Language.disabled return list More consistent with all the other properties	2020-08-29 12:08:33 +02:00
Matthew Honnibal	58f19421b1	Return empty batch from tok2vec listener if no doc.tensor	2020-08-29 03:46:50 +02:00
svlandeg	5230529de2	add loggers registry & logger docs sections	2020-08-28 21:44:04 +02:00
Ines Montani	0687d7148e	Rename user-facing API	2020-08-28 21:04:02 +02:00
Adriane Boyd	0104bd1600	Sort the AttributeRuler matches by rule order Sort the returned matches by rule order (the `match_id`) so that the rules are applied in the order they were added. This is necessary, for instance, if the `AttributeRuler` is used for the tag map and later rules require POS tags.	2020-08-28 21:01:06 +02:00
Ines Montani	6a999c9303	Remove outdated component attr check	2020-08-28 20:59:19 +02:00
Adriane Boyd	8674b17651	Serialize AttributeRuler.patterns Serialize `AttributeRuler.patterns` instead of the individual lists to simplify the serialization and so that patterns are reloaded exactly as they were originally provided (preserving `_attrs_unnormed`).	2020-08-28 20:44:45 +02:00
Ines Montani	10da74382f	Raise if disabled components are removed before DisabledPipes.restore	2020-08-28 20:35:26 +02:00
Ines Montani	1e0363290e	Remove todos and update docstrings	2020-08-28 20:34:46 +02:00
Ines Montani	cad988da7f	Allow component decorators to re-run with same function	2020-08-28 16:27:22 +02:00
Ines Montani	3ce5be4b76	Allow loaded but disabled components	2020-08-28 15:20:14 +02:00
Ines Montani	89f692bc8a	Merge pull request #5992 from svlandeg/feature/wandb-restrict-config	2020-08-28 15:05:29 +02:00
Ines Montani	9c4049b57f	Merge pull request #5986 from explosion/fix/language-config-interpolate-disk-bytes	2020-08-28 15:03:52 +02:00
Ines Montani	adc050cdc5	Fix code style in test [ci skip]	2020-08-28 15:03:21 +02:00
svlandeg	05a1bafa15	fix type	2020-08-28 14:08:33 +02:00
svlandeg	33883aa764	rename field	2020-08-28 14:06:23 +02:00
svlandeg	1d8c4070aa	add disable_fields to wandb_logger	2020-08-28 13:55:32 +02:00
Ines Montani	a51b4f3a19	Merge branch 'develop' into fix/language-config-interpolate-disk-bytes	2020-08-28 13:21:17 +02:00
Ines Montani	03dde511b4	Merge pull request #5987 from explosion/feature/debug-config [ci skip]	2020-08-28 11:30:18 +02:00
Ines Montani	62e9967228	Merge branch 'develop' into fix/language-config-interpolate-disk-bytes	2020-08-28 11:19:36 +02:00
Ines Montani	4ca2698f85	Merge branch 'develop' into feature/debug-config	2020-08-28 11:19:17 +02:00
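
The arithmetic in the message for c1bf3a5602 can be reproduced with a small sketch. This is not spaCy's parser code: the names `segment_starts` and `words_parsed` and the `stop_at_segment_end` flag are hypothetical, chosen only to illustrate the example from the commit message (one 600-word document cut into sections of 100 words).

```python
from typing import List


def segment_starts(doc_length: int, max_length: int) -> List[int]:
    """Start offsets of the states carved out of one long document."""
    return list(range(0, doc_length, max_length))


def words_parsed(doc_length: int, max_length: int, stop_at_segment_end: bool) -> List[int]:
    """Number of words each state processes.

    Hypothetical illustration: with stop_at_segment_end=False every state
    keeps going to the end of the document, which is the behaviour the
    commit message describes as the bug.
    """
    lengths = []
    for start in segment_starts(doc_length, max_length):
        if stop_at_segment_end:
            end = min(start + max_length, doc_length)
        else:
            end = doc_length
        lengths.append(end - start)
    return lengths


fixed = words_parsed(600, 100, stop_at_segment_end=True)
buggy = words_parsed(600, 100, stop_at_segment_end=False)
print(fixed)  # [100, 100, 100, 100, 100, 100]
print(buggy)  # [600, 500, 400, 300, 200, 100]
```

In this toy case the states that fail to stop cover 2100 words in total instead of 600, so the wasted work grows with document length, in line with the message's remark that the speedup depends on document lengths.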

7607 Commits