spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-03 19:04:58 +03:00

Author	SHA1	Message	Date
svlandeg	422df9c2e2	Merge remote-tracking branch 'upstream/develop' into feature/docs-layers # Conflicts: # website/docs/usage/layers-architectures.md	2020-09-02 13:17:11 +02:00
Sofie Van Landeghem	eb56377799	Fix overfitting test (#6011) * remove unused MORPH_RULES * fix textcat architecture in overfitting test	2020-09-02 13:07:41 +02:00
Adriane Boyd	b97d98783a	Fix Hungarian % tokenization (#6013)	2020-09-02 13:06:16 +02:00
Ines Montani	70238543c8	Update layers/arch docs structure [ci skip]	2020-09-02 13:04:35 +02:00
Matthew Honnibal	c1bf3a5602	Fix significant performance bug in parser training (#6010) The parser training makes use of a trick for long documents, where we use the oracle to cut up the document into sections, so that we can have batch items in the middle of a document. For instance, if we have one document of 600 words, we might make 6 states, starting at words 0, 100, 200, 300, 400 and 500. The problem is for v3, I screwed this up and didn't stop parsing! So instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch of [600, 500, 400, 300, 200, 100]. Oops. The implementation here could probably be improved, it's annoying to have this extra variable in the state. But this'll do. This makes the v3 parser training 5-10 times faster, depending on document lengths. This problem wasn't in v2.	2020-09-02 12:57:13 +02:00
svlandeg	474abb2e59	remove unused MORPH_RULES from test	2020-09-02 11:37:56 +02:00
svlandeg	6fd7f140ec	custom-architectures section	2020-09-02 11:14:06 +02:00
svlandeg	3d9ae9286f	small fixes	2020-09-02 10:46:38 +02:00
Sofie Van Landeghem	f7a25d69f7	Bugfix in merge_entities (#6005) * failing test * bugfix	2020-09-01 21:57:52 +02:00
Sofie Van Landeghem	6bfb1b3a29	Fix sparse checkout for 'spacy project' (#6008) * exit if cloning fails * UX * rewrite http link to git protocol, don't use stdin * fixes to sparse checkout * formatting	2020-09-01 19:49:01 +02:00
Matthew Honnibal	4cce32f090	Fix tagger initialization	2020-09-01 16:38:34 +02:00
Matthew Honnibal	046c38bd26	Remove 'cleanup' of strings (#6007) A long time ago we went to some trouble to try to clean up "unused" strings, to avoid the `StringStore` growing in long-running processes. This never really worked reliably, and I think it was a really wrong approach. It's much better to let the user reload the `nlp` object as necessary, now that the string encoding is stable (in v1, the string IDs were sequential integers, making reloading the NLP object really annoying.) The extra book-keeping does make some performance difference, and the feature is unused, so it's past time we killed it.	2020-09-01 16:12:15 +02:00
Ines Montani	690bd77669	Add todos [ci skip]	2020-09-01 14:04:36 +02:00
Ines Montani	70b226f69d	Support ignore marker in project document [ci skip]	2020-09-01 12:49:04 +02:00
Ines Montani	a4c51f0f18	Add v3 info to project docs [ci skip]	2020-09-01 12:36:21 +02:00
Ines Montani	ef9005273b	Update fill-config command and add silent mode [ci skip]	2020-09-01 12:07:04 +02:00
Matthew Honnibal	027c82c068	Update makefile	2020-09-01 01:22:54 +02:00
Matthew Honnibal	bff1640a75	Try to debug tmpdir problem	2020-09-01 01:13:09 +02:00
Matthew Honnibal	61a71d8bcc	Try to debug tmpdir problem	2020-09-01 01:10:53 +02:00
Matthew Honnibal	ec660e3131	Fix use_pytorch_for_gpu_memory	2020-09-01 00:41:38 +02:00
Adriane Boyd	9130094199	Prevent Tagger model init with 0 labels (#5984) * Prevent Tagger model init with 0 labels Raise an error before trying to initialize a tagger model with 0 labels. * Add dummy tagger label for test * Remove tagless tagger model initialization * Fix error number after merge * Add dummy tagger label to test * Fix formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-31 21:24:33 +02:00
Matthw Honnibal	c38298b8fa	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-08-31 19:55:55 +02:00
Matthw Honnibal	fe298fa50a	Shuffle on first epoch of train	2020-08-31 19:55:22 +02:00
Ines Montani	9af82f3f11	Merge pull request #6003 from explosion/feature/matcher-as-spans	2020-08-31 17:50:56 +02:00
Sofie Van Landeghem	3ac620f09d	fix config example [ci skip]	2020-08-31 17:40:04 +02:00
Ines Montani	3929431af1	Update docs [ci skip]	2020-08-31 17:06:33 +02:00
Ines Montani	c3b6cbd740	Merge pull request #6004 from svlandeg/feature/console-ex console logging example	2020-08-31 17:03:52 +02:00
Ines Montani	add9de5487	Deprecate (Phrase)Matcher.pipe	2020-08-31 17:01:24 +02:00
svlandeg	2c3b64a567	console logging example	2020-08-31 16:56:13 +02:00
Ines Montani	bca6bf8dda	Update docs [ci skip]	2020-08-31 16:39:53 +02:00
Ines Montani	97ffb4ed05	Merge pull request #6002 from svlandeg/feature/vectors-docs	2020-08-31 16:25:18 +02:00
Ines Montani	db9f8896f5	Add docs [ci skip]	2020-08-31 16:10:41 +02:00
Ines Montani	83aff38c59	Make argument keyword-only Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-31 15:39:03 +02:00
Ines Montani	6340d1c63d	Add as_spans to Matcher/PhraseMatcher	2020-08-31 14:53:22 +02:00
svlandeg	fe6c08218e	fixes	2020-08-31 14:51:49 +02:00
svlandeg	0e0abb0378	fix	2020-08-31 14:50:29 +02:00
svlandeg	56ba691ecd	small fixes	2020-08-31 14:46:00 +02:00
svlandeg	e47ea88aeb	revert annotations refactor	2020-08-31 14:40:55 +02:00
svlandeg	13ee742fb4	example of custom logger	2020-08-31 14:24:41 +02:00
svlandeg	2c90a06fee	some more information about the loggers	2020-08-31 13:43:17 +02:00
svlandeg	c18eb63483	Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs # Conflicts: # website/docs/usage/embeddings-transformers.md	2020-08-31 13:21:36 +02:00
Juan Gutiérrez	9002bea29f	Update suffixes example (#5989) * Update suffixes example The current example will throw `TypeError: can only concatenate list (not "tuple") to list` * Signing Contributor Agreement	2020-08-31 12:44:56 +02:00
Sofie Van Landeghem	ec14744ee4	Rename Transformer listener (#6001) * rename to spacy-transformers.TransformerListener * add some more tok2vec tests * use select_pipes * fix docs - annotation setter was not changed in the end	2020-08-31 12:41:39 +02:00
Ines Montani	6ac3299e2e	Merge pull request #6000 from adrianeboyd/feature/tokenizer-special-case-filter Restrict tokenizer exceptions to ORTH and NORM	2020-08-31 12:38:38 +02:00
Adriane Boyd	216efaf5f5	Restrict tokenizer exceptions to ORTH and NORM	2020-08-31 09:55:01 +02:00
Matthew Honnibal	9341cbc013	Set version to v3.0.0a13	2020-08-30 23:10:43 +02:00
Matthew Honnibal	b69a0e332d	Fix makefile	2020-08-30 20:14:52 +02:00
Matthew Honnibal	acdd7b9478	Allow wheelhouse to be set in makefile	2020-08-30 20:00:49 +02:00
Matthew Honnibal	2ee0154bd0	Fix makefile	2020-08-30 17:11:24 +02:00
Matthew Honnibal	b2463e4d04	Fix makefile	2020-08-30 16:37:04 +02:00
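
The message on commit c1bf3a5602 describes cutting a long document into sections and then failing to stop parsing at the end of each section. The toy sketch below only reproduces the arithmetic from that message. It is not spaCy's parser code, and the function name `words_parsed_per_state` and its parameters are hypothetical, introduced purely for illustration.

```python
def words_parsed_per_state(n_words, section_size, stop_at_section_end):
    """Return how many words each parser state would process.

    Hypothetical helper: one state is started at every multiple of
    `section_size`. If `stop_at_section_end` is False, each state keeps
    going until the end of the document (the behaviour the commit
    message describes as the bug).
    """
    lengths = []
    for start in range(0, n_words, section_size):
        if stop_at_section_end:
            # Stop at the section boundary (or the document end, if sooner).
            end = min(start + section_size, n_words)
        else:
            # Keep parsing until the end of the document.
            end = n_words
        lengths.append(end - start)
    return lengths


# The 600-word document from the commit message, cut every 100 words.
buggy_batch = words_parsed_per_state(600, 100, stop_at_section_end=False)
fixed_batch = words_parsed_per_state(600, 100, stop_at_section_end=True)

print(buggy_batch)  # [600, 500, 400, 300, 200, 100]
print(fixed_batch)  # [100, 100, 100, 100, 100, 100]
```

In this toy model the redundant work grows with document length, which is consistent with the message's remark that the speedup depends on document lengths.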

13028 Commits
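
The error quoted in commit 9002bea29f, `TypeError: can only concatenate list (not "tuple") to list`, is what Python raises when a tuple is added to a list with `+`. The snippet below is a minimal reproduction in plain Python. The variable names and suffix patterns are made-up placeholders, not the contents of the spaCy docs example that the commit changed.

```python
# Placeholder data: a list of suffix patterns and a tuple of extra ones.
existing_suffixes = [r"\.", r","]
extra_suffixes = (r"-+$",)

# list + tuple is not allowed and raises a TypeError.
try:
    existing_suffixes + extra_suffixes
except TypeError as err:
    error_message = str(err)

# Converting one side so that both operands have the same type works.
combined = existing_suffixes + list(extra_suffixes)

print(error_message)
print(combined)
```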