spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-29 07:26:44 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	7be8a0516a	Fix project pull	2020-09-03 18:54:03 +02:00
Ines Montani	b1eb98b15c	Remove todos [ci skip]	2020-09-03 17:43:58 +02:00
Ines Montani	23b7d9cfa3	Prefix span getters	2020-09-03 17:37:06 +02:00
Ines Montani	5afe6447cd	registry.assets -> registry.misc	2020-09-03 17:31:14 +02:00
Ines Montani	c063e55eb7	Add prefix to batchers	2020-09-03 17:30:41 +02:00
Ines Montani	804f120361	Don't use registered function version in title	2020-09-03 17:29:47 +02:00
Ines Montani	896caf45e3	Merge pull request #6023 from explosion/ux/model-terminology-consistency [ci skip]	2020-09-03 17:13:44 +02:00
Ines Montani	c53b1433b9	Adjust more arguments [ci skip]	2020-09-03 17:12:24 +02:00
Ines Montani	121809dd1e	Fix anchor [ci skip]	2020-09-03 16:49:56 +02:00
Ines Montani	25a595dc10	Fix typos and wording [ci skip]	2020-09-03 16:37:45 +02:00
Ines Montani	b5a0657fd6	"model" terminology consistency in docs	2020-09-03 13:13:03 +02:00
Matthew Honnibal	f038841798	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-03 12:52:39 +02:00
Matthew Honnibal	ef0d0630a4	Let Language.use_params work with falsey inputs The Language.use_params method was failing if you passed in None, which meant we had to use awkward conditionals for the parameter averaging. This solves the problem.	2020-09-03 12:51:04 +02:00
Ines Montani	b02ad8045b	Update docs [ci skip]	2020-09-03 10:10:13 +02:00
Yohei Tamura	5af432e0f2	fix for empty string (#5936)	2020-09-03 10:09:03 +02:00
Ines Montani	1815c613c9	Update docs [ci skip]	2020-09-03 10:07:45 +02:00
Ines Montani	6f46d4e4d2	Merge pull request #6017 from svlandeg/feature/docs-layers [ci skip]	2020-09-03 10:03:23 +02:00
Adriane Boyd	77ac4a38aa	Simplify specials and cache checks (#6012)	2020-09-03 09:42:49 +02:00
Adriane Boyd	8b5594df86	Remove near-duplicate test	2020-09-02 20:32:01 +02:00
Matthew Honnibal	122cb02001	Fix averages	2020-09-02 19:37:43 +02:00
Adriane Boyd	960d9cfadc	Officially support DependencyMatcher Add official support for the `DependencyMatcher`. Redesign the pattern specification. Fix and extend operator implementations. Update API docs and add usage docs. Patterns -------- Refactor pattern structure to: ``` { "LEFT_ID": str, "REL_OP": str, "RIGHT_ID": str, "RIGHT_ATTRS": dict, } ``` The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all subsequent nodes contain all four keys. New operators ------------- Because of the way patterns are constructed from left to right, it's helpful to have `follows` operators along with `precedes` operators. Add operators for simple precedes / follows alongside immediate precedes / follows. * `.*`: precedes * `;`: immediately follows * `;*`: follows Operator fixes -------------- * `<` and `<<` do not include the node itself * Fix reversed order for all operators involving linear precedence (`.`, all sibling operators) * Linear precedence operators do not match nodes outside the same parse Additional fixes ---------------- * Use v3 Matcher API * Support `get` and `remove` * Support pickling	2020-09-02 17:45:29 +02:00
svlandeg	ab909a3f68	Merge branch 'feature/docs-layers' of https://github.com/svlandeg/spaCy into feature/docs-layers	2020-09-02 17:44:00 +02:00
svlandeg	cda45dd1ab	Merge remote-tracking branch 'upstream/develop' into feature/docs-layers	2020-09-02 17:43:45 +02:00
svlandeg	19298de352	small fix	2020-09-02 17:43:11 +02:00
svlandeg	bbaea530f6	sublayers paragraph	2020-09-02 17:36:22 +02:00
svlandeg	1be7ff02a6	swapping section	2020-09-02 15:26:07 +02:00
Marek Grzenkowicz	92d7832a86	Fix off-by-one error for best iteration calculation (closes #6014) (#6016)	2020-09-02 15:15:45 +02:00
Matthew Honnibal	737a1408d9	Improve implementation of fix #6010 Follow-ups to the parser efficiency fix. * Avoid introducing new counter for number of pushes * Base cut on number of transitions, keeping it more even * Reintroduce the randomization we had in v2.	2020-09-02 14:42:32 +02:00
svlandeg	57e432ba2a	editor tip as Accordion instead of Infobox	2020-09-02 14:26:57 +02:00
svlandeg	d19ec6c67b	small rewrites in types paragraph	2020-09-02 14:25:18 +02:00
svlandeg	821b2d4e63	update examples	2020-09-02 14:15:50 +02:00
svlandeg	e29a33449d	rewrite intro, simple Model example	2020-09-02 13:41:18 +02:00
svlandeg	422df9c2e2	Merge remote-tracking branch 'upstream/develop' into feature/docs-layers # Conflicts: # website/docs/usage/layers-architectures.md	2020-09-02 13:17:11 +02:00
Sofie Van Landeghem	eb56377799	Fix overfitting test (#6011) * remove unused MORPH_RULES * fix textcat architecture in overfitting test	2020-09-02 13:07:41 +02:00
Adriane Boyd	b97d98783a	Fix Hungarian % tokenization (#6013)	2020-09-02 13:06:16 +02:00
Ines Montani	70238543c8	Update layers/arch docs structure [ci skip]	2020-09-02 13:04:35 +02:00
Matthew Honnibal	c1bf3a5602	Fix significant performance bug in parser training (#6010) The parser training makes use of a trick for long documents, where we use the oracle to cut up the document into sections, so that we can have batch items in the middle of a document. For instance, if we have one document of 600 words, we might make 6 states, starting at words 0, 100, 200, 300, 400 and 500. The problem is for v3, I screwed this up and didn't stop parsing! So instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch of [600, 500, 400, 300, 200, 100]. Oops. The implementation here could probably be improved, it's annoying to have this extra variable in the state. But this'll do. This makes the v3 parser training 5-10 times faster, depending on document lengths. This problem wasn't in v2.	2020-09-02 12:57:13 +02:00
svlandeg	474abb2e59	remove unused MORPH_RULES from test	2020-09-02 11:37:56 +02:00
svlandeg	6fd7f140ec	custom-architectures section	2020-09-02 11:14:06 +02:00
svlandeg	3d9ae9286f	small fixes	2020-09-02 10:46:38 +02:00
Sofie Van Landeghem	f7a25d69f7	Bugfix in merge_entities (#6005) * failing test * bugfix	2020-09-01 21:57:52 +02:00
Sofie Van Landeghem	6bfb1b3a29	Fix sparse checkout for 'spacy project' (#6008) * exit if cloning fails * UX * rewrite http link to git protocol, don't use stdin * fixes to sparse checkout * formatting	2020-09-01 19:49:01 +02:00
Matthew Honnibal	4cce32f090	Fix tagger initialization	2020-09-01 16:38:34 +02:00
Matthew Honnibal	046c38bd26	Remove 'cleanup' of strings (#6007) A long time ago we went to some trouble to try to clean up "unused" strings, to avoid the `StringStore` growing in long-running processes. This never really worked reliably, and I think it was a really wrong approach. It's much better to let the user reload the `nlp` object as necessary, now that the string encoding is stable (in v1, the string IDs were sequential integers, making reloading the NLP object really annoying.) The extra book-keeping does make some performance difference, and the feature is unused, so it's past time we killed it.	2020-09-01 16:12:15 +02:00
Ines Montani	690bd77669	Add todos [ci skip]	2020-09-01 14:04:36 +02:00
Ines Montani	70b226f69d	Support ignore marker in project document [ci skip]	2020-09-01 12:49:04 +02:00
Ines Montani	a4c51f0f18	Add v3 info to project docs [ci skip]	2020-09-01 12:36:21 +02:00
Ines Montani	ef9005273b	Update fill-config command and add silent mode [ci skip]	2020-09-01 12:07:04 +02:00
Matthew Honnibal	027c82c068	Update makefile	2020-09-01 01:22:54 +02:00
Matthew Honnibal	bff1640a75	Try to debug tmpdir problem	2020-09-01 01:13:09 +02:00


13010 Commits
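
The message for commit c1bf3a5602 in the list above describes cutting a long document into sections for parser training, and the bug where each state kept parsing to the end of the document. The arithmetic can be sketched in plain Python. This is an illustration only, not spaCy's implementation: the function names are hypothetical, and the cut is assumed to fall at a fixed word count, whereas the commit says the real cut is made with the oracle.

```python
def section_starts(doc_len, max_len):
    """Start offsets of the sections a long document is cut into.

    Assumption: sections are cut every `max_len` words.
    """
    return list(range(0, doc_len, max_len))


def words_parsed_buggy(doc_len, max_len):
    # Each state starts at its own offset but never stops,
    # so it parses everything up to the end of the document.
    return [doc_len - start for start in section_starts(doc_len, max_len)]


def words_parsed_fixed(doc_len, max_len):
    # Each state stops after at most `max_len` words.
    return [min(max_len, doc_len - start)
            for start in section_starts(doc_len, max_len)]


buggy = words_parsed_buggy(600, 100)
fixed = words_parsed_fixed(600, 100)
print(buggy)                    # [600, 500, 400, 300, 200, 100]
print(fixed)                    # [100, 100, 100, 100, 100, 100]
print(sum(buggy) / sum(fixed))  # 3.5
```

For the 600-word example from the commit message, the buggy batch covers 3.5 times as many words as the intended one. The ratio grows with the number of sections per document, consistent with the commit's note that the speedup depends on document lengths.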