spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-16 09:02:35 +03:00

Author	SHA1	Message	Date
svlandeg	c32fcdf4c9	fix typo	2020-09-04 09:10:21 +02:00
Ines Montani	595f9dc2e4	Make displacy color registry consistent with others This was the only registry that expected the registered objects to be dictionaries instead of functions that return something. We can still support plain dicts but we should also support functions for consistency	2020-09-03 23:05:41 +02:00
Ines Montani	4daf138136	Fix alphabetic ordering [ci skip]	2020-09-03 23:01:50 +02:00
Matthew Honnibal	1c07820681	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-03 18:54:21 +02:00
Matthew Honnibal	7be8a0516a	Fix project pull	2020-09-03 18:54:03 +02:00
Ines Montani	b1eb98b15c	Remove todos [ci skip]	2020-09-03 17:43:58 +02:00
Ines Montani	23b7d9cfa3	Prefix span getters	2020-09-03 17:37:06 +02:00
Ines Montani	5afe6447cd	registry.assets -> registry.misc	2020-09-03 17:31:14 +02:00
Ines Montani	c063e55eb7	Add prefix to batchers	2020-09-03 17:30:41 +02:00
Ines Montani	804f120361	Don't use registered function version in title	2020-09-03 17:29:47 +02:00
Ines Montani	896caf45e3	Merge pull request #6023 from explosion/ux/model-terminology-consistency [ci skip]	2020-09-03 17:13:44 +02:00
Ines Montani	c53b1433b9	Adjust more arguments [ci skip]	2020-09-03 17:12:24 +02:00
Ines Montani	121809dd1e	Fix anchor [ci skip]	2020-09-03 16:49:56 +02:00
Ines Montani	25a595dc10	Fix typos and wording [ci skip]	2020-09-03 16:37:45 +02:00
Ines Montani	b5a0657fd6	"model" terminology consistency in docs	2020-09-03 13:13:03 +02:00
Matthew Honnibal	f038841798	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-03 12:52:39 +02:00
Matthew Honnibal	ef0d0630a4	Let Language.use_params work with falsey inputs The Language.use_params method was failing if you passed in None, which meant we had to use awkward conditionals for the parameter averaging. This solves the problem.	2020-09-03 12:51:04 +02:00
Ines Montani	b02ad8045b	Update docs [ci skip]	2020-09-03 10:10:13 +02:00
Yohei Tamura	5af432e0f2	fix for empty string (#5936)	2020-09-03 10:09:03 +02:00
Ines Montani	1815c613c9	Update docs [ci skip]	2020-09-03 10:07:45 +02:00
Ines Montani	6f46d4e4d2	Merge pull request #6017 from svlandeg/feature/docs-layers [ci skip]	2020-09-03 10:03:23 +02:00
Adriane Boyd	77ac4a38aa	Simplify specials and cache checks (#6012)	2020-09-03 09:42:49 +02:00
Adriane Boyd	8b5594df86	Remove near-duplicate test	2020-09-02 20:32:01 +02:00
Matthew Honnibal	122cb02001	Fix averages	2020-09-02 19:37:43 +02:00
Adriane Boyd	960d9cfadc	Officially support DependencyMatcher Add official support for the `DependencyMatcher`. Redesign the pattern specification. Fix and extend operator implementations. Update API docs and add usage docs. Patterns -------- Refactor pattern structure to: ``` { "LEFT_ID": str, "REL_OP": str, "RIGHT_ID": str, "RIGHT_ATTRS": dict, } ``` The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all subsequent nodes contain all four keys. New operators ------------- Because of the way patterns are constructed from left to right, it's helpful to have `follows` operators along with `precedes` operators. Add operators for simple precedes / follows alongside immediate precedes / follows. * `.*`: precedes * `;`: immediately follows * `;*`: follows Operator fixes -------------- * `<` and `<<` do not include the node itself * Fix reversed order for all operators involving linear precedence (`.`, all sibling operators) * Linear precedence operators do not match nodes outside the same parse Additional fixes ---------------- * Use v3 Matcher API * Support `get` and `remove` * Support pickling	2020-09-02 17:45:29 +02:00
svlandeg	ab909a3f68	Merge branch 'feature/docs-layers' of https://github.com/svlandeg/spaCy into feature/docs-layers	2020-09-02 17:44:00 +02:00
svlandeg	cda45dd1ab	Merge remote-tracking branch 'upstream/develop' into feature/docs-layers	2020-09-02 17:43:45 +02:00
svlandeg	19298de352	small fix	2020-09-02 17:43:11 +02:00
svlandeg	bbaea530f6	sublayers paragraph	2020-09-02 17:36:22 +02:00
svlandeg	1be7ff02a6	swapping section	2020-09-02 15:26:07 +02:00
Marek Grzenkowicz	92d7832a86	Fix off-by-one error for best iteration calculation (closes #6014) (#6016)	2020-09-02 15:15:45 +02:00
Matthew Honnibal	737a1408d9	Improve implementation of fix #6010 Follow-ups to the parser efficiency fix. * Avoid introducing new counter for number of pushes * Base cut on number of transitions, keeping it more even * Reintroduce the randomization we had in v2.	2020-09-02 14:42:32 +02:00
svlandeg	57e432ba2a	editor tip as Accordion instead of Infobox	2020-09-02 14:26:57 +02:00
svlandeg	d19ec6c67b	small rewrites in types paragraph	2020-09-02 14:25:18 +02:00
svlandeg	821b2d4e63	update examples	2020-09-02 14:15:50 +02:00
svlandeg	e29a33449d	rewrite intro, simple Model example	2020-09-02 13:41:18 +02:00
svlandeg	422df9c2e2	Merge remote-tracking branch 'upstream/develop' into feature/docs-layers # Conflicts: # website/docs/usage/layers-architectures.md	2020-09-02 13:17:11 +02:00
Sofie Van Landeghem	eb56377799	Fix overfitting test (#6011) * remove unused MORPH_RULES * fix textcat architecture in overfitting test	2020-09-02 13:07:41 +02:00
Adriane Boyd	b97d98783a	Fix Hungarian % tokenization (#6013)	2020-09-02 13:06:16 +02:00
Ines Montani	70238543c8	Update layers/arch docs structure [ci skip]	2020-09-02 13:04:35 +02:00
Matthew Honnibal	c1bf3a5602	Fix significant performance bug in parser training (#6010) The parser training makes use of a trick for long documents, where we use the oracle to cut up the document into sections, so that we can have batch items in the middle of a document. For instance, if we have one document of 600 words, we might make 6 states, starting at words 0, 100, 200, 300, 400 and 500. The problem is for v3, I screwed this up and didn't stop parsing! So instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch of [600, 500, 400, 300, 200, 100]. Oops. The implementation here could probably be improved, it's annoying to have this extra variable in the state. But this'll do. This makes the v3 parser training 5-10 times faster, depending on document lengths. This problem wasn't in v2.	2020-09-02 12:57:13 +02:00
svlandeg	474abb2e59	remove unused MORPH_RULES from test	2020-09-02 11:37:56 +02:00
svlandeg	6fd7f140ec	custom-architectures section	2020-09-02 11:14:06 +02:00
svlandeg	3d9ae9286f	small fixes	2020-09-02 10:46:38 +02:00
Sofie Van Landeghem	f7a25d69f7	Bugfix in merge_entities (#6005) * failing test * bugfix	2020-09-01 21:57:52 +02:00
Sofie Van Landeghem	6bfb1b3a29	Fix sparse checkout for 'spacy project' (#6008) * exit if cloning fails * UX * rewrite http link to git protocol, don't use stdin * fixes to sparse checkout * formatting	2020-09-01 19:49:01 +02:00
Matthew Honnibal	4cce32f090	Fix tagger initialization	2020-09-01 16:38:34 +02:00
Matthew Honnibal	046c38bd26	Remove 'cleanup' of strings (#6007) A long time ago we went to some trouble to try to clean up "unused" strings, to avoid the `StringStore` growing in long-running processes. This never really worked reliably, and I think it was a really wrong approach. It's much better to let the user reload the `nlp` object as necessary, now that the string encoding is stable (in v1, the string IDs were sequential integers, making reloading the NLP object really annoying.) The extra book-keeping does make some performance difference, and the feature is unused, so it's past time we killed it.	2020-09-01 16:12:15 +02:00
Ines Montani	690bd77669	Add todos [ci skip]	2020-09-01 14:04:36 +02:00
Ines Montani	70b226f69d	Support ignore marker in project document [ci skip]	2020-09-01 12:49:04 +02:00
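
The pattern layout described in commit 960d9cfadc (first node with only `RIGHT_ID` and `RIGHT_ATTRS`, every later node with all four keys) can be sketched as a small check in plain Python. This is an illustration only: `check_pattern` is a hypothetical helper, not part of spaCy, and the rule that `LEFT_ID` must name an earlier node is an assumption drawn from the remark that patterns are built left to right.

```python
# Hypothetical helper (not spaCy API): checks the key layout that commit
# 960d9cfadc describes for DependencyMatcher patterns.
ANCHOR_KEYS = {"RIGHT_ID", "RIGHT_ATTRS"}
NODE_KEYS = ANCHOR_KEYS | {"LEFT_ID", "REL_OP"}


def check_pattern(pattern):
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    seen_ids = set()
    for i, node in enumerate(pattern):
        # The first node is the anchor; later nodes need all four keys.
        expected = ANCHOR_KEYS if i == 0 else NODE_KEYS
        if set(node) != expected:
            problems.append(
                f"node {i}: expected keys {sorted(expected)}, got {sorted(node)}"
            )
        # Assumption: LEFT_ID must refer to a node defined earlier in the list.
        if i > 0 and node.get("LEFT_ID") not in seen_ids:
            problems.append(f"node {i}: LEFT_ID {node.get('LEFT_ID')!r} is not defined earlier")
        seen_ids.add(node.get("RIGHT_ID"))
    return problems


# Example pattern: a verb and its subject; node names and attributes are made up.
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]

print(check_pattern(pattern))      # no problems reported for a well-formed pattern
print(check_pattern(pattern[1:]))  # dropping the anchor node produces problems
```

The check is kept separate from spaCy so that it runs without a model installed.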

13064 Commits
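
The arithmetic behind the parser training fix in commit c1bf3a5602 can be reproduced with a few lines of Python. This is only a sketch of the numbers quoted in the commit message, not spaCy's implementation; `section_lengths` and its parameters are hypothetical names chosen for the illustration.

```python
def section_lengths(doc_len, cut, stop_at_cut):
    """Words parsed per state when a document is cut every `cut` words.

    With stop_at_cut=False each state keeps parsing to the end of the
    document, which is the behaviour the commit message describes as the bug.
    """
    starts = range(0, doc_len, cut)
    if stop_at_cut:
        return [min(cut, doc_len - start) for start in starts]
    return [doc_len - start for start in starts]


# The 600-word document from the commit message, cut every 100 words.
buggy = section_lengths(600, 100, stop_at_cut=False)
fixed = section_lengths(600, 100, stop_at_cut=True)

print(buggy, sum(buggy))
print(fixed, sum(fixed))
```

The gap between the two totals widens as documents get longer relative to the cut size, which fits the commit's note that the speedup depends on document lengths.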