spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Author	SHA1	Message	Date
adrianeboyd	5ee9d8c9b8	Add MORPH attr, add support in retokenizer (#4947) * Add MORPH attr / symbol for token attrs * Update retokenizer for MORPH	2020-01-29 17:45:46 +01:00
svlandeg	8608685543	ensure Span.as_doc keeps the entity links + unit test	2019-06-25 15:28:51 +02:00
Matthew Honnibal	8aa7882762	Make NORM a token attribute (#3029) See #3028. The solution in this patch is pretty debatable. What we do is give the TokenC struct a .norm field, by repurposing the previously idle .sense attribute. It's nice to repurpose a previous field because it means the TokenC doesn't change size, so even if someone's using the internals very deeply, nothing will break. The weird thing here is that the TokenC and the LexemeC both have an attribute named NORM. This arguably assists in backwards compatibility. On the other hand, maybe it's really bad! We're changing the semantics of the attribute subtly, so maybe it's better if someone calling lex.norm gets a breakage, and instead is told to write lex.default_norm? Overall I believe this patch makes the NORM feature work the way we sort of expected it to work. Certainly it's much more like how the docs describe it, and more in line with how we've been directing people to use the norm attribute. We'll also be able to use token.norm to do stuff like spelling correction, which is pretty cool.	2018-12-08 10:49:10 +01:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message. An implementation like this is nice because it only modifies the string when it's retrieved from the containing class, so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Thomas Opsomer	515e25910e	fix sent_start in serialization	2018-01-28 19:50:42 +01:00
Matthew Honnibal	a002264fec	Remove caching of Token in Doc, as caused cycle.	2017-10-16 19:34:21 +02:00
Matthew Honnibal	87613edf8f	Add set_struct_attr staticmethod to token	2016-11-25 12:41:47 +01:00
Matthew Honnibal	9a03a3f85e	Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.	2016-11-25 11:35:17 +01:00
Matthew Honnibal	09ab447a18	Remove tensor property from token.	2016-10-17 02:45:09 +02:00
Matthew Honnibal	f3be9d0a9a	Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs	2016-10-14 03:24:13 +02:00
Matthew Honnibal	58e83fe34b	Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.	2016-09-21 14:54:55 +02:00
Wolfgang Seeker	46e3f979f1	add function for setting head and label to token; change PseudoProjectivity.deprojectivize to use these functions	2016-03-11 17:31:06 +01:00
Matthew Honnibal	30be4f15da	* Import attrs from spacy.attrs, not spacy.typedefs	2015-07-16 11:23:25 +02:00
Matthew Honnibal	81aa4e6dcc	* Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API	2015-07-14 00:10:11 +02:00
Matthew Honnibal	8214b74eec	* Restore _py_tokens cache, to handle orphan tokens.	2015-07-13 22:28:10 +02:00
Matthew Honnibal	dba6b47d4e	* Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference	2015-07-13 19:20:48 +02:00

16 Commits