spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-10 18:51:21 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	30e67fa808	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 16:08:23 +02:00
Matthew Honnibal	b0f6fd3f1d	Disable tokenizer cache for special-cases. Fixes #1250	2017-10-24 16:08:05 +02:00
Matthew Honnibal	63f0bde749	Add test for #1250: Tokenizer cache clobbered special-case attrs	2017-10-24 16:07:18 +02:00
ines	8492d5be6d	Always make lemmatizer return a list of lemmas, not a set	2017-10-24 16:00:56 +02:00
ines	95f866f99f	Add lookup argument to Lemmatizer.load	2017-10-24 16:00:56 +02:00
ines	95f6174516	Remove tensorizer from model pipeline example in spacy package	2017-10-24 16:00:56 +02:00
ines	6686e53530	Allow GitHub embeds to specify optional language	2017-10-24 16:00:56 +02:00
ines	56a47f137f	Add title description for tokenizer	2017-10-24 16:00:56 +02:00
ines	3944c1d6e7	Document lemmatizer	2017-10-24 16:00:56 +02:00
ines	c9dc88ddfc	Document current JSON format for training	2017-10-24 16:00:56 +02:00
ines	2b8e7c45e0	Use better training data JSON example	2017-10-24 16:00:56 +02:00
ines	090aed940a	Add test for currently failing span.as_doc case	2017-10-24 16:00:56 +02:00
ines	4ef81a9ebc	Fix whitespace	2017-10-24 16:00:56 +02:00
Matthew Honnibal	18f1c1d0ba	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 14:29:43 +02:00
Matthew Honnibal	4bea65a1a8	Fix Issue #1450: Off-by-1 in * and ? matches. Patterns that end in variable-length operators, e.g. * and ?, now end on the correct token. Previously, they were off by 1: the next token was pulled into the match, even if that's where the pattern failed.	2017-10-24 14:26:27 +02:00
Matthew Honnibal	391d5ef0d1	Normalize imports in regression test	2017-10-24 14:25:49 +02:00
ines	c55db0a4a1	Add example sentences for Japanese and Chinese (see #1107)	2017-10-24 13:02:24 +02:00
ines	66f8f9d4a0	Fix Japanese tokenizer. JapaneseTokenizer now returns a Doc, not individual words	2017-10-24 13:02:19 +02:00
Matthew Honnibal	5ae0b8613a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 12:41:07 +02:00
Matthew Honnibal	dd5b2d8fa3	Check for out-of-memory when calling calloc. Closes #1446	2017-10-24 12:40:47 +02:00
ines	9bf5751064	Pretty-print JSON	2017-10-24 12:22:17 +02:00
Matthew Honnibal	0f9d966317	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-24 12:10:58 +02:00
Matthew Honnibal	b66b8f028b	Fix #1375 -- out-of-bounds on token.nbor()	2017-10-24 12:10:39 +02:00
Matthew Honnibal	a68d89a4f3	Add failing test for bug #1375 -- no out-of-bounds error for token.nbor()	2017-10-24 12:05:25 +02:00
ines	6675755005	Add training data JSON example	2017-10-24 12:05:10 +02:00
Matthew Honnibal	ccd2ab1a62	Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix: Add LCA matrix for spans and docs	2017-10-24 11:22:46 +02:00
Matthew Honnibal	ef3e5a361b	Merge pull request #1442 from explosion/feature/fix-sp: 💫 Fix SP tag, tweak Vectors.__init__, fix Morphology	2017-10-24 10:24:07 +02:00
Matthew Honnibal	fdf25d10ba	Merge pull request #1440 from ramananbalakrishnan/develop: Support single value for attribute list in doc.to_array	2017-10-24 10:23:12 +02:00
ines	7701984f13	Document Span.as_doc	2017-10-23 10:38:27 +02:00
ines	db15902e84	Tidy up	2017-10-23 10:38:21 +02:00
ines	3f0a157b33	Fix typo	2017-10-23 10:38:13 +02:00
ines	a31f048b4d	Fix formatting	2017-10-23 10:38:06 +02:00
Ines Montani	0ed0c41bad	Merge pull request #1448 from jerbob92/feature/fix-training-new-entity-type-example: Fix #1444: fix training new entity type example	2017-10-22 15:43:33 +02:00
Jeroen Bobbeldijk	84c6c20d1c	Fix #1444: fix pipeline logic and wrong parameter in update call	2017-10-22 15:18:36 +02:00
Matthew Honnibal	490ad3eaf0	Check that empty strings are handled. Closes #1242	2017-10-21 00:52:14 +02:00
Matthew Honnibal	8f8bccecb9	Patch deserialisation for invalid loads, to avoid model failure	2017-10-21 00:51:42 +02:00
Ramanan Balakrishnan	d2fe56a577	Add LCA matrix for spans and docs	2017-10-20 23:58:00 +05:30
Matthew Honnibal	d8391b1c4d	Fix #1434: Matcher failed on ending ? if no token	2017-10-20 16:49:36 +02:00
Matthew Honnibal	fec53f09f7	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-10-20 16:28:34 +02:00
Matthew Honnibal	f111b228e0	Fix re-parsing of previously parsed text. If a Doc object had been previously parsed, it was possible for invalid parses to be added. There were two problems: 1) The parse was only being partially erased. 2) The RightArc action was able to create a 1-cycle. This patch fixes both errors, and avoids resetting the parse if one is present. In theory this might allow a better parse to be predicted by running the parser twice. Closes #1253.	2017-10-20 16:27:36 +02:00
Matthew Honnibal	9010a1a060	Create vectors correctly	2017-10-20 14:19:46 +02:00
Matthew Honnibal	33229b1c9e	Remove print statement	2017-10-20 14:19:29 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	ebecaddb76	Make 'data_or_width' two keyword args in Vectors.__init__. Previously the data and width options were one argument in Vectors, which meant you couldn't say vectors = Vectors(strings, width=300). It's better to have two keywords.	2017-10-20 14:17:15 +02:00
Matthew Honnibal	49895fbef6	Rename 'SP' special tag to '_SP'. Renaming the tag with an underscore lets us add it to the tag map without worrying that we'll change the sequence of tags, which throws off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag, the "VERB" tag is pushed to a different class ID, and the model is all messed up.	2017-10-20 14:01:12 +02:00
Matthew Honnibal	506cf2eb13	Remove cpdef enum, to avoid too much code generation	2017-10-20 14:00:23 +02:00
Matthew Honnibal	6218af0105	Remove cpdef enum, to avoid too much code generation	2017-10-20 13:59:57 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Ramanan Balakrishnan	0726946563	cleanup to_array implementation using fixes on master	2017-10-20 17:09:37 +05:30
ines	108f1f786e	Update symbols and document missing token attributes (see #1439)	2017-10-20 13:08:44 +02:00
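
The message on commit 49895fbef6 describes how adding a tag can shift class IDs. A minimal sketch of that effect follows, assuming IDs are assigned by position in the sorted list of tag names; that scheme and the helper tag_ids are illustrative assumptions, not code taken from spaCy.

```python
def tag_ids(tags):
    """Map each tag to a class ID by its position in sorted order.

    Assumed scheme for illustration only; not spaCy's implementation.
    """
    return {tag: i for i, tag in enumerate(sorted(tags))}


base_tags = ["NOUN", "VERB", "ADJ"]

before = tag_ids(base_tags)
# 'SP' sorts between 'NOUN' and 'VERB', so it pushes 'VERB' up by one.
with_sp = tag_ids(base_tags + ["SP"])
# '_' sorts after the uppercase ASCII letters, so '_SP' lands at the end.
with_underscore = tag_ids(base_tags + ["_SP"])

print(before["VERB"], with_sp["VERB"], with_underscore["VERB"])
```

Under this assumption, a model trained against the original IDs would still see "VERB" at the same ID after '_SP' is added, but not after 'SP' is added.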

6840 Commits
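
The message on commit ebecaddb76 argues for two keyword arguments instead of one combined 'data_or_width' argument. The sketch below illustrates that design choice with a hypothetical stand-in class, VectorsSketch; it is not spaCy's Vectors class, and its validation rules are assumptions made for the example.

```python
class VectorsSketch:
    """Hypothetical stand-in showing separate 'data' and 'width' keywords."""

    def __init__(self, strings, data=None, width=None):
        if data:
            # Infer the width from the first row and check any explicit width.
            row_width = len(data[0])
            if width is not None and width != row_width:
                raise ValueError("width does not match the rows in data")
            width = row_width
        elif width is None:
            raise ValueError("pass data, width, or both")
        self.strings = list(strings)
        self.width = width
        self.data = [list(row) for row in data] if data else []


# With two keywords, a caller can state the width without supplying data.
empty = VectorsSketch(["apple"], width=300)
filled = VectorsSketch(["apple"], data=[[0.1, 0.2]])
print(empty.width, filled.width)
```

Keeping the two keywords separate means each call states its intent by name, and the constructor does not have to guess whether a single positional value is a table or a number.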