spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-15 02:32:37 +03:00

Author	SHA1	Message	Date
Henning Peters	bfde91fa49	add custom download tool (uget), replace wget with uget	2015-10-18 12:35:04 +02:00
Matthew Honnibal	9839cd2c0b	* Fix whitespace_ calculation in Token	2015-10-18 17:21:11 +11:00
Matthew Honnibal	c99285b8b9	* Clean up C++ usage in spacy/matcher.pyx	2015-10-18 17:20:50 +11:00
Matthew Honnibal	a7e6c5ac8f	* Fix Issue #122 : Incorrect calculation of children after Doc.merge()	2015-10-18 17:17:27 +11:00
Matthew Honnibal	3ba66f2dc7	* Add string length cap in Tokenizer.__call__	2015-10-16 04:54:16 +11:00
Matthew Honnibal	6e0f985afc	* Fix token.conjuncts	2015-10-15 03:49:45 +11:00
Matthew Honnibal	2e0104ac81	* Fix token.conjuncts	2015-10-15 03:47:45 +11:00
Matthew Honnibal	b8f3345a82	* Fix token.conjuncts method	2015-10-15 03:36:01 +11:00
Matthew Honnibal	23818f89b8	* Fix token.conjuncts method	2015-10-15 03:34:57 +11:00
Matthew Honnibal	7a15d1b60c	* Add Python 2/3 compatibility fix for copy_reg	2015-10-13 20:04:40 +11:00
Matthew Honnibal	329ae57520	* Fix whitespace attachment thing	2015-10-13 09:46:38 +02:00
Matthew Honnibal	37919eac82	* Fix whitespace attachment in simpler way. Leaves problem with setting left/right children.	2015-10-13 18:23:24 +11:00
Matthew Honnibal	c70eb776ae	* Fix whitespace attachment, so that left/right children are consistent with head.	2015-10-13 15:58:22 +11:00
Matthew Honnibal	531182f937	* Fix Model.__reduce__	2015-10-13 15:14:38 +11:00
Matthew Honnibal	6c227a6c1f	* Fix Model.__reduce__	2015-10-13 15:10:04 +11:00
Matthew Honnibal	358c82595c	* Fix NAMES list in spacy/parts_of_speech.pyx	2015-10-13 14:18:45 +11:00
Matthew Honnibal	c1fdc487bc	Merge branch 'attrs'	2015-10-13 14:03:41 +11:00
Matthew Honnibal	e886e6a406	* Inc version	2015-10-13 13:46:17 +11:00
Matthew Honnibal	20fd36a0f7	* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125 : allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.	2015-10-13 13:44:41 +11:00
Matthew Honnibal	f8de403483	* Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125	2015-10-13 13:44:41 +11:00
Matthew Honnibal	85e7944572	* Start trying to pickle Vocab	2015-10-13 13:44:41 +11:00
Matthew Honnibal	5ca57bd859	* Ensure Morphology can be pickled, to address Issue #125 .	2015-10-13 13:44:41 +11:00
Matthew Honnibal	0cee928467	* Allow StringStore to be pickled, to start addressing Issue #125	2015-10-13 13:44:41 +11:00
Matthew Honnibal	41012907a8	* Fix variable name	2015-10-13 13:44:40 +11:00
Matthew Honnibal	e70368d157	* Use lower case strings for dependency label names in symbols enum	2015-10-13 13:44:40 +11:00
Matthew Honnibal	7b4af3d1e7	* Fix parts_of_speech now that symbols list has been reformed	2015-10-13 13:44:40 +11:00
Matthew Honnibal	37b909b6b6	* Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd	2015-10-13 13:44:40 +11:00
Matthew Honnibal	ce65ec698c	* Remove qualified naming in symbols	2015-10-13 13:44:40 +11:00
Matthew Honnibal	9f4be0adcd	* Map NO_TAG to NIL in parts_of_speech.pxd	2015-10-13 13:44:40 +11:00
Matthew Honnibal	278e12f7e8	* Add morphology symbols to morphology. May need to remove these as an enum.	2015-10-13 13:44:40 +11:00
Matthew Honnibal	d80067eda1	* Map empty string to NULL_ATTR in attrs	2015-10-13 13:44:40 +11:00
Matthew Honnibal	d70e8cac2c	* Fix empty values in attributes and parts of speech, so symbols align correctly with the StringStore	2015-10-13 13:44:40 +11:00
Matthew Honnibal	a29c8ee23d	* Add symbols to the vocab before reading the strings, so that they line up correctly	2015-10-13 13:44:39 +11:00
Matthew Honnibal	74c0853471	* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS	2015-10-13 13:44:39 +11:00
Matthew Honnibal	10a4a843ea	* Enumerate all symbols in one file	2015-10-13 13:44:39 +11:00
Matthew Honnibal	85ce36ab11	* Refactor symbols, so that frequency rank can be derived from the orth id of a word.	2015-10-13 13:44:39 +11:00
Matthew Honnibal	dfbcff2ff1	* Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate.	2015-10-10 15:54:55 +11:00
Matthew Honnibal	9dd2f25c74	* Fix Issue #131 : Force whitespace characters to attach syntactically to previous token, and ensure they cannot serve as stand-alone 'sentence' units.	2015-10-10 15:53:30 +11:00
Matthew Honnibal	8b39feefbe	* Add dependency post-process rule to ensure spaces are attached to neighbouring tokens, so that they can't be sentence boundaries	2015-10-10 15:32:13 +11:00
Matthew Honnibal	2153067958	* Fix use of io in strings.pyx	2015-10-10 15:03:12 +11:00
Matthew Honnibal	ec874247b5	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2015-10-10 14:23:51 +11:00
Matthew Honnibal	30de4135c9	* Fix merge problem	2015-10-10 14:22:32 +11:00
Matthew Honnibal	dc393a5f1d	Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span	2015-10-10 14:14:57 +11:00
Matthew Honnibal	83dccf0fd7	* Use io module instead of deprecated codecs module	2015-10-10 14:13:01 +11:00
Matthew Honnibal	a3dfe2b901	* Increment data version	2015-10-09 13:26:17 +02:00
Matthew Honnibal	2d9e5bf566	* Allow punctuation to be lemmatized	2015-10-09 19:02:42 +11:00
Matthew Honnibal	5332c0b697	* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130	2015-10-09 18:54:40 +11:00
Yubing (Tom) Dong	9a6811acc4	Merge remote-tracking branch 'upstream/master'	2015-10-08 22:53:02 -07:00
Matthew Honnibal	b125289f30	* Fix type declaration in asciied function	2015-10-09 13:46:57 +11:00
Matthew Honnibal	801d55a6d9	* Fix phrase matcher	2015-10-09 02:00:45 +11:00

1169 Commits