spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 02:46:40 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	0cee928467	* Allow StringStore to be pickled, to start addressing Issue #125	2015-10-13 13:44:41 +11:00
Matthew Honnibal	41012907a8	* Fix variable name	2015-10-13 13:44:40 +11:00
Matthew Honnibal	e70368d157	* Use lower case strings for dependency label names in symbols enum	2015-10-13 13:44:40 +11:00
Matthew Honnibal	7b4af3d1e7	* Fix parts_of_speech now that symbols list has been reformed	2015-10-13 13:44:40 +11:00
Matthew Honnibal	37b909b6b6	* Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd	2015-10-13 13:44:40 +11:00
Matthew Honnibal	ce65ec698c	* Remove qualified naming in symbols	2015-10-13 13:44:40 +11:00
Matthew Honnibal	9f4be0adcd	* Map NO_TAG to NIL in parts_of_speech.pxd	2015-10-13 13:44:40 +11:00
Matthew Honnibal	278e12f7e8	* Add morphology symbols to morphology. May need to remove these as an enum.	2015-10-13 13:44:40 +11:00
Matthew Honnibal	d80067eda1	* Map empty string to NULL_ATTR in attrs	2015-10-13 13:44:40 +11:00
Matthew Honnibal	d70e8cac2c	* Fix empty values in attributes and parts of speech, so symbols align correctly with the StringStore	2015-10-13 13:44:40 +11:00
Matthew Honnibal	a29c8ee23d	* Add symbols to the vocab before reading the strings, so that they line up correctly	2015-10-13 13:44:39 +11:00
Matthew Honnibal	74c0853471	* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS	2015-10-13 13:44:39 +11:00
Matthew Honnibal	10a4a843ea	* Enumerate all symbols in one file	2015-10-13 13:44:39 +11:00
Matthew Honnibal	85ce36ab11	* Refactor symbols, so that frequency rank can be derived from the orth id of a word.	2015-10-13 13:44:39 +11:00
Matthew Honnibal	dfbcff2ff1	* Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate.	2015-10-10 15:54:55 +11:00
Matthew Honnibal	9dd2f25c74	* Fix Issue #131 : Force whitespace characters to attach syntactically to previous token, and ensure they cannot serve as stand-alone 'sentence' units.	2015-10-10 15:53:30 +11:00
Matthew Honnibal	8b39feefbe	* Add dependency post-process rule to ensure spaces are attached to neighbouring tokens, so that they can't be sentence boundaries	2015-10-10 15:32:13 +11:00
Matthew Honnibal	2153067958	* Fix use of io in strings.pyx	2015-10-10 15:03:12 +11:00
Matthew Honnibal	ec874247b5	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2015-10-10 14:23:51 +11:00
Matthew Honnibal	30de4135c9	* Fix merge problem	2015-10-10 14:22:32 +11:00
Matthew Honnibal	dc393a5f1d	Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span	2015-10-10 14:14:57 +11:00
Matthew Honnibal	83dccf0fd7	* Use io module instead of deprecated codecs module	2015-10-10 14:13:01 +11:00
Matthew Honnibal	a3dfe2b901	* Increment data version	2015-10-09 13:26:17 +02:00
Matthew Honnibal	2d9e5bf566	* Allow punctuation to be lemmatized	2015-10-09 19:02:42 +11:00
Matthew Honnibal	5332c0b697	* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130	2015-10-09 18:54:40 +11:00
Yubing (Tom) Dong	9a6811acc4	Merge remote-tracking branch 'upstream/master'	2015-10-08 22:53:02 -07:00
Matthew Honnibal	b125289f30	* Fix type declaration in asciied function	2015-10-09 13:46:57 +11:00
Matthew Honnibal	801d55a6d9	* Fix phrase matcher	2015-10-09 02:00:45 +11:00
Matthew Honnibal	b3a70e6375	* Clean up unnecessary try/except block	2015-10-08 14:34:11 +11:00
Yubing (Tom) Dong	0f601b8b75	Update docstring of Doc.__getitem__	2015-10-07 01:27:28 -07:00
Yubing (Tom) Dong	3fd3bc79aa	Refactor to remove duplicate slicing logic	2015-10-07 01:25:35 -07:00
Yubing (Tom) Dong	97685aecb7	Add slicing support to Span	2015-10-06 02:45:49 -07:00
Yubing (Tom) Dong	ef2af20cd3	Make Doc's slicing behavior conform to Python conventions	2015-10-06 02:41:28 -07:00
Yubing (Tom) Dong	2fc33e8024	Allow step=1 when slicing a Doc	2015-10-06 00:57:05 -07:00
Matthew Honnibal	b228a8f4a6	* Remove spacy/en/attrs	2015-10-06 16:20:46 +11:00
Matthew Honnibal	693677fd8d	* Prepare to remove en/attrs file, now that moving to symbols.pyx	2015-10-06 16:20:13 +11:00
Matthew Honnibal	3d9f41c2c9	* Add LookupError for better error reporting in Vocab	2015-10-06 10:34:59 +11:00
Matthew Honnibal	ecc5281b36	* Remove en/pos.pyx, as the tagger code now lives in spacy/tagger.pyx	2015-10-06 10:12:08 +11:00
alvations	8caedba42a	caught more codecs.open -> io.open	2015-09-30 20:20:09 +02:00
alvations	8199012d26	changing deprecated codecs.open to io.open =)	2015-09-30 20:10:15 +02:00
Matthew Honnibal	87e6186828	* Rename _seq to doc attribute in Span	2015-09-29 23:03:55 +10:00
Matthew Honnibal	ab694b0364	* Fix open-bounded slice indices.	2015-09-29 23:03:09 +10:00
Matthew Honnibal	a6ced80c0c	* Fix Issue #116 : Misleading handling of True value in Language.__init__.	2015-09-29 20:54:12 +10:00
Matthew Honnibal	f9d2a5b651	* Fix issue #112 : Replace unidecode with text-unidecode, to avoid license problems.	2015-09-28 23:40:18 +10:00
Matthew Honnibal	2c33a96ac3	Merge pull request #99 from rw/patch-1: Force SSL for downloading English language data.	2015-09-28 17:46:26 +10:00
Matthew Honnibal	abf0d930af	* Fix API for loading word vectors from a file.	2015-09-23 23:51:08 +10:00
Matthew Honnibal	f5c256745b	Merge branch 'master' of ssh://github.com/honnibal/spaCy	2015-09-22 12:26:24 +10:00
Matthew Honnibal	528e26a506	* Add rule to ensure ordinals are preserved as single tokens	2015-09-22 12:26:05 +10:00
Robert	8711b64860	Force SSL for downloading English language data. It would also be nice to have a checksum for this.	2015-09-21 17:26:01 -07:00
Matthew Honnibal	f7283a5067	* Fix vectors bugs for OOV words	2015-09-22 02:10:25 +02:00


1147 Commits