Matthew Honnibal
|
0cee928467
|
* Allow StringStore to be pickled, to start addressing Issue #125
|
2015-10-13 13:44:41 +11:00 |
|
Matthew Honnibal
|
41012907a8
|
* Fix variable name
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
e70368d157
|
* Use lower case strings for dependency label names in symbols enum
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
7b4af3d1e7
|
* Fix parts_of_speech now that symbols list has been reformed
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
37b909b6b6
|
* Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
ce65ec698c
|
* Remove qualified naming in symbols
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
9f4be0adcd
|
* Map NO_TAG to NIL in parts_of_speech.pxd
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
278e12f7e8
|
* Add morphology symbols to morphology. May need to remove these as an enum.
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
d80067eda1
|
* Map empty string to NULL_ATTR in attrs
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
d70e8cac2c
|
* Fix empty values in attributes and parts of speech, so symbols align correctly with the StringStore
|
2015-10-13 13:44:40 +11:00 |
|
Matthew Honnibal
|
a29c8ee23d
|
* Add symbols to the vocab before reading the strings, so that they line up correctly
|
2015-10-13 13:44:39 +11:00 |
|
Matthew Honnibal
|
74c0853471
|
* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS
|
2015-10-13 13:44:39 +11:00 |
|
Matthew Honnibal
|
10a4a843ea
|
* Enumerate all symbols in one file
|
2015-10-13 13:44:39 +11:00 |
|
Matthew Honnibal
|
85ce36ab11
|
* Refactor symbols, so that frequency rank can be derived from the orth id of a word.
|
2015-10-13 13:44:39 +11:00 |
|
Matthew Honnibal
|
dfbcff2ff1
|
* Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate.
|
2015-10-10 15:54:55 +11:00 |
|
Matthew Honnibal
|
9dd2f25c74
|
* Fix Issue #131: Force whitespace characters to attach syntactically to previous token, and ensure they cannot serve as stand-alone 'sentence' units.
|
2015-10-10 15:53:30 +11:00 |
|
Matthew Honnibal
|
8b39feefbe
|
* Add dependency post-process rule to ensure spaces are attached to neighbouring tokens, so that they can't be sentence boundaries
|
2015-10-10 15:32:13 +11:00 |
|
Matthew Honnibal
|
2153067958
|
* Fix use of io in strings.pyx
|
2015-10-10 15:03:12 +11:00 |
|
Matthew Honnibal
|
ec874247b5
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-10-10 14:23:51 +11:00 |
|
Matthew Honnibal
|
30de4135c9
|
* Fix merge problem
|
2015-10-10 14:22:32 +11:00 |
|
Matthew Honnibal
|
dc393a5f1d
|
Merge pull request #126 from tomtung/master
Improve slicing support for both Doc and Span
|
2015-10-10 14:14:57 +11:00 |
|
Matthew Honnibal
|
83dccf0fd7
|
* Use io module instead of deprecated codecs module
|
2015-10-10 14:13:01 +11:00 |
|
Matthew Honnibal
|
a3dfe2b901
|
* Increment data version
|
2015-10-09 13:26:17 +02:00 |
|
Matthew Honnibal
|
2d9e5bf566
|
* Allow punctuation to be lemmatized
|
2015-10-09 19:02:42 +11:00 |
|
Matthew Honnibal
|
5332c0b697
|
* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130
|
2015-10-09 18:54:40 +11:00 |
|
Yubing (Tom) Dong
|
9a6811acc4
|
Merge remote-tracking branch 'upstream/master'
|
2015-10-08 22:53:02 -07:00 |
|
Matthew Honnibal
|
b125289f30
|
* Fix type declaration in asciied function
|
2015-10-09 13:46:57 +11:00 |
|
Matthew Honnibal
|
801d55a6d9
|
* Fix phrase matcher
|
2015-10-09 02:00:45 +11:00 |
|
Matthew Honnibal
|
b3a70e6375
|
* Clean up unnecessary try/except block
|
2015-10-08 14:34:11 +11:00 |
|
Yubing (Tom) Dong
|
0f601b8b75
|
Update docstring of Doc.__getitem__
|
2015-10-07 01:27:28 -07:00 |
|
Yubing (Tom) Dong
|
3fd3bc79aa
|
Refactor to remove duplicate slicing logic
|
2015-10-07 01:25:35 -07:00 |
|
Yubing (Tom) Dong
|
97685aecb7
|
Add slicing support to Span
|
2015-10-06 02:45:49 -07:00 |
|
Yubing (Tom) Dong
|
ef2af20cd3
|
Make Doc's slicing behavior conform to Python conventions
|
2015-10-06 02:41:28 -07:00 |
|
Yubing (Tom) Dong
|
2fc33e8024
|
Allow step=1 when slicing a Doc
|
2015-10-06 00:57:05 -07:00 |
|
Matthew Honnibal
|
b228a8f4a6
|
* Remove spacy/en/attrs
|
2015-10-06 16:20:46 +11:00 |
|
Matthew Honnibal
|
693677fd8d
|
* Prepare to remove en/attrs file, now that moving to symbols.pyx
|
2015-10-06 16:20:13 +11:00 |
|
Matthew Honnibal
|
3d9f41c2c9
|
* Add LookupError for better error reporting in Vocab
|
2015-10-06 10:34:59 +11:00 |
|
Matthew Honnibal
|
ecc5281b36
|
* Remove en/pos.pyx, as the tagger code now lives in spacy/tagger.pyx
|
2015-10-06 10:12:08 +11:00 |
|
alvations
|
8caedba42a
|
caught more codecs.open -> io.open
|
2015-09-30 20:20:09 +02:00 |
|
alvations
|
8199012d26
|
changing deprecated codecs.open to io.open =)
|
2015-09-30 20:10:15 +02:00 |
|
Matthew Honnibal
|
87e6186828
|
* Rename _seq to doc attribute in Span
|
2015-09-29 23:03:55 +10:00 |
|
Matthew Honnibal
|
ab694b0364
|
* Fix open-bounded slice indices.
|
2015-09-29 23:03:09 +10:00 |
|
Matthew Honnibal
|
a6ced80c0c
|
* Fix Issue #116: Misleading handling of True value in Language.__init__.
|
2015-09-29 20:54:12 +10:00 |
|
Matthew Honnibal
|
f9d2a5b651
|
* Fix issue #112: Replace unidecode with text-unidecode, to avoid license problems.
|
2015-09-28 23:40:18 +10:00 |
|
Matthew Honnibal
|
2c33a96ac3
|
Merge pull request #99 from rw/patch-1
Force SSL for downloading English language data.
|
2015-09-28 17:46:26 +10:00 |
|
Matthew Honnibal
|
abf0d930af
|
* Fix API for loading word vectors from a file.
|
2015-09-23 23:51:08 +10:00 |
|
Matthew Honnibal
|
f5c256745b
|
Merge branch 'master' of ssh://github.com/honnibal/spaCy
|
2015-09-22 12:26:24 +10:00 |
|
Matthew Honnibal
|
528e26a506
|
* Add rule to ensure ordinals are preserved as single tokens
|
2015-09-22 12:26:05 +10:00 |
|
Robert
|
8711b64860
|
Force SSL for downloading English language data.
It would also be nice to have a checksum for this.
|
2015-09-21 17:26:01 -07:00 |
|
Matthew Honnibal
|
f7283a5067
|
* Fix vectors bugs for OOV words
|
2015-09-22 02:10:25 +02:00 |
|