Commit Graph

2121 Commits

Author SHA1 Message Date
Matthew Honnibal  c70eb776ae * Fix whitespace attachment, so that left/right children are consistent with head. 2015-10-13 15:58:22 +11:00
Matthew Honnibal  63df729edd * Fix test 2015-10-13 15:48:15 +11:00
Matthew Honnibal  00ae3edd3a * Fix tests 2015-10-13 15:39:52 +11:00
Matthew Honnibal  531182f937 * Fix Model.__reduce__ 2015-10-13 15:14:38 +11:00
Matthew Honnibal  6c227a6c1f * Fix Model.__reduce__ 2015-10-13 15:10:04 +11:00
Matthew Honnibal  f6d74b14de * Merge 2015-10-13 05:25:49 +02:00
Matthew Honnibal  59b792058d * Fix test_parse_navigate looking for test file in wrong place 2015-10-13 14:19:12 +11:00
Matthew Honnibal  358c82595c * Fix NAMES list in spacy/parts_of_speech.pyx 2015-10-13 14:18:45 +11:00
Matthew Honnibal  c1fdc487bc Merge branch 'attrs' 2015-10-13 14:03:41 +11:00
Matthew Honnibal  41cbbdefe3 Merge branch 'attrs' 2015-10-13 05:03:25 +02:00
Matthew Honnibal  38109dd912 * Allow preshed v0.42 2015-10-13 13:56:23 +11:00
Matthew Honnibal  d698aa546d Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-10-13 13:56:09 +11:00
Matthew Honnibal  1ca1beff4b * Allow preshed v0.42 in setup.py 2015-10-13 13:55:50 +11:00
Matthew Honnibal  404e484276 * Fix prag_sbd tests 2015-10-13 04:54:15 +02:00
Matthew Honnibal  b866f1443e Merge branch 'master' of https://github.com/honnibal/spaCy into attrs 2015-10-13 04:52:27 +02:00
Matthew Honnibal  6c2da06c18 * Package tag_map.json 2015-10-13 13:52:10 +11:00
Matthew Honnibal  e886e6a406 * Inc version 2015-10-13 13:46:17 +11:00
Matthew Honnibal  20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal  f8de403483 * Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125 2015-10-13 13:44:41 +11:00
Matthew Honnibal  85e7944572 * Start trying to pickle Vocab 2015-10-13 13:44:41 +11:00
Matthew Honnibal  5ca57bd859 * Ensure Morphology can be pickled, to address Issue #125. 2015-10-13 13:44:41 +11:00
Matthew Honnibal  dfe0ad51ff * Add pickle test for lemmatizer 2015-10-13 13:44:41 +11:00
Matthew Honnibal  0cee928467 * Allow StringStore to be pickled, to start addressing Issue #125 2015-10-13 13:44:41 +11:00
Matthew Honnibal  41012907a8 * Fix variable name 2015-10-13 13:44:40 +11:00
Matthew Honnibal  e70368d157 * Use lower case strings for dependency label names in symbols enum 2015-10-13 13:44:40 +11:00
Matthew Honnibal  7b4af3d1e7 * Fix parts_of_speech now that symbols list has been reformed 2015-10-13 13:44:40 +11:00
Matthew Honnibal  37b909b6b6 * Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd 2015-10-13 13:44:40 +11:00
Matthew Honnibal  ce65ec698c * Remove qualified naming in symbols 2015-10-13 13:44:40 +11:00
Matthew Honnibal  9f4be0adcd * Map NO_TAG to NIL in parts_of_speech.pxd 2015-10-13 13:44:40 +11:00
Matthew Honnibal  278e12f7e8 * Addmorphology symbols to morphology. May need to remove these as an enum. 2015-10-13 13:44:40 +11:00
Matthew Honnibal  d80067eda1 * Map empty string to NULL_ATTR in attrs 2015-10-13 13:44:40 +11:00
Matthew Honnibal  fd204d3cd5 * Map NIL to empty string in tag map 2015-10-13 13:44:40 +11:00
Matthew Honnibal  d70e8cac2c * Fix empty values in attributes and parts of speech, so symbols align correctly with the StringStore 2015-10-13 13:44:40 +11:00
Matthew Honnibal  ce3e306376 * Allow SPACY_DATA environment variable in website tests 2015-10-13 13:44:40 +11:00
Matthew Honnibal  a29c8ee23d * Add symbols to the vocab before reading the strings, so that they line up correctly 2015-10-13 13:44:39 +11:00
Matthew Honnibal  74c0853471 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-13 13:44:39 +11:00
Matthew Honnibal  10a4a843ea * Enumerate all symbols in one file 2015-10-13 13:44:39 +11:00
Matthew Honnibal  5c24ad3f5c * Whitespace 2015-10-13 13:44:39 +11:00
Matthew Honnibal  85ce36ab11 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-13 13:44:39 +11:00
Matthew Honnibal  3b79d67462 * Fix assertion in test_basic_create 2015-10-12 00:48:18 +11:00
Matthew Honnibal  afec8cac20 * Add more tests to probe mingw32 failure 2015-10-11 22:40:04 +11:00
Matthew Honnibal  dba1daf597 * Add script to test loading different components 2015-10-11 19:46:53 +11:00
Matthew Honnibal  92f750cf8b * Use a gzipped frequencies file in init_model 2015-10-11 06:59:44 +02:00
Matthew Honnibal  cc92f3f0ed * Fix Matcher test 2015-10-11 14:59:12 +11:00
Matthew Honnibal  1f8f81f0c8 * Fix missing import 2015-10-11 14:38:21 +11:00
Matthew Honnibal  693dd06547 * Add basic, non-data dependent class creation tests, without depending on pytest. For use in debugging MS build issues, for Issue #132 2015-10-11 14:29:12 +11:00
Matthew Honnibal  0090f79fbd * Use lower case strings for dependency label names in symbols enum 2015-10-10 22:59:14 +11:00
Matthew Honnibal  4c16307b10 * Fix parts_of_speech now that symbols list has been reformed 2015-10-10 22:58:34 +11:00
Matthew Honnibal  8f0f47b9a6 * Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd 2015-10-10 22:12:06 +11:00
Matthew Honnibal  6b30d1cf7b * Remove qualified naming in symbols 2015-10-10 22:11:38 +11:00