Commit Graph

2088 Commits

Author SHA1 Message Date
Matthew Honnibal
20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal
f8de403483 * Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125 2015-10-13 13:44:41 +11:00
Matthew Honnibal
85e7944572 * Start trying to pickle Vocab 2015-10-13 13:44:41 +11:00
Matthew Honnibal
5ca57bd859 * Ensure Morphology can be pickled, to address Issue #125. 2015-10-13 13:44:41 +11:00
Matthew Honnibal
dfe0ad51ff * Add pickle test for lemmatizer 2015-10-13 13:44:41 +11:00
Matthew Honnibal
0cee928467 * Allow StringStore to be pickled, to start addressing Issue #125 2015-10-13 13:44:41 +11:00
Matthew Honnibal
41012907a8 * Fix variable name 2015-10-13 13:44:40 +11:00
Matthew Honnibal
e70368d157 * Use lower case strings for dependency label names in symbols enum 2015-10-13 13:44:40 +11:00
Matthew Honnibal
7b4af3d1e7 * Fix parts_of_speech now that symbols list has been reformed 2015-10-13 13:44:40 +11:00
Matthew Honnibal
37b909b6b6 * Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd 2015-10-13 13:44:40 +11:00
Matthew Honnibal
ce65ec698c * Remove qualified naming in symbols 2015-10-13 13:44:40 +11:00
Matthew Honnibal
9f4be0adcd * Map NO_TAG to NIL in parts_of_speech.pxd 2015-10-13 13:44:40 +11:00
Matthew Honnibal
278e12f7e8 * Add morphology symbols to morphology. May need to remove these as an enum. 2015-10-13 13:44:40 +11:00
Matthew Honnibal
d80067eda1 * Map empty string to NULL_ATTR in attrs 2015-10-13 13:44:40 +11:00
Matthew Honnibal
fd204d3cd5 * Map NIL to empty string in tag map 2015-10-13 13:44:40 +11:00
Matthew Honnibal
d70e8cac2c * Fix empty values in attributes and parts of speech, so symbols align correctly with the StringStore 2015-10-13 13:44:40 +11:00
Matthew Honnibal
ce3e306376 * Allow SPACY_DATA environment variable in website tests 2015-10-13 13:44:40 +11:00
Matthew Honnibal
a29c8ee23d * Add symbols to the vocab before reading the strings, so that they line up correctly 2015-10-13 13:44:39 +11:00
Matthew Honnibal
74c0853471 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-13 13:44:39 +11:00
Matthew Honnibal
10a4a843ea * Enumerate all symbols in one file 2015-10-13 13:44:39 +11:00
Matthew Honnibal
5c24ad3f5c * Whitespace 2015-10-13 13:44:39 +11:00
Matthew Honnibal
85ce36ab11 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-13 13:44:39 +11:00
Matthew Honnibal
3b79d67462 * Fix assertion in test_basic_create 2015-10-12 00:48:18 +11:00
Matthew Honnibal
afec8cac20 * Add more tests to probe mingw32 failure 2015-10-11 22:40:04 +11:00
Matthew Honnibal
dba1daf597 * Add script to test loading different components 2015-10-11 19:46:53 +11:00
Matthew Honnibal
cc92f3f0ed * Fix Matcher test 2015-10-11 14:59:12 +11:00
Matthew Honnibal
1f8f81f0c8 * Fix missing import 2015-10-11 14:38:21 +11:00
Matthew Honnibal
693dd06547 * Add basic, non-data dependent class creation tests, without depending on pytest. For use in debugging MS build issues, for Issue #132 2015-10-11 14:29:12 +11:00
Matthew Honnibal
08e29519a6 * Add test for how spaces are attached by the parser. 2015-10-10 16:03:13 +11:00
Matthew Honnibal
dfbcff2ff1 * Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate. 2015-10-10 15:54:55 +11:00
Matthew Honnibal
bdcb8d695c * Add non-breaking space to specials.json 2015-10-10 15:54:06 +11:00
Matthew Honnibal
9dd2f25c74 * Fix Issue #131: Force whitespace characters to attach syntactically to previous token, and ensure they cannot serve as stand-alone 'sentence' units. 2015-10-10 15:53:30 +11:00
Matthew Honnibal
8b39feefbe * Add dependency post-process rule to ensure spaces are attached to neighbouring tokens, so that they can't be sentence boundaries 2015-10-10 15:32:13 +11:00
Matthew Honnibal
1521cf25c9 * Fix merge problem in test_parse_navigate 2015-10-10 15:04:01 +11:00
Matthew Honnibal
c12d36d5f4 * Fix quote marks in lemma_rules 2015-10-10 15:03:36 +11:00
Matthew Honnibal
2153067958 * Fix use of io in strings.pyx 2015-10-10 15:03:12 +11:00
Matthew Honnibal
ec874247b5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-10-10 14:23:51 +11:00
Matthew Honnibal
30de4135c9 * Fix merge problem 2015-10-10 14:22:32 +11:00
Matthew Honnibal
dc393a5f1d Merge pull request #126 from tomtung/master: Improve slicing support for both Doc and Span 2015-10-10 14:14:57 +11:00
Matthew Honnibal
6ea8f99a10 Merge branch 'alvations-master' 2015-10-10 14:13:24 +11:00
Matthew Honnibal
83dccf0fd7 * Use io module instead of deprecated codecs module 2015-10-10 14:13:01 +11:00
Matthew Honnibal
55cd7008bb Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-10-10 14:07:55 +11:00
Matthew Honnibal
57b3cd4661 * Add smart-quotes to lemma rules 2015-10-10 14:06:46 +11:00
Matthew Honnibal
7e7f28e1fd * Add smart-quote possessive marker in generate_specials 2015-10-10 14:06:09 +11:00
Matthew Honnibal
41c50e509c Merge pull request #137 from henningpeters/master: push version and add spacy channel 2015-10-10 01:40:29 +11:00
Matthew Honnibal
8b8d048385 Merge pull request #135 from henningpeters/patch-1: remove compile warning noise 2015-10-10 01:40:15 +11:00
Matthew Honnibal
d31c911f83 Merge pull request #136 from henningpeters/patch-2: cleanup 2015-10-10 01:40:00 +11:00
Henning Peters
7a47c0c872 push version 2015-10-09 16:37:57 +02:00
Henning Peters
88b2f7ea5d push version and add spacy channel 2015-10-09 16:30:23 +02:00
Henning Peters
876fc99c44 cleanup: looks like this file was accidentally added 2015-10-09 16:11:56 +02:00