Matthew Honnibal
a6ff7e6ca4
* Fix redundant options in train.py
2015-07-17 22:38:05 +02:00
Matthew Honnibal
e29daea85f
* Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe.
2015-07-17 22:37:24 +02:00
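Note on the bint/int fix above: a minimal sketch of why a pointer cast between the two types is unsafe. It takes the commit message's description at face value and uses Python's `ctypes` rather than Cython, so `flags`, `as_bool` and `reinterpreted` are illustrative names, not spaCy code. It also assumes a typical platform where a C `bool` is narrower than a C `int`.

```python
import ctypes

# Four C ints holding the flags 1, 0, 1, 1.
flags = (ctypes.c_int * 4)(1, 0, 1, 1)

# Reinterpret the same memory as bool*, which is what the unsafe cast amounts to.
as_bool = ctypes.cast(flags, ctypes.POINTER(ctypes.c_bool))

# Assumption: bool is narrower than int on this platform (commonly 1 byte vs 4).
int_size = ctypes.sizeof(ctypes.c_int)
bool_size = ctypes.sizeof(ctypes.c_bool)

# Indexing through the bool pointer advances by sizeof(bool), not sizeof(int),
# so these four reads stay within the first int(s) instead of visiting each flag.
reinterpreted = [as_bool[i] for i in range(4)]
original = [bool(v) for v in flags]

print("sizeof(int) =", int_size, "sizeof(bool) =", bool_size)
print("original flags:      ", original)
print("read through bool*:  ", reinterpreted)
```

When the two sizes differ, the values read through the reinterpreted pointer no longer match the original flags, which is the hazard the commit message describes.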
Matthew Honnibal
6cfa83157e
Merge branch 'refactor' of ssh://github.com/honnibal/spaCy into refactor
2015-07-17 21:38:04 +02:00
Matthew Honnibal
f7f0ad1a78
* Fix tests
2015-07-17 21:31:44 +02:00
Matthew Honnibal
68374149ae
* Move huffman encoding test to tests/serialize directory
2015-07-17 21:22:18 +02:00
Matthew Honnibal
e950f5a408
* Tests for serializer
2015-07-17 21:21:10 +02:00
Matthew Honnibal
cf0c788892
* Tests passing on round-trip pack/unpack on basic example
2015-07-17 21:20:48 +02:00
Matthew Honnibal
44f39a876f
* Add a blank attrs.pyx
2015-07-17 16:40:42 +02:00
Matthew Honnibal
c2c83120d4
* Remove codec property from Vocab
2015-07-17 16:40:11 +02:00
Matthew Honnibal
dfdf19f6a9
* Draft a from_orth method for Doc
2015-07-17 16:39:54 +02:00
Matthew Honnibal
a9149fdcbd
* Compile attrs.pyx
2015-07-17 16:39:25 +02:00
Matthew Honnibal
9e3f17051b
* Move to ORTH instead of ID for encoding lexemes. Basic tests of the codec wrappers now passing
2015-07-17 16:38:29 +02:00
Matthew Honnibal
15ff739996
* Fix passing of ID attribute in string store
2015-07-17 14:49:42 +02:00
Matthew Honnibal
95e57c2780
* Remove unnecessary key and id properties from Utf8String.
2015-07-17 01:40:18 +02:00
Matthew Honnibal
234c7e440a
* Add spacy/serialize/__init__ files
2015-07-17 01:37:33 +02:00
Matthew Honnibal
221f7e51c7
* Ignore spacy/serialize/*.cpp
2015-07-17 01:36:49 +02:00
Matthew Honnibal
db9dfd2e23
* Major refactor of serialization. Nearly complete now.
2015-07-17 01:27:54 +02:00
Matthew Honnibal
c8282f9934
* Work on serialization. Needs more reorganisation
2015-07-16 19:56:02 +02:00
Matthew Honnibal
d8458d6a25
* Fix attr_id_t import in Spans
2015-07-16 19:55:21 +02:00
Matthew Honnibal
897de2d438
* Add 'bitter' property for serializer in English class
2015-07-16 17:47:53 +02:00
Matthew Honnibal
fb54052ae0
* Work on serializer design
2015-07-16 17:46:46 +02:00
Matthew Honnibal
a6f401580d
* Add from_array function to Doc.
2015-07-16 17:46:11 +02:00
Matthew Honnibal
2a5d050134
* Give codec loading back to Vocab.
2015-07-16 17:45:42 +02:00
Matthew Honnibal
8bf0f65f1c
* Remove dead code in strings.pyx
2015-07-16 17:35:53 +02:00
Matthew Honnibal
a9c3863665
* Fix inefficiency in StringStore.dump function
2015-07-16 17:34:32 +02:00
Matthew Honnibal
b59d271510
* Move serialization functionality into Serializer class
2015-07-16 11:23:48 +02:00
Matthew Honnibal
30be4f15da
* Import attrs from spacy.attrs, not spacy.typedefs
2015-07-16 11:23:25 +02:00
Matthew Honnibal
6c99e5f4aa
* Move serialization into Serializer class, with __call__ and train() api
2015-07-16 11:22:35 +02:00
Matthew Honnibal
e2133d990e
* Move serialization functionality out into a Serializer object
2015-07-16 11:21:44 +02:00
Matthew Honnibal
a6d040bd11
* Import Lexeme attrs from spacy.attrs, not spacy.typedefs
2015-07-16 11:20:08 +02:00
Matthew Honnibal
d8bc279e0c
* Fix 'you' contraction capitals in specials.json
2015-07-16 01:28:32 +02:00
Matthew Honnibal
45ae1ce428
* Remove unused declaration in parser
2015-07-16 01:27:11 +02:00
Matthew Honnibal
efa80096f1
* Upd attrs id list
2015-07-16 01:26:54 +02:00
Matthew Honnibal
01fab6bb90
* Improve de/serialize functions
2015-07-16 01:26:35 +02:00
Matthew Honnibal
0e07c1ed2a
* draft de/serialization functions in doc.pyx
2015-07-16 01:16:33 +02:00
Matthew Honnibal
9d956b07e9
* Fix import of attrs in doc.pyx, and update the get_token_attr function.
2015-07-16 01:15:34 +02:00
Matthew Honnibal
65251e7625
* Remove redundant attr_id_t from typedefs.pxd
2015-07-16 00:58:51 +02:00
Matthew Honnibal
9a8db9743c
* Remove gil from parser.call
2015-07-14 23:47:33 +02:00
Matthew Honnibal
3c1e3e9ee8
* Fix capitalization problems in specials.json
2015-07-14 23:46:31 +02:00
Matthew Honnibal
38ca0c33f5
Merge branch 'neuralnet' into refactor
Mostly refactors parser, to use new thinc3.2 Example class.
Aim is to remove use of shared memory, so that we can parallelize
over documents easily.
Conflicts:
setup.py
spacy/syntax/parser.pxd
spacy/syntax/parser.pyx
spacy/syntax/stateclass.pyx
2015-07-14 14:13:47 +02:00
Matthew Honnibal
6405d2384c
* Add first draft of annotation standards doc
2015-07-14 12:50:13 +02:00
Matthew Honnibal
af54d05d60
* Remove sense stuff from init_model
2015-07-14 10:56:17 +02:00
Matthew Honnibal
3de1b3ef1d
* Change get_freqs to take a list of files
2015-07-14 10:55:56 +02:00
Matthew Honnibal
935ac53ee3
* Extend count_by method
2015-07-14 03:20:09 +02:00
Matthew Honnibal
39c93116eb
* Add get_freqs script
2015-07-14 02:31:32 +02:00
Matthew Honnibal
3b5baa660f
* Fix tokenizer
2015-07-14 00:10:51 +02:00
Matthew Honnibal
2ae0b439b2
* Fix space check in gold.pyx
2015-07-14 00:10:27 +02:00
Matthew Honnibal
81aa4e6dcc
* Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API
2015-07-14 00:10:11 +02:00
Matthew Honnibal
e1c702e498
* Upd tests after refactor
2015-07-14 00:08:50 +02:00
Matthew Honnibal
ba9a22ae0b
* Ignore cpp files in spacy/tokens
2015-07-13 22:30:15 +02:00