| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | b89b489bb4 | * Implement both character and orth encoding in Packer, so that we can decide which to use per-text | 2015-07-19 22:39:45 +02:00 |
| Matthew Honnibal | ae78c9e3ce | * Implement character-based codec, so that we can do word/char backoff | 2015-07-19 22:03:39 +02:00 |
| Matthew Honnibal | cd1d047cb8 | * Delete out-dated HuffmanCodec comment | 2015-07-19 18:28:14 +02:00 |
| Matthew Honnibal | 879ef9fa3e | * Update tests for huffman codec | 2015-07-19 17:59:51 +02:00 |
| Matthew Honnibal | b8086067d5 | * Build Huffman codec from unsorted inputs | 2015-07-19 17:58:44 +02:00 |
| Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00 |
| Matthew Honnibal | 0973e2f107 | * Update serializer tests | 2015-07-18 22:46:40 +02:00 |
| Matthew Honnibal | 6b13e7227c | * Remove duplicate get_lex_attr method from doc.pyx | 2015-07-18 22:46:07 +02:00 |
| Matthew Honnibal | e49c7f1478 | * Update oov check in tokenizer | 2015-07-18 22:45:28 +02:00 |
| Matthew Honnibal | cfd842769e | * Allow infix tokens to be variable length | 2015-07-18 22:45:00 +02:00 |
| Matthew Honnibal | 5b4c78bbb2 | * Use an AttributeCodec based on orth for words. Still no oov handling mechanism. | 2015-07-18 22:43:18 +02:00 |
| Matthew Honnibal | 82d84b0f2b | * Index lexemes by orth, instead of a lexemes vector. Breaks the mechanism for deciding not to own LexemeC structs during parsing. Need to reinstate this. | 2015-07-18 22:42:15 +02:00 |
| Matthew Honnibal | 4dddc8a69b | * Fix type declarations for attr_t. Remove unused id_t. | 2015-07-18 22:39:57 +02:00 |
| Matthew Honnibal | ced59ab9ea | * Make minor efficiency improvement in Doc.__iter__ | 2015-07-18 04:10:53 +02:00 |
| Matthew Honnibal | cd91914dd8 | * Fix hard-coded length | 2015-07-18 04:09:56 +02:00 |
| Matthew Honnibal | b1d74ce60d | * Remove unused joint.pyx and joint.pxd files | 2015-07-17 23:31:44 +02:00 |
| Matthew Honnibal | c27514512b | * Remove cruft ner/ directory | 2015-07-17 23:24:32 +02:00 |
| Matthew Honnibal | f8d6d319f4 | * Remove cruft module | 2015-07-17 23:23:05 +02:00 |
| Matthew Honnibal | fb0a641a2d | * Don't release the gil around Parser.parse. Does this indicate thread problems? | 2015-07-17 23:07:37 +02:00 |
| Matthew Honnibal | a6ff7e6ca4 | * Fix redundant options in train.py | 2015-07-17 22:38:05 +02:00 |
| Matthew Honnibal | e29daea85f | * Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe. | 2015-07-17 22:37:24 +02:00 |
| Matthew Honnibal | 6cfa83157e | Merge branch 'refactor' of ssh://github.com/honnibal/spaCy into refactor | 2015-07-17 21:38:04 +02:00 |
| Matthew Honnibal | f7f0ad1a78 | * Fix tests | 2015-07-17 21:31:44 +02:00 |
| Matthew Honnibal | 68374149ae | * Move huffman encoding test to tests/serialize directory | 2015-07-17 21:22:18 +02:00 |
| Matthew Honnibal | e950f5a408 | * Tests for serializer | 2015-07-17 21:21:10 +02:00 |
| Matthew Honnibal | cf0c788892 | * Tests passing on round-trip pack/unpack on basic example | 2015-07-17 21:20:48 +02:00 |
| Matthew Honnibal | 44f39a876f | * Add a blank attrs.pyx | 2015-07-17 16:40:42 +02:00 |
| Matthew Honnibal | c2c83120d4 | * Remove codec property from Vocab | 2015-07-17 16:40:11 +02:00 |
| Matthew Honnibal | dfdf19f6a9 | * Draft a from_orth method for Doc | 2015-07-17 16:39:54 +02:00 |
| Matthew Honnibal | a9149fdcbd | * Compile attrs.pyx | 2015-07-17 16:39:25 +02:00 |
| Matthew Honnibal | 9e3f17051b | * Move to ORTH instead of ID for encoding lexemes. Basic tests of the codec wrappers now passing | 2015-07-17 16:38:29 +02:00 |
| Matthew Honnibal | 15ff739996 | * Fix passing of ID attribute in string store | 2015-07-17 14:49:42 +02:00 |
| Matthew Honnibal | 95e57c2780 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-17 01:40:18 +02:00 |
| Matthew Honnibal | 234c7e440a | * Add spacy/serialize/__init__ files | 2015-07-17 01:37:33 +02:00 |
| Matthew Honnibal | 221f7e51c7 | * Ignore spacy/serialize/*.cpp | 2015-07-17 01:36:49 +02:00 |
| Matthew Honnibal | db9dfd2e23 | * Major refactor of serialization. Nearly complete now. | 2015-07-17 01:27:54 +02:00 |
| Matthew Honnibal | c8282f9934 | * Work on serialization. Needs more reorganisation | 2015-07-16 19:56:02 +02:00 |
| Matthew Honnibal | d8458d6a25 | * Fix attr_id_t import in Spans | 2015-07-16 19:55:21 +02:00 |
| Matthew Honnibal | d1cb30dbc4 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-16 19:29:02 +02:00 |
| Matthew Honnibal | 897de2d438 | * Add 'bitter' property for serializer in English class | 2015-07-16 17:47:53 +02:00 |
| Matthew Honnibal | fb54052ae0 | * Work on serializer design | 2015-07-16 17:46:46 +02:00 |
| Matthew Honnibal | a6f401580d | * Add from_array function to Doc. | 2015-07-16 17:46:11 +02:00 |
| Matthew Honnibal | 2a5d050134 | * Give codec loading back to Vocab. | 2015-07-16 17:45:42 +02:00 |
| Matthew Honnibal | 8bf0f65f1c | * Remove dead code in strings.pyx | 2015-07-16 17:35:53 +02:00 |
| Matthew Honnibal | a9c3863665 | * Fix inefficiency in StringStore.dump function | 2015-07-16 17:34:32 +02:00 |
| Matthew Honnibal | b59d271510 | * Move serialization functionality into Serializer class | 2015-07-16 11:23:48 +02:00 |
| Matthew Honnibal | 30be4f15da | * Import attrs from spacy.attrs, not spacy.typedefs | 2015-07-16 11:23:25 +02:00 |
| Matthew Honnibal | 6c99e5f4aa | * Move serialization into Serializer class, with __call__ and train() api | 2015-07-16 11:22:35 +02:00 |
| Matthew Honnibal | e2133d990e | * Move serialization functionality out into a Serializer object | 2015-07-16 11:21:44 +02:00 |
| Matthew Honnibal | a6d040bd11 | * Import Lexeme attrs from spacy.attrs, not spacy.typedefs | 2015-07-16 11:20:08 +02:00 |