| 
							
							
								 Matthew Honnibal | 72abbb43fb | * Add type declarations in strings.pyx | 2015-11-06 00:47:26 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5b2af4864f | * When lemmatizing non-noun, non-verb, non-adj words, output lower-case | 2015-11-06 00:45:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 754bf04162 | * Remove declaration of Model.update | 2015-11-06 00:31:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e18bdff23a | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-11-06 00:26:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b9991fbd20 | * Update to use thinc 3.0 | 2015-11-06 00:25:59 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 864a8f45d8 | * Use unicode in StringStore.intern, instead of unreliably casting to bytes. | 2015-11-05 11:32:19 +00:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b18204cd52 | * Fix StringStore._realloc, re Issue #155 | 2015-11-05 11:28:26 +00:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f8004c5f65 | * Begin upgrading to improved thinc API | 2015-11-05 03:53:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | adc7bbd6cf | * Fix name of like_num in default_lex_attrs | 2015-11-04 22:02:47 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e96faf29e7 | * Rename like_number to like_num, to fix inconsistency re Issue #166 | 2015-11-04 22:01:44 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 65934b7cd4 | * Enforce import of ujson in strings.pyx, because otherwise it's too slow | 2015-11-04 00:32:02 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1ce5d5602d | * Rename Doc.data to Doc.c | 2015-11-04 00:17:13 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 68f479e821 | * Rename Doc.data to Doc.c | 2015-11-04 00:15:14 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3ddea19b2b | * Rename spans.pyx to span.pyx | 2015-11-04 00:14:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9482d616bc | * Rename spans.pyx to span.pyx | 2015-11-03 23:51:05 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 116da5990a | * Clean up setting of tag in doc.from_bytes | 2015-11-03 23:48:57 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9ec7b9c454 | * Clean up unused Constituent struct. | 2015-11-03 23:48:21 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1e99fcd413 | * Rename .repvec to .vector in C API | 2015-11-03 23:47:59 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ee3f9ba581 | * Fix test of serializer | 2015-11-03 19:45:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d06ba26371 | * Fix test of serializer | 2015-11-03 19:43:27 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4083059650 | Merge branch 'master' of https://github.com/honnibal/spaCy | 2015-11-03 09:07:19 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9e37437ba8 | * Fix assign_tag in doc.merge | 2015-11-03 19:07:02 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dde9e1357c | * Add todo to morphology.lemmatize | 2015-11-03 18:54:35 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ffedff9e6c | * Remove the archive after download, to save disk space | 2015-11-03 18:54:05 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 85372468e3 | * Fix serialize test | 2015-11-03 08:51:33 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 833eb35c57 | * Fix tag assignment in doc.from_array | 2015-11-03 18:45:54 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 09664177d7 | * Fix tag handling in doc.merge, and assign sent_start when setting heads. | 2015-11-03 18:15:52 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 389a373807 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-11-03 18:07:25 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3f44b3e43f | * Mark serializer test as requiring models | 2015-11-03 18:07:08 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 25ed7be8f8 | Merge branch 'master' of https://github.com/honnibal/spaCy | 2015-11-03 07:58:17 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 604ceac4c6 | * Fix morphological assignment in doc.merge() | 2015-11-03 17:57:51 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5e040855a5 | * Ensure morphological features and lemmas are loaded in from_array, re Issue #152 | 2015-11-03 17:56:50 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5668feb235 | * Fix pickle test for python3 | 2015-11-03 04:57:02 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6161d2529a | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-11-03 13:36:30 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5887506f5d | * Don't expect lexemes.bin in Vocab | 2015-11-03 13:23:39 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f7dd377575 | * Adjust conjuncts iterator in Token | 2015-11-03 13:23:22 +11:00 |  | 
			
				
					| 
							
							
								 Andreas Grivas | d418f00eb1 | fixed error when printing unicode | 2015-11-02 20:23:18 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 52fc338001 | * Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152 | 2015-10-28 10:43:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1c0356e4c2 | * Set test file mode to w+t | 2015-10-26 22:40:48 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0fe98f358b | * Fix mode on text file for Python3 in strings test | 2015-10-26 22:25:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8ba9cf905e | * Fix mode on text file for Python3 in strings test | 2015-10-26 21:44:34 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a0730699b1 | * Fix mode on text file for Python3 in strings test | 2015-10-26 21:25:56 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 725344d349 | * Fix tempfile in test | 2015-10-26 21:08:18 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f11030aadc | * Remove out-dated TODO comment | 2015-10-26 12:33:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a371a1071d | * Save and load word vectors during pickling, re Issue #125 | 2015-10-26 12:33:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a824a98312 | * Add tests for pickling vectors, re: Issue #125 | 2015-10-26 12:31:05 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 314090cc78 | * Set vectors length when unpickling vocab, re Issue #125 | 2015-10-26 12:05:08 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4e16f9e435 | * Move tests underneath spacy/ | 2015-10-26 00:07:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3a6e48e814 | Merge pull request #149 from chrisdubois/pickle-patch: Add __reduce__ to Tokenizer so that English pickles. | 2015-10-25 15:30:31 +11:00 |  | 
			
				
					| 
							
							
								 Chris DuBois | dac8fe7bdb | Add __reduce__ to Tokenizer so that English pickles. - Add tests to test_pickle and test_tokenizer that save to tempfiles. | 2015-10-23 22:24:03 -07:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ff4fe524ee | * Fix exception for python 2 | 2015-10-23 01:56:13 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 341a3e85cd | * Upd downloaded data version | 2015-10-23 00:56:57 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f18fd8c659 | * Fix language.py for change in StringStore load API | 2015-10-23 03:48:12 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 23855db3ca | Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop | 2015-10-23 03:46:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4f13849065 | Merge pull request #145 from henningpeters/master: better error reporting, cleanup | 2015-10-23 03:45:47 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3be94be0c0 | Merge pull request #148 from maxirmx/master: Utf8 encoding for lemma_rules.json | 2015-10-22 21:46:28 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c86bda8d1a | * Fix import of uget | 2015-10-22 21:13:56 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2348a08481 | * Load/dump strings with a json file, instead of the hacky strings file we were using. | 2015-10-22 21:13:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9baf0abd59 | * Save vocab after training. | 2015-10-22 21:09:14 +11:00 |  | 
			
				
					| 
							
							
								 maxirmx | f07e4accd7 | Fixing encoding issue #4 | 2015-10-21 20:45:56 +03:00 |  | 
			
				
					| 
							
							
								 maxirmx | fcbfff043f | Fixing encoding issue #3 | 2015-10-21 15:52:34 +03:00 |  | 
			
				
					| 
							
							
								 maxirmx | fe9d2e2c4e | Fixing encode issue #2 | 2015-10-21 15:36:21 +03:00 |  | 
			
				
					| 
							
							
								 maxirmx | e4a1726f77 | Fixing encoding issue UTF-8 | 2015-10-21 14:16:37 +03:00 |  | 
			
				
					| 
							
							
								 Andreas Grivas | 93ada458e2 | added __repr__ that prints text in ipython for doc, token, and span objects | 2015-10-21 14:11:46 +03:00 |  | 
			
				
					| 
							
							
								 Henning Peters | ccffd2ef53 | fixed extract directory | 2015-10-21 07:59:34 +02:00 |  | 
			
				
					| 
							
							
								 Henning Peters | da4c9cee06 | assert filename match | 2015-10-20 19:33:59 +02:00 |  | 
			
				
					| 
							
							
								 Henning Peters | 4f703f0cb4 | better error reporting, cleanup | 2015-10-20 19:11:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9cdea6e450 | * Import uget correctly | 2015-10-19 08:32:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6727a46bb5 | * Fix Issue #118: Matcher behaves unpredictably when matches overlap. | 2015-10-19 16:45:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 135062d23c | * Fix error with merged text when merged region did not have trailing whitespace | 2015-10-19 15:47:04 +11:00 |  | 
			
				
					| 
							
							
								 Henning Peters | bfde91fa49 | add custom download tool (uget), replace wget with uget | 2015-10-18 12:35:04 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9839cd2c0b | * Fix whitespace_ calculation in Token | 2015-10-18 17:21:11 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c99285b8b9 | * Clean up C++ usage in spacy/matcher.pyx | 2015-10-18 17:20:50 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a7e6c5ac8f | * Fix Issue #122: Incorrect calculation of children after Doc.merge() | 2015-10-18 17:17:27 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3ba66f2dc7 | * Add string length cap in Tokenizer.__call__ | 2015-10-16 04:54:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6e0f985afc | * Fix token.conjuncts | 2015-10-15 03:49:45 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2e0104ac81 | * Fix token.conjuncts | 2015-10-15 03:47:45 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b8f3345a82 | * Fix token.conjuncts method | 2015-10-15 03:36:01 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 23818f89b8 | * Fix token.conjuncts method | 2015-10-15 03:34:57 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7a15d1b60c | * Add Python 2/3 compatibility fix for copy_reg | 2015-10-13 20:04:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 329ae57520 | * Fix whitespace attachment thing | 2015-10-13 09:46:38 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 37919eac82 | * Fix whitespace attachment in simpler way. Leaves problem with setting left/right children. | 2015-10-13 18:23:24 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c70eb776ae | * Fix whitespace attachment, so that left/right children are consistent with head. | 2015-10-13 15:58:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 531182f937 | * Fix Model.__reduce__ | 2015-10-13 15:14:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6c227a6c1f | * Fix Model.__reduce__ | 2015-10-13 15:10:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 358c82595c | * Fix NAMES list in spacy/parts_of_speech.pyx | 2015-10-13 14:18:45 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c1fdc487bc | Merge branch 'attrs' | 2015-10-13 14:03:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e886e6a406 | * Inc version | 2015-10-13 13:46:17 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 20fd36a0f7 | * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. | 2015-10-13 13:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f8de403483 | * Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125 | 2015-10-13 13:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 85e7944572 | * Start trying to pickle Vocab | 2015-10-13 13:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5ca57bd859 | * Ensure Morphology can be pickled, to address Issue #125. | 2015-10-13 13:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0cee928467 | * Allow StringStore to be pickled, to start addressing Issue #125 | 2015-10-13 13:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 41012907a8 | * Fix variable name | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e70368d157 | * Use lower case strings for dependency label names in symbols enum | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7b4af3d1e7 | * Fix parts_of_speech now that symbols list has been reformed | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 37b909b6b6 | * Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ce65ec698c | * Remove qualified naming in symbols | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9f4be0adcd | * Map NO_TAG to NIL in parts_of_speech.pxd | 2015-10-13 13:44:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 278e12f7e8 | * Add morphology symbols to morphology. May need to remove these as an enum. | 2015-10-13 13:44:40 +11:00 |  |