| Matthew Honnibal | 014b6936ac | Fix #608 -- __version__ should be available at the base of the package. | 2016-11-04 21:21:02 +01:00 |
| Matthew Honnibal | 42b0736db7 | Increment version | 2016-11-04 20:04:21 +01:00 |
| Matthew Honnibal | 9f93386994 | Update version | 2016-11-04 19:28:16 +01:00 |
| Matthew Honnibal | 1fb09c3dc1 | Fix morphology tagger | 2016-11-04 19:19:09 +01:00 |
| Matthew Honnibal | a36353df47 | Temporarily put back the tokenize_from_strings method, while tests aren't updated yet. | 2016-11-04 19:18:07 +01:00 |
| Matthew Honnibal | f0917b6808 | Fix Issue #376: and/or was tagged as a noun. | 2016-11-04 15:21:28 +01:00 |
| Matthew Honnibal | 737816e86e | Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly. | 2016-11-04 15:16:20 +01:00 |
| Matthew Honnibal | ab952b4756 | Fix #578 -- Sputnik had been purging all files on --force, not just the relevant one. | 2016-11-04 10:44:11 +01:00 |
| Matthew Honnibal | 6e37ba1d82 | Fix #602, #603 --- Broken build | 2016-11-04 09:54:24 +01:00 |
| Matthew Honnibal | 293c79c09a | Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. | 2016-11-04 00:29:07 +01:00 |
| Matthew Honnibal | e30348b331 | Prefer to import from symbols instead of parts_of_speech | 2016-11-04 00:27:55 +01:00 |
| Matthew Honnibal | 4a8a2b6001 | Test #595 -- Bug in lemmatization of base forms. | 2016-11-04 00:27:32 +01:00 |
| Matthew Honnibal | f1605df2ec | Fix #588: Matcher should reject empty pattern. | 2016-11-03 00:16:44 +01:00 |
| Matthew Honnibal | 72b9bd57ec | Test Issue #588: Matcher accepts invalid, empty patterns. | 2016-11-03 00:09:35 +01:00 |
| Matthew Honnibal | 41a90a7fbb | Add tokenizer exception for 'Ph.D.', to fix 592. | 2016-11-03 00:03:34 +01:00 |
| Matthew Honnibal | 532318e80b | Import Jieba inside zh.make_doc | 2016-11-02 23:49:19 +01:00 |
| Matthew Honnibal | f292f7f0e6 | Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. | 2016-11-02 23:48:43 +01:00 |
| Matthew Honnibal | b6b01d4680 | Remove deprecated tokens_from_list test. | 2016-11-02 23:47:21 +01:00 |
| Matthew Honnibal | 3d6c79e595 | Test Issue #599: .is_tagged and .is_parsed attributes not reflected after deserialization for empty documents. | 2016-11-02 23:40:11 +01:00 |
| Matthew Honnibal | 05a8b752a2 | Fix Issue #600: Missing setters for Token attribute. | 2016-11-02 23:28:59 +01:00 |
| Matthew Honnibal | 125c910a8d | Test Issue #600 | 2016-11-02 23:24:13 +01:00 |
| Matthew Honnibal | e0c9695615 | Fix doc strings for tokenizer | 2016-11-02 23:15:39 +01:00 |
| Matthew Honnibal | 80824f6d29 | Fix test | 2016-11-02 20:48:40 +01:00 |
| Matthew Honnibal | dbe47902bc | Add import fr | 2016-11-02 20:48:29 +01:00 |
| Matthew Honnibal | 8f24dc1982 | Fix infixes in Italian | 2016-11-02 20:43:52 +01:00 |
| Matthew Honnibal | 41a4766c1c | Fix infixes in spanish and portuguese | 2016-11-02 20:43:12 +01:00 |
| Matthew Honnibal | 3d4bd96e8a | Fix infixes in french | 2016-11-02 20:41:43 +01:00 |
| Matthew Honnibal | c09a8ce5bb | Add test for french tokenizer | 2016-11-02 20:40:31 +01:00 |
| Matthew Honnibal | b012ae3044 | Add test for loading languages | 2016-11-02 20:38:48 +01:00 |
| Matthew Honnibal | ad1c747c6b | Fix stray POS in language stubs | 2016-11-02 20:37:55 +01:00 |
| Matthew Honnibal | e9e6fce576 | Handle null prefix/suffix/infix search in tokenizer | 2016-11-02 20:35:48 +01:00 |
| Matthew Honnibal | 22647c2423 | Check that patterns aren't null before compiling regex for tokenizer | 2016-11-02 20:35:29 +01:00 |
| Matthew Honnibal | 5ac735df33 | Link languages in __init__.py | 2016-11-02 20:05:14 +01:00 |
| Matthew Honnibal | c68dfe2965 | Stub out support for Italian | 2016-11-02 20:03:24 +01:00 |
| Matthew Honnibal | 6dbf4f7ad7 | Stub out support for French, Spanish, Italian and Portuguese | 2016-11-02 20:02:41 +01:00 |
| Matthew Honnibal | 6b8b05ef83 | Specify that spacy.util is encoded in utf8 | 2016-11-02 19:58:00 +01:00 |
| Matthew Honnibal | 5363224395 | Add draft Jieba tokenizer for Chinese | 2016-11-02 19:57:38 +01:00 |
| Matthew Honnibal | f7fee6c24b | Check for class-defined make_docs method before assigning one provided as an argument | 2016-11-02 19:57:13 +01:00 |
| Matthew Honnibal | 19c1e83d3d | Work on draft Italian tokenizer | 2016-11-02 19:56:32 +01:00 |
| Matthew Honnibal | 9efe568177 | Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue #596 | 2016-11-02 12:31:34 +01:00 |
| Matthew Honnibal | d8db648ebf | Add __init__.py file for regression tests | 2016-11-01 13:45:06 +01:00 |
| Matthew Honnibal | 11664b9f20 | Fix variable error in token | 2016-11-01 13:28:00 +01:00 |
| Matthew Honnibal | 8c4d1b46ce | Fix variable error in Span | 2016-11-01 13:27:44 +01:00 |
| Matthew Honnibal | e7af6b937f | Fix syntax error while fixing doc strings | 2016-11-01 13:27:32 +01:00 |
| Matthew Honnibal | 62fc6b1afa | Use 32 bit hashes for OOV, re Issue #589, Issue #285 | 2016-11-01 13:27:13 +01:00 |
| Matthew Honnibal | 6977a2b8cd | Add test for Issue #589 | 2016-11-01 12:33:36 +01:00 |
| Matthew Honnibal | b86f8af0c1 | Fix doc strings | 2016-11-01 12:25:36 +01:00 |
| Matthew Honnibal | d563f1eadb | Fix Issue #587: Segfault in Matcher, due to simple error in the state machine. | 2016-10-28 17:42:00 +02:00 |
| Matthew Honnibal | 7e5f63a595 | Improve test slightly | 2016-10-28 17:41:16 +02:00 |
| Matthew Honnibal | 782e4814f4 | Test Issue #587: Matcher segfaults on particular input | 2016-10-28 16:38:32 +02:00 |