| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | 3ff09614e0 | Changes to matcher.pyx for new StringStore scheme | 2016-09-30 19:56:48 +02:00 |
| Matthew Honnibal | eceeaefe53 | Fix defaults for Parser and Entity, adding a blank= argument. | 2016-09-30 19:56:06 +02:00 |
| Matthew Honnibal | d61feffe24 | Require new preshed | 2016-09-30 18:41:01 +02:00 |
| Ines Montani | 7537b0d637 | Update README.rst | 2016-09-30 14:41:35 +02:00 |
| Ines Montani | 8039c1a92d | Update README.rst | 2016-09-30 14:21:19 +02:00 |
| Ines Montani | d6cc4d3dfe | Update README.rst | 2016-09-30 14:17:23 +02:00 |
| Matthew Honnibal | 8423e8627f | Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. | 2016-09-30 10:14:47 +02:00 |
| Matthew Honnibal | d3dc5718b2 | Fix syntax error in Doc | 2016-09-28 11:39:49 +02:00 |
| Matthew Honnibal | 1b520e7bab | Improve docstrings for Doc object | 2016-09-28 11:15:13 +02:00 |
| Matthew Honnibal | 81a47c01d8 | Fix test for empty sentence string. | 2016-09-27 19:21:22 +02:00 |
| Matthew Honnibal | 4cbf0d3bb6 | Handle errors when no valid actions are available, pointing users to the issue tracker. | 2016-09-27 19:19:53 +02:00 |
| Matthew Honnibal | 430473bd98 | Raise errors when no actions are available, re Issue #429 | 2016-09-27 19:09:37 +02:00 |
| Matthew Honnibal | fc4a7ad794 | Test and fix Issue #411: IndexError when .sents property is used on empty string. | 2016-09-27 18:49:14 +02:00 |
| Matthew Honnibal | 3d370b7d45 | Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic | 2016-09-27 18:39:46 +02:00 |
| Matthew Honnibal | a2f3510d6d | Fix lemmatizer | 2016-09-27 17:47:05 +02:00 |
| Matthew Honnibal | 07776d8096 | Fix pos name conflict in lemmatize | 2016-09-27 17:35:58 +02:00 |
| Matthew Honnibal | 35cd953f9e | Fix pos name conflict with morphology | 2016-09-27 14:16:22 +02:00 |
| Matthew Honnibal | 8e7df3c4ca | Expect the parser data, if parser.load() is called. | 2016-09-27 14:02:12 +02:00 |
| Matthew Honnibal | bb4f201ad2 | Pass morphological features from tag map into the lemmatizer. | 2016-09-27 14:01:43 +02:00 |
| Matthew Honnibal | 40509e8bca | Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed. | 2016-09-27 14:01:16 +02:00 |
| Matthew Honnibal | 9c8ac91d72 | Add test for Issue #435 | 2016-09-27 13:52:38 +02:00 |
| Matthew Honnibal | 3cb4d455d2 | Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435 | 2016-09-27 13:52:11 +02:00 |
| Matthew Honnibal | e233328d38 | Fix Issue #371: Lexeme objects were unhashable. | 2016-09-27 13:22:30 +02:00 |
| Matthew Honnibal | e382e48d9f | Temporarily patch handling of default templates for tagger. Need to move these to language_data. | 2016-09-27 13:21:28 +02:00 |
| Matthew Honnibal | a44763af0e | Fix Issue #469: Incorrectly cased root label in noun chunk iterator | 2016-09-27 13:13:01 +02:00 |
| Matthew Honnibal | b14b9b096b | Return None if /deps directory not present, instead of trying to load the parser. | 2016-09-26 18:48:03 +02:00 |
| Matthew Honnibal | e07b9665f7 | Don't expect parser model | 2016-09-26 18:09:33 +02:00 |
| Matthew Honnibal | ee6fa106da | Fix parser features | 2016-09-26 17:57:32 +02:00 |
| Matthew Honnibal | e607e4b598 | Fix parser loading | 2016-09-26 17:51:11 +02:00 |
| Matthew Honnibal | 0b2d7ae9d6 | Fix Entity creation | 2016-09-26 15:41:22 +02:00 |
| Matthew Honnibal | 2debc4e0a2 | Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. | 2016-09-26 11:57:54 +02:00 |
| Matthew Honnibal | 722199acb8 | Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey | 2016-09-26 11:07:46 +02:00 |
| Matthew Honnibal | ae202e7a60 | Fix init_model.py | 2016-09-25 15:58:51 +02:00 |
| Matthew Honnibal | e56653f848 | Add language data for German | 2016-09-25 15:44:45 +02:00 |
| Matthew Honnibal | 7db956133e | Move tokenizer data for German into spacy.de.language_data | 2016-09-25 15:37:33 +02:00 |
| Matthew Honnibal | 95aaea0d3f | Refactor so that the tokenizer data is read from Python data, rather than from disk | 2016-09-25 14:49:53 +02:00 |
| Matthew Honnibal | d7e9acdcdf | Add English language data, so that the tokenizer doesn't require the data download | 2016-09-25 14:49:00 +02:00 |
| Matthew Honnibal | 82b8cc5efb | Whitespace | 2016-09-24 22:17:01 +02:00 |
| Matthew Honnibal | fd58f7655a | Python 3 compatible basestring | 2016-09-24 22:16:43 +02:00 |
| Matthew Honnibal | 082e95b19e | Python 3 compatible basestring | 2016-09-24 22:09:21 +02:00 |
| Matthew Honnibal | f19af6cb2c | Python 3 compatible basestring | 2016-09-24 22:08:43 +02:00 |
| Matthew Honnibal | 3ed4cdfe32 | Handle pathlib.Path objects in CFile | 2016-09-24 22:01:46 +02:00 |
| Matthew Honnibal | df88690177 | Fix encoding of path variable | 2016-09-24 21:13:15 +02:00 |
| Matthew Honnibal | af847e07fc | Fix usage of pathlib for Python3 -- turning paths to strings. | 2016-09-24 21:05:27 +02:00 |
| Matthew Honnibal | 453683aaf0 | Fix spacy/vocab.pyx | 2016-09-24 20:50:31 +02:00 |
| Matthew Honnibal | d310dc73ef | Fix bin/init_model.py after refactoring | 2016-09-24 20:38:18 +02:00 |
| Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00 |
| Matthew Honnibal | 83e364188c | Mostly finished loading refactoring. Design is in place, but doesn't work yet. | 2016-09-24 15:42:01 +02:00 |
| Matthew Honnibal | 9dc8043a7e | Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib. | 2016-09-24 14:08:53 +02:00 |
| Matthew Honnibal | b00f683a0c | Fix matcher test | 2016-09-24 11:20:58 +02:00 |