Matthew Honnibal
046c38bd26
Remove 'cleanup' of strings (#6007)

A long time ago we went to some trouble to try to clean up "unused"
strings, to avoid the `StringStore` growing in long-running processes.
This never really worked reliably, and I think it was a really wrong
approach. It's much better to let the user reload the `nlp` object as
necessary, now that the string encoding is stable (in v1, the string IDs
were sequential integers, making reloading the NLP object really
annoying).
The extra book-keeping does make some performance difference, and the
feature is unused, so it's past time we killed it.
2020-09-01 16:12:15 +02:00

Ines Montani
648f61d077
Tidy up compiler flags and imports (#5071)
2020-03-02 11:48:10 +01:00

Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506)
2017-11-11 03:11:27 +03:00

Matthew Honnibal
3e037054c8
Remove obsolete is_frozen functionality from StringStore
2017-10-16 19:23:10 +02:00

Matthew Honnibal
a5606c3eda
Work on changing StringStore to return hashes.
2017-05-28 12:36:27 +02:00

Matthew Honnibal
276478fe0f
Update strings.pxd
2016-10-24 14:00:35 +02:00

Matthew Honnibal
ea23b64cc8
Refactor training, with new spacy.train module. Defaults still a little awkward.
2016-10-09 12:24:24 +02:00

Matthew Honnibal
ca32a1ab01
Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."

This reverts commit 8423e8627f
2016-09-30 20:20:22 +02:00

Matthew Honnibal
de01e427fd
Revert "Changes to strings.pyx for new StringStore scheme"

This reverts commit 22d4752d64
2016-09-30 20:19:42 +02:00

Matthew Honnibal
22d4752d64
Changes to strings.pyx for new StringStore scheme
2016-09-30 19:58:09 +02:00

Matthew Honnibal
8423e8627f
Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
2016-09-30 10:14:47 +02:00

Stefan Behnel
f2cfbfc412
remove internal redundancy and overhead from StringStore
2016-03-24 15:25:27 +01:00

Matthew Honnibal
864a8f45d8
* Use unicode in StringStore.intern, instead of unreliably casting to bytes.
2015-11-05 11:32:19 +00:00

Matthew Honnibal
109106a949
* Replace UniStr, using unicode objects instead
2015-07-22 04:52:05 +02:00

Matthew Honnibal
01a97b90f3
* Fix header for string store
2015-07-20 12:06:10 +02:00

Matthew Honnibal
4dddc8a69b
* Fix type declarations for attr_t. Remove unused id_t.
2015-07-18 22:39:57 +02:00

Matthew Honnibal
95e57c2780
* Remove unnecessary key and id properties from Utf8String.
2015-07-17 01:40:18 +02:00

Matthew Honnibal
ce2edd6312
* Tmp commit. Refactoring to create a Python Lexeme class.
2015-01-12 10:26:22 +11:00

Matthew Honnibal
73f200436f
* Tests passing except for morphology/lemmatization stuff
2014-12-23 11:40:32 +11:00

Matthew Honnibal
cf8d26c3d2
* POS tagger training working after reorg
2014-12-22 08:54:47 +11:00

Matthew Honnibal
4c4aa2c5c9
* Work on train
2014-12-22 07:25:43 +11:00

Matthew Honnibal
e1c1a4b868
* Tmp
2014-12-21 05:36:29 +11:00

Matthew Honnibal
89a1cc1a48
* Move murmurhash to .pxd in strings file
2014-12-20 07:41:08 +11:00

Matthew Honnibal
7d48bba6c4
* Move StringStore class to its own file
2014-12-20 06:42:01 +11:00