ines 
							
						 
					 
					
						
						
						
						
							
						
						
							fd6207426a 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						 
						
						
						
					 
					
						2018-07-09 18:05:10 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ole Henrik Skogstrøm 
							
						 
					 
					
						
						
						
						
							
						
						
							c21efea9bb 
							
						 
					 
					
						
						
							
							Add sent property to token (#2521)
						
						 
						
						
						
						
						* Add sent property to token
* Refactored and cleaned up copy paste errors. 
						
					 
					
						2018-07-06 15:54:15 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							cae4457c38 
							
						 
					 
					
						
						
							
							💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)
						
						 
						
						
						
						
						* Add logic to filter out warning IDs via environment variable
Usage: SPACY_WARNING_EXCLUDE=W001,W007
* Add warnings for empty vectors
* Add warning if no word vectors are used in .similarity methods
For example, if only tensors are available in small models – should hopefully clear up some confusion around this
* Capture warnings in tests
* Rename SPACY_WARNING_EXCLUDE to SPACY_WARNING_IGNORE 
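
The environment-variable filter described in this commit could look roughly like the sketch below. It is only an illustration, not spaCy's actual code: the helper names `ignored_warning_ids` and `should_warn` are hypothetical. Only the variable name `SPACY_WARNING_IGNORE` and the comma-separated ID format (e.g. `W001,W007`) come from the commit message.

```python
import os


def ignored_warning_ids(environ=None):
    """Return the set of warning IDs listed in SPACY_WARNING_IGNORE.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to test. (Hypothetical helper, not part of spaCy.)
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SPACY_WARNING_IGNORE", "")
    # Split on commas and drop empty entries and surrounding whitespace.
    return {part.strip() for part in raw.split(",") if part.strip()}


def should_warn(warning_id, environ=None):
    """True if a warning with this ID should still be shown."""
    return warning_id not in ignored_warning_ids(environ)
```

With `SPACY_WARNING_IGNORE=W001,W007` set, `should_warn("W001")` is false while any other ID still warns.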
						
					 
					
						2018-05-21 01:22:38 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							546dd99cdf 
							
						 
					 
					
						
						
							
							Merge master into develop -- mostly Arabic and website  
						
						 
						
						
						
					 
					
						2018-05-15 18:14:28 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Douglas Knox 
							
						 
					 
					
						
						
						
						
							
						
						
							9b49a40f4e 
							
						 
					 
					
						
						
							
							Test and fix for Issue #2219 (#2272)
						
						 
						
						
						
						
						Test and fix for Issue #2219: Token.similarity() failed if single letter
						
					 
					
						2018-05-03 18:40:46 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a8bc947fd4 
							
						 
					 
					
						
						
							
							Fix Token.set_extension  
						
						 
						
						
						
					 
					
						2018-04-29 15:48:19 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2c4a6d66fa 
							
						 
					 
					
						
						
							
							Merge master into develop. Big merge, many conflicts -- need to review  
						
						 
						
						
						
					 
					
						2018-04-29 14:49:26 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1c6d77610c 
							
						 
					 
					
						
						
							
							Add remove_extension method on Doc, Token and Span (closes #2242)
						
						 
						
						
						
					 
					
						2018-04-28 23:33:09 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							62b4b527d7 
							
						 
					 
					
						
						
							
							Don't raise error if set_extension has getter and setter (closes #2177)
						
						 
						
						
						
						
						Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests. 
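
The two checks named in this message (rejecting a setter without a getter, and comparing against a sentinel so that `default=None` counts as a real default) can be sketched as below. This is an assumption-laden illustration rather than the library's code: the function `check_extension_args` and its return value are hypothetical. Only the `_unset` name and the two rules are taken from the commit message.

```python
# Sentinel object: lets us tell "argument not passed" apart from default=None.
_unset = object()


def check_extension_args(default=_unset, getter=None, setter=None):
    """Validate extension arguments (hypothetical helper, not spaCy's API)."""
    # A setter on its own is rejected; getter plus setter is allowed.
    if setter is not None and getter is None:
        raise ValueError("A setter was specified without a getter.")
    return {
        # Identity check against the sentinel, so default=None still counts.
        "has_default": default is not _unset,
        "has_getter": getter is not None,
        "has_setter": setter is not None,
    }
```

Comparing with `is not _unset` instead of `is not None` is what allows `None` to be stored as a legitimate default value.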
						
					 
					
						2018-04-03 18:30:17 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3141e04822 
							
						 
					 
					
						
						
							
							💫 New system for error messages and warnings (#2163)
						
						 
						
						
						
						
						* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None 
						
					 
					
						2018-04-03 15:50:31 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							de9fd091ac 
							
						 
					 
					
						
						
							
							Fix #2014: token.pos_ not writeable
						
						 
						
						
						
					 
					
						2018-03-27 21:21:11 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1f7229f40f 
							
						 
					 
					
						
						
							
							Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
						
						 
						
						
						
						
						This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
						
					 
					
						2018-03-27 19:23:02 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							63a267b34d 
							
						 
					 
					
						
						
							
							Fix #2073: Token.set_extension not working
						
						 
						
						
						
					 
					
						2018-03-27 13:36:20 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							fbf48b3f9f 
							
						 
					 
					
						
						
							
							lemma property to return hash instead of unicode  
						
						 
						
						
						
					 
					
						2018-03-14 17:03:00 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								4altinok 
							
						 
					 
					
						
						
						
						
							
						
						
							ca8728035d 
							
						 
					 
					
						
						
							
							added new lex feat to token  
						
						 
						
						
						
					 
					
						2018-02-11 18:55:48 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ccb51a9f36 
							
						 
					 
					
						
						
							
							Make .similarity() return 1.0 if all orth attrs match  
						
						 
						
						
						
					 
					
						2018-01-15 16:29:48 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b904d81e9a 
							
						 
					 
					
						
						
							
							Fix rich comparison against None objects. Closes #1757
						
						 
						
						
						
					 
					
						2018-01-15 15:51:25 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0cb090e526 
							
						 
					 
					
						
						
							
							Fix infinite recursion in token.sent_start. Closes #1640
						
						 
						
						
						
					 
					
						2018-01-14 15:02:15 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5cbe913b6f 
							
						 
					 
					
						
						
							
							Don't raise deprecation warning in property. Closes #1813, #1712
						
						 
						
						
						
					 
					
						2018-01-14 14:55:58 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fb26b2cb12 
							
						 
					 
					
						
						
							
							Use lookup lemmatizer if lemma unset  
						
						 
						
						
						
					 
					
						2017-11-23 12:31:58 +00:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							144a93c2a5 
							
						 
					 
					
						
						
							
							Back-off to tensor for similarity if no vectors  
						
						 
						
						
						
					 
					
						2017-11-03 20:56:33 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9659391944 
							
						 
					 
					
						
						
							
							Update deprecated methods and add warnings  
						
						 
						
						
						
					 
					
						2017-11-01 16:49:42 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e0ebee81c 
							
						 
					 
					
						
						
							
							Add Token.is_sent_start property, so can deprecate Token.sent_start  
						
						 
						
						
						
					 
					
						2017-11-01 13:27:14 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							86eba61fae 
							
						 
					 
					
						
						
							
							Fix token.vector when vectors are missing  
						
						 
						
						
						
					 
					
						2017-11-01 00:47:35 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							544a407b93 
							
						 
					 
					
						
						
							
							Tidy up Doc, Token and Span and add missing docs  
						
						 
						
						
						
					 
					
						2017-10-27 17:07:26 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6a0483b7aa 
							
						 
					 
					
						
						
							
							Tidy up and document Doc, Token and Span  
						
						 
						
						
						
					 
					
						2017-10-27 15:41:45 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b66b8f028b 
							
						 
					 
					
						
						
							
							Fix #1375 -- out-of-bounds on token.nbor()
						
						 
						
						
						
					 
					
						2017-10-24 12:10:39 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e0a9b02b67 
							
						 
					 
					
						
						
							
							Merge Span._ and Span.as_doc methods  
						
						 
						
						
						
					 
					
						2017-10-09 22:00:15 -05:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3fc4fe61d2 
							
						 
					 
					
						
						
							
							Fix typo  
						
						 
						
						
						
					 
					
						2017-10-10 04:15:14 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							080afd4924 
							
						 
					 
					
						
						
							
							Add ternary value setting to Token.sent_start  
						
						 
						
						
						
					 
					
						2017-10-08 23:51:58 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							668a0ea640 
							
						 
					 
					
						
						
							
							Pass extensions into Underscore class  
						
						 
						
						
						
					 
					
						2017-10-07 18:56:01 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d55d6e1cfa 
							
						 
					 
					
						
						
							
							Fix comparison of Token from different docs. Closes #1257
						
						 
						
						
						
					 
					
						2017-08-19 16:39:32 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f4662e9218 
							
						 
					 
					
						
						
							
							Fix vector linkage for token  
						
						 
						
						
						
					 
					
						2017-06-04 14:19:58 -05:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							498ad85309 
							
						 
					 
					
						
						
							
							Try using tensor for vector/similarity methods
						
						 
						
						
						
					 
					
						2017-05-30 23:35:17 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fe11564b8e 
							
						 
					 
					
						
						
							
							Finish stringstore change. Also xfail vectors tests  
						
						 
						
						
						
					 
					
						2017-05-28 15:10:22 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2445707f3c 
							
						 
					 
					
						
						
							
							Re-delegate vectors to vocab  
						
						 
						
						
						
					 
					
						2017-05-28 11:46:10 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							01e59e4e6e 
							
						 
					 
					
						
						
							
							* Add Token.sent_start property, re Issue #235
						
						 
						
						
						
					 
					
						2017-05-23 18:41:11 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							7ed8a92ed1 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Token  
						
						 
						
						
						
					 
					
						2017-05-20 15:13:33 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							a804045597 
							
						 
					 
					
						
						
							
							Use is_ancestor instead of deprecated is_ancestor_of  
						
						 
						
						
						
					 
					
						2017-05-19 20:23:40 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e9e62b01b0 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Token  
						
						 
						
						
						
					 
					
						2017-05-19 18:47:56 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9d85cda8e4 
							
						 
					 
					
						
						
							
							Fix models error message and use about.__docs_models__ (see #1051)
						
						 
						
						
						
					 
					
						2017-05-13 13:05:47 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6b942763f0 
							
						 
					 
					
						
						
							
							Tidy up imports  
						
						 
						
						
						
					 
					
						2017-05-13 13:04:40 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6a4221a6de 
							
						 
					 
					
						
						
							
							Allow lemma to be set from Python. Re #973
						
						 
						
						
						
					 
					
						2017-04-16 18:07:53 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0739ae7b76 
							
						 
					 
					
						
						
							
							Tidy up and fix formatting and imports  
						
						 
						
						
						
					 
					
						2017-04-15 13:05:15 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e71a1f4bd0 
							
						 
					 
					
						
						
							
							Fix download commands in error messages (see #946)
						
						 
						
						
						
					 
					
						2017-04-01 10:20:57 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc3900e5b2 
							
						 
					 
					
						
						
							
							Allow ent_id to be set in Token  
						
						 
						
						
						
					 
					
						2017-03-31 14:00:14 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							66c1f194f9 
							
						 
					 
					
						
						
							
							Use consistent unicode declarations  
						
						 
						
						
						
					 
					
						2017-03-12 13:07:28 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Roman Inflianskas 
							
						 
					 
					
						
						
						
						
							
						
						
							66e1109b53 
							
						 
					 
					
						
						
							
							Add support for Universal Dependencies v2.0  
						
						 
						
						
						
					 
					
						2017-03-03 13:17:34 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e7f8e13cf3 
							
						 
					 
					
						
						
							
							Make Token hashable. Fixes #743
						
						 
						
						
						
					 
					
						2017-01-16 13:27:57 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							12cd27b821 
							
						 
					 
					
						
						
							
							Amend 8ae8b443f: Handle comparison with None tokens.  
						
						 
						
						
						
					 
					
						2017-01-11 13:03:32 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8ae8b443f1 
							
						 
					 
					
						
						
							
							Add richcmp method to Token. Closes #631
						
						 
						
						
						
					 
					
						2017-01-09 19:30:31 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							404019ad2f 
							
						 
					 
					
						
						
							
							Fix issue #672: ent_iob_ was a string, not unicode, due to missing unicode_literals statement.
						
						 
						
						
						
					 
					
						2016-12-18 22:33:53 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							293c79c09a 
							
						 
					 
					
						
						
							
							Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.
						
						 
						
						
						
					 
					
						2016-11-04 00:29:07 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							05a8b752a2 
							
						 
					 
					
						
						
							
							Fix Issue #600: Missing setters for Token attribute.
						
						 
						
						
						
					 
					
						2016-11-02 23:28:59 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							11664b9f20 
							
						 
					 
					
						
						
							
							Fix variable error in token  
						
						 
						
						
						
					 
					
						2016-11-01 13:28:00 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b86f8af0c1 
							
						 
					 
					
						
						
							
							Fix doc strings  
						
						 
						
						
						
					 
					
						2016-11-01 12:25:36 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d5742b773 
							
						 
					 
					
						
						
							
							Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.  
						
						 
						
						
						
					 
					
						2016-10-19 20:54:22 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7fd98fc91c 
							
						 
					 
					
						
						
							
							Remove deprecation shim around str/bytes in Token.  
						
						 
						
						
						
					 
					
						2016-10-17 14:02:47 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c1abc8f6ed 
							
						 
					 
					
						
						
							
							Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec  
						
						 
						
						
						
					 
					
						2016-10-17 11:18:41 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d10e2005c 
							
						 
					 
					
						
						
							
							Defer some attributes to Doc, via getters_for_tokens attribute.  
						
						 
						
						
						
					 
					
						2016-10-17 02:44:49 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ca32a1ab01 
							
						 
					 
					
						
						
							
							Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
						
						 
						
						
						
						
						This reverts commit 8423e8627f.
						
					 
					
						2016-09-30 20:20:22 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6736977d82 
							
						 
					 
					
						
						
							
							Revert "Changes to Doc and Token for new string store scheme"  
						
						 
						
						
						
						
						This reverts commit 99de44d864.
						
					 
					
						2016-09-30 20:11:15 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							99de44d864 
							
						 
					 
					
						
						
							
							Changes to Doc and Token for new string store scheme  
						
						 
						
						
						
					 
					
						2016-09-30 20:00:21 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8423e8627f 
							
						 
					 
					
						
						
							
							Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
						
						 
						
						
						
					 
					
						2016-09-30 10:14:47 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4de13606fd 
							
						 
					 
					
						
						
							
							Fix token.pyx  
						
						 
						
						
						
					 
					
						2016-09-23 15:07:07 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b4de419e19 
							
						 
					 
					
						
						
							
							Import hash_t typedef in token.pyx  
						
						 
						
						
						
					 
					
						2016-09-23 14:22:06 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c1a2e96604 
							
						 
					 
					
						
						
							
							Clean up notes at end of token.pyx  
						
						 
						
						
						
					 
					
						2016-09-21 20:45:51 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							58e83fe34b 
							
						 
					 
					
						
						
							
							Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.  
						
						 
						
						
						
					 
					
						2016-09-21 14:54:55 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6df3858dbc 
							
						 
					 
					
						
						
							
							* Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
						
						 
						
						
						
					 
					
						2016-04-12 13:17:59 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							872695759d 
							
						 
					 
					
						
						
							
							Merge pull request #306 from wbwseeker/german_noun_chunks
						
						 
						
						
						
						
						add German noun chunk functionality 
						
					 
					
						2016-04-08 00:54:24 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							d65ef41d08 
							
						 
					 
					
						
						
							
							make error messages language independent  
						
						 
						
						
						
					 
					
						2016-03-24 11:47:09 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							5080077097 
							
						 
					 
					
						
						
							
							revert init_model.py back to pre-german state (because it makes more sense)  
						
						 
						
						
						
						
						simplify token.n_rights and token.n_lefts 
						
					 
					
						2016-03-21 16:10:25 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							2ae253ef5b 
							
						 
					 
					
						
						
							
							changed head.__set__ to make it simpler  
						
						 
						
						
						
					 
					
						2016-03-14 13:43:48 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							46e3f979f1 
							
						 
					 
					
						
						
							
							add function for setting head and label to token  
						
						 
						
						
						
						
						change PseudoProjectivity.deprojectivize to use these functions 
						
					 
					
						2016-03-11 17:31:06 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							03fb498dbe 
							
						 
					 
					
						
						
							
							introduce lang field for LexemeC to hold language id  
						
						 
						
						
						
						
						put noun_chunk logic into iterators.py for each language separately 
						
					 
					
						2016-03-10 13:01:34 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							3448cb40a4 
							
						 
					 
					
						
						
							
							integrated pseudo-projective parsing into parser  
						
						 
						
						
						
						
						- nonproj.pyx holds a class PseudoProjectivity which currently holds all functionality to implement Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme
						- changed lefts/rights in Token to account for possible non-projective structures
						
					 
					
						2016-03-01 10:09:08 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							419edfab50 
							
						 
					 
					
						
						
							
							* Use generic flags for the new attributes until they're added  
						
						 
						
						
						
					 
					
						2016-02-04 15:50:54 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							11810be33e 
							
						 
					 
					
						
						
							
							* Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct  
						
						 
						
						
						
					 
					
						2016-02-04 13:04:16 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							995b2d18fd 
							
						 
					 
					
						
						
							
							* Route token.string via token.txt_with_ws, to deprecate token.string in future  
						
						 
						
						
						
					 
					
						2016-01-16 17:14:34 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							03e8a4293d 
							
						 
					 
					
						
						
							
							* Add loop guard to Token.lefts and Token.rights properties  
						
						 
						
						
						
					 
					
						2016-01-16 16:18:17 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ab5aac5b2f 
							
						 
					 
					
						
						
							
							* Add .rank property to Token and Lexeme, for frequency rank  
						
						 
						
						
						
					 
					
						2015-11-08 16:18:25 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							68f479e821 
							
						 
					 
					
						
						
							
							* Rename Doc.data to Doc.c  
						
						 
						
						
						
					 
					
						2015-11-04 00:15:14 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1e99fcd413 
							
						 
					 
					
						
						
							
							* Rename .repvec to .vector in C API  
						
						 
						
						
						
					 
					
						2015-11-03 23:47:59 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6161d2529a 
							
						 
					 
					
						
						
							
							Merge branch 'master' of ssh://github.com/honnibal/spaCy  
						
						 
						
						
						
					 
					
						2015-11-03 13:36:30 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f7dd377575 
							
						 
					 
					
						
						
							
							* Adjust conjuncts iterator in Token  
						
						 
						
						
						
					 
					
						2015-11-03 13:23:22 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andreas Grivas 
							
						 
					 
					
						
						
						
						
							
						
						
							d418f00eb1 
							
						 
					 
					
						
						
							
							fixed error when printing unicode  
						
						 
						
						
						
					 
					
						2015-11-02 20:23:18 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andreas Grivas 
							
						 
					 
					
						
						
						
						
							
						
						
							93ada458e2 
							
						 
					 
					
						
						
							
							added __repr__ that prints text in ipython for doc, token, and span objects  
						
						 
						
						
						
					 
					
						2015-10-21 14:11:46 +03:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9839cd2c0b 
							
						 
					 
					
						
						
							
							* Fix whitespace_ calculation in Token  
						
						 
						
						
						
					 
					
						2015-10-18 17:21:11 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6e0f985afc 
							
						 
					 
					
						
						
							
							* Fix token.conjuncts  
						
						 
						
						
						
					 
					
						2015-10-15 03:49:45 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2e0104ac81 
							
						 
					 
					
						
						
							
							* Fix token.conjuncts  
						
						 
						
						
						
					 
					
						2015-10-15 03:47:45 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b8f3345a82 
							
						 
					 
					
						
						
							
							* Fix token.conjuncts method  
						
						 
						
						
						
					 
					
						2015-10-15 03:36:01 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							23818f89b8 
							
						 
					 
					
						
						
							
							* Fix token.conjuncts method  
						
						 
						
						
						
					 
					
						2015-10-15 03:34:57 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							94bafc1417 
							
						 
					 
					
						
						
							
							* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS  
						
						 
						
						
						
					 
					
						2015-10-10 17:57:29 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f7283a5067 
							
						 
					 
					
						
						
							
							* Fix vectors bugs for OOV words  
						
						 
						
						
						
					 
					
						2015-09-22 02:10:25 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							44aecba701 
							
						 
					 
					
						
						
							
							* Fix Token.has_vector and Lexeme.has_vector  
						
						 
						
						
						
					 
					
						2015-09-22 01:43:16 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							596fde8daa 
							
						 
					 
					
						
						
							
							* Add has_vector attribute to Token and Lexeme  
						
						 
						
						
						
					 
					
						2015-09-21 19:52:43 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f32927efbf 
							
						 
					 
					
						
						
							
							* Raise exceptions if attempt to access parse, but data is not installed. This partly but not fully addresses Issue #97. Still need exceptions on the various Token attributes that access the parse tree, e.g. token.head, token.lefts, token.rights, etc. Exceptions should be centralized, too.
						
						 
						
						
						
					 
					
						2015-09-21 18:35:40 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							388062ae01 
							
						 
					 
					
						
						
							
							* Fix repvec_length problem  
						
						 
						
						
						
					 
					
						2015-09-21 18:10:51 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							193f127f81 
							
						 
					 
					
						
						
							
							* Fix ugly py_check_flag and py_set_flag functions in Lexeme  
						
						 
						
						
						
					 
					
						2015-09-15 13:06:18 +10:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							65dc0d1dfb 
							
						 
					 
					
						
						
							
							* Extend word vectors support, with .similarity() function, vector_norm property, and rename repvec to vector. Keep repvec name as well for now for backwards compatibility.  
						
						 
						
						
						
					 
					
						2015-09-14 17:49:58 +10:00