Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1759abf1e5 
							
						 
					 
					
						
						
							
							Fix bug in sentence starts for non-projective parses  
						
						
						
						
						The set_children_from_heads function assumed parse trees were
projective. However, non-projective parses may be passed in during
deserialization, or after deprojectivising. This caused incorrect
sentence boundaries to be set for non-projective parses. Closes #2772.
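
As a rough illustration of the idea in this message, the sketch below derives sentence starts from a list of head indices without assuming projectivity. It is plain Python, not spaCy's Cython code: the function names and the convention that a root token is its own head are assumptions made for this example, and the input is assumed to be a well-formed forest with no cycles.

```python
def find_root(heads, i):
    """Follow head links from token i until reaching a token that heads itself."""
    while heads[i] != i:
        i = heads[i]
    return i


def sentence_starts(heads):
    """Mark the leftmost token of each root's subtree as a sentence start.

    Each token is assigned to its root by walking up the tree, so the
    result does not depend on arcs being non-crossing.
    """
    left_edge = {}
    for i in range(len(heads)):
        root = find_root(heads, i)
        left_edge[root] = min(left_edge.get(root, i), i)
    starts = [False] * len(heads)
    for edge in left_edge.values():
        starts[edge] = True
    return starts


# Tokens 0-4 form one sentence whose arcs cross (0 -> 2 and 1 -> 3),
# tokens 5-6 form a second sentence.
heads = [2, 3, 2, 2, 1, 6, 6]
starts = sentence_starts(heads)
```

Walking to the root for every token is quadratic in the worst case; it is used here only because it makes the independence from projectivity easy to see.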
						
					 
					
						2018-09-19 14:50:06 +02:00 
						 
				 
			
				
					
						
							
							
								Grivaz 
							
						 
					 
					
						
						
						
						
							
						
						
							aeba99ab0d 
							
						 
					 
					
						
						
							
							Introduces a bulk merge function, in order to solve issue #653 (#2696)
						
						
						
						
						* Fix comment
* Introduce bulk merge to increase performance on many span merges
* Sign contributor agreement
* Implement pull request suggestions 
						
					 
					
						2018-09-10 16:41:42 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3c30d1763c 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						
						
					 
					
						2018-07-21 15:34:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e0caf3ae8c 
							
						 
					 
					
						
						
							
							Fix msgpack for new version  
						
						
						
					 
					
						2018-07-20 17:32:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9db77fd914 
							
						 
					 
					
						
						
							
							Fix deserialization for msgpack  
						
						
						
					 
					
						2018-07-20 14:11:09 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							cae4457c38 
							
						 
					 
					
						
						
							
							💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)
						
						
						
						
						* Add logic to filter out warning IDs via environment variable
  Usage: SPACY_WARNING_EXCLUDE=W001,W007
* Add warnings for empty vectors
* Add warning if no word vectors are used in .similarity methods
  For example, if only tensors are available in small models. This should hopefully clear up some confusion.
* Capture warnings in tests
* Rename SPACY_WARNING_EXCLUDE to SPACY_WARNING_IGNORE
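
The filtering described in the list above could look roughly like the following. This is a minimal sketch using only the standard library: the environment variable name comes from the commit message, while the helper names `ignored_ids` and `warn_unless_ignored` and the "[ID] message" format are assumptions for illustration, not spaCy's API.

```python
import os
import warnings

ENV_VAR = "SPACY_WARNING_IGNORE"  # final name according to the commit message


def ignored_ids(environ=os.environ):
    """Parse a comma-separated list of warning IDs, e.g. 'W001,W007'."""
    raw = environ.get(ENV_VAR, "")
    return {part.strip() for part in raw.split(",") if part.strip()}


def warn_unless_ignored(warning_id, message, environ=os.environ):
    """Emit a UserWarning unless its ID is listed in the environment variable.

    Returns True if the warning was emitted, False if it was filtered out.
    """
    if warning_id in ignored_ids(environ):
        return False
    warnings.warn("[{}] {}".format(warning_id, message), UserWarning)
    return True


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {"SPACY_WARNING_IGNORE": "W001,W007"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = warn_unless_ignored("W001", "filtered out", fake_env)
    second = warn_unless_ignored("W008", "no word vectors loaded", fake_env)
```

Passing the environment as a parameter keeps the check easy to test, which fits the "Capture warnings in tests" item.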
						
					 
					
						2018-05-21 01:22:38 +02:00 
						 
				 
			
				
					
						
							
							
								Mr Roboto 
							
						 
					 
					
						
						
						
						
							
						
						
							6f5ccda19c 
							
						 
					 
					
						
						
							
							Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230)
						
						
						
						
						* Fixes issue #2228 
* Adds a new contributor 
						
					 
					
						2018-05-01 13:40:22 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1c6d77610c 
							
						 
					 
					
						
						
							
							Add remove_extension method on Doc, Token and Span (closes #2242)
						
						
						
					 
					
						2018-04-28 23:33:09 +02:00 
						 
				 
			
				
					
						
							
							
								Suraj Rajan 
							
						 
					 
					
						
						
						
						
							
						
						
							5957f15227 
							
						 
					 
					
						
						
							
							Fixed typos for #2222, #2223 (#2233) (closes #2222, closes #2223)
						
						
						
					 
					
						2018-04-18 14:55:26 -07:00 
						 
				 
			
				
					
						
							
							
								Xiaoquan Kong 
							
						 
					 
					
						
						
						
						
							
						
						
							e2f13ec722 
							
						 
					 
					
						
						
							
							bugfix: Doc.noun_chunks calls Doc.noun_chunks_iterator without checking (closes #2194)
						
						
						
					 
					
						2018-04-08 23:44:05 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e5f47cd82d 
							
						 
					 
					
						
						
							
							Update errors  
						
						
						
					 
					
						2018-04-03 21:40:29 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							62b4b527d7 
							
						 
					 
					
						
						
							
							Don't raise error if set_extension has getter and setter (closes #2177)
						
						
						
						
						Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests. 
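
The sentinel comparison mentioned in this message can be sketched as follows. The function name `check_extension_args`, the error wording, and the rule that either a default or a getter must be supplied are assumptions for this example; only the `_unset` name and the getter/setter rules come from the commit message.

```python
_unset = object()  # sentinel: "no default given" is distinct from default=None


def check_extension_args(default=_unset, getter=None, setter=None):
    """Validate a hypothetical extension definition and report what was set."""
    if setter is not None and getter is None:
        raise ValueError("A setter was specified without a getter.")
    if default is _unset and getter is None:
        # Assumption for this sketch: something must provide the value.
        raise ValueError("Specify either a default value or a getter.")
    return {
        "has_default": default is not _unset,
        "has_getter": getter is not None,
        "has_setter": setter is not None,
    }


# default=None is a legitimate default because the check uses identity
# against the sentinel rather than a comparison with None.
with_none_default = check_extension_args(default=None)

# A getter together with a setter is accepted.
with_both = check_extension_args(getter=lambda obj: 1, setter=lambda obj, v: None)

try:
    check_extension_args(setter=lambda obj, v: None)
    setter_only_rejected = False
except ValueError:
    setter_only_rejected = True
```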
						
					 
					
						2018-04-03 18:30:17 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ee3082ad29 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2018-04-03 18:29:53 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3141e04822 
							
						 
					 
					
						
						
							
							💫 New system for error messages and warnings (#2163)
						
						
						
						
						* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
  An implementation like this is nice because it only modifies the string when it's retrieved from the containing class, so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None 
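
The decorator item in the list above describes prefixing a message with its code at the moment it is read from the containing class. One way such a decorator might be written is shown below. This is a sketch under assumptions: the names `with_codes` and `MessagesWithCodes`, and the two messages, are hypothetical and are not taken from the spacy.errors module.

```python
def with_codes(err_cls):
    """Wrap a class of message constants so lookups return '[CODE] message'."""

    class MessagesWithCodes:
        def __getattribute__(self, code):
            # Read the raw message from the undecorated class, then
            # prefix it with the attribute name it was stored under.
            message = getattr(err_cls, code)
            return "[{}] {}".format(code, message)

    return MessagesWithCodes()


@with_codes
class Errors:
    # Hypothetical codes and wording, for illustration only.
    E001 = "No component '{name}' found in pipeline."
    E002 = "Token text cannot be an empty string."


message = Errors.E001.format(name="ner")
```

Because the stored strings are never changed, the code prefix is added only on access, and code that raises `ValueError(Errors.E002)` needs no extra formatting step.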
						
					 
					
						2018-04-03 15:50:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							abf8b16d71 
							
						 
					 
					
						
						
							
							Add doc.retokenize() context manager (#2172)
						
						
						
						
						This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.

The idea is to do merging and splitting like this:

    with doc.retokenize() as retokenizer:
        for start, end, label in matches:
            retokenizer.merge(doc[start : end], attrs={'ent_type': label})

The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.

A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.

The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.

We can later start emitting deprecation warnings on direct calls to doc.merge(),
to migrate people to use of the retokenize context manager.
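
The accumulate-then-apply behaviour described in this message can be illustrated with a small stand-in that works on a plain list of strings. This is not spaCy's implementation: the `Retokenizer` class below, its offset-based `merge(start, end)` signature, and the joining of merged tokens with a space are assumptions made so the example runs without spaCy, and it assumes the queued spans do not overlap.

```python
class Retokenizer:
    """Queue merge requests and apply them together when the block exits."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.merges = []  # queued (start, end) token offsets, end exclusive

    def merge(self, start, end):
        # Nothing is modified yet; the request is only recorded.
        self.merges.append((start, end))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Apply from right to left so the offsets of spans further
            # left are still valid when their turn comes.
            for start, end in sorted(self.merges, reverse=True):
                self.tokens[start:end] = [" ".join(self.tokens[start:end])]
        return False  # never swallow exceptions from the block


tokens = ["I", "live", "in", "New", "York", "City", "near", "Central", "Park"]
with Retokenizer(tokens) as retokenizer:
    retokenizer.merge(3, 6)
    retokenizer.merge(7, 9)
    length_inside_block = len(tokens)
```

Deferring the work to the end of the block means all offsets refer to the original tokenization, which is the source of the "less error prone" claim: callers do not have to adjust indices after each merge.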
						
					 
					
						2018-04-03 14:10:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0b375d50c8 
							
						 
					 
					
						
						
							
							Fix ent_iob tags in doc.merge to avoid inconsistent sequences  
						
						
						
					 
					
						2018-03-28 18:39:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e807f88410 
							
						 
					 
					
						
						
							
							Resolve merge when cherry-picking ent iob patches from develop  
						
						
						
					 
					
						2018-03-28 18:38:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							99fbc7db33 
							
						 
					 
					
						
						
							
							Improve error message when entity sequence is inconsistent  
						
						
						
					 
					
						2018-03-28 18:36:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9e83513004 
							
						 
					 
					
						
						
							
							Add position of invalid token to error message  
						
						
						
					 
					
						2018-03-27 23:56:59 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							693971dd8f 
							
						 
					 
					
						
						
							
							Improve error message if token text is empty string (see #2101)
						
						
						
					 
					
						2018-03-27 22:25:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0c829e6605 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2018-03-27 22:20:59 +02:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							515e25910e 
							
						 
					 
					
						
						
							
							fix sent_start in serialization  
						
						
						
					 
					
						2018-01-28 19:50:42 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							56164ab688 
							
						 
					 
					
						
						
							
							Set l_edge and r_edge correctly for non-projective parses. Fixes #1799
						
						
						
					 
					
						2018-01-22 20:18:04 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ccb51a9f36 
							
						 
					 
					
						
						
							
							Make .similarity() return 1.0 if all orth attrs match  
						
						
						
					 
					
						2018-01-15 16:29:48 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ab7c45b12d 
							
						 
					 
					
						
						
							
							Fix error message and handling of doc.sents  
						
						
						
					 
					
						2018-01-15 15:21:11 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e10e9ad2c5 
							
						 
					 
					
						
						
							
							Improve efficiency of Doc.to_array  
						
						
						
					 
					
						2017-11-23 12:33:27 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fa62427300 
							
						 
					 
					
						
						
							
							Remove lookup-based lemmatization  
						
						
						
					 
					
						2017-11-23 12:32:22 +00:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1c218397f6 
							
						 
					 
					
						
						
							
							Ensure path in Doc.to_disk/from_disk (resolves #1521)
						
						
						
						
						Also add Doc serialization tests with both Path and string path options 
						
					 
					
						2017-11-09 02:29:03 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							144a93c2a5 
							
						 
					 
					
						
						
							
							Back-off to tensor for similarity if no vectors  
						
						
						
					 
					
						2017-11-03 20:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62ed58935a 
							
						 
					 
					
						
						
							
							Add Doc.extend_tensor() method  
						
						
						
					 
					
						2017-11-03 11:20:31 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9659391944 
							
						 
					 
					
						
						
							
							Update deprecated methods and add warnings  
						
						
						
					 
					
						2017-11-01 16:49:42 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							705a4e3e4a 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-11-01 16:44:08 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7e7116cdf7 
							
						 
					 
					
						
						
							
							Fix Doc.to_array when only one string attr provided  
						
						
						
					 
					
						2017-11-01 13:26:43 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							544a407b93 
							
						 
					 
					
						
						
							
							Tidy up Doc, Token and Span and add missing docs  
						
						
						
					 
					
						2017-10-27 17:07:26 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6a0483b7aa 
							
						 
					 
					
						
						
							
							Tidy up and document Doc, Token and Span  
						
						
						
					 
					
						2017-10-27 15:41:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ccd2ab1a62 
							
						 
					 
					
						
						
							
							Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
						
						
						
						
						Add LCA matrix for spans and docs 
						
					 
					
						2017-10-24 11:22:46 +02:00 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d2fe56a577 
							
						 
					 
					
						
						
							
							Add LCA matrix for spans and docs  
						
						
						
					 
					
						2017-10-20 23:58:00 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0726946563 
							
						 
					 
					
						
						
							
							cleanup to_array implementation using fixes on master  
						
						
						
					 
					
						2017-10-20 17:09:37 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b3ab124fc5 
							
						 
					 
					
						
						
							
							Support strings for attribute list in doc.to_array  
						
						
						
					 
					
						2017-10-20 11:46:57 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7b9b1be44c 
							
						 
					 
					
						
						
							
							Support single value for attribute list in doc.to_array  
						
						
						
					 
					
						2017-10-19 17:00:41 +05:30 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							394633efce 
							
						 
					 
					
						
						
							
							Make doc pickling support hooks  
						
						
						
					 
					
						2017-10-17 19:44:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cdb0c426d8 
							
						 
					 
					
						
						
							
							Improve deserialization of user_data, esp. for Underscore  
						
						
						
					 
					
						2017-10-17 19:29:20 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							32a8564c79 
							
						 
					 
					
						
						
							
							Fix doc pickling  
						
						
						
					 
					
						2017-10-17 18:20:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							92c1eb2d6f 
							
						 
					 
					
						
						
							
							Fix Doc pickling. This also removes need for Binder class  
						
						
						
					 
					
						2017-10-17 16:11:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a002264fec 
							
						 
					 
					
						
						
							
							Remove caching of Token in Doc, as it caused a cycle.
						
						
						
					 
					
						2017-10-16 19:34:21 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e0ff145a8b 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/dot-underscore  
						
						
						
					 
					
						2017-10-11 11:57:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3b527fa52b 
							
						 
					 
					
						
						
							
							Call morphology.assign_untagged when pushing token to Doc  
						
						
						
					 
					
						2017-10-11 03:23:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e0a9b02b67 
							
						 
					 
					
						
						
							
							Merge Span._ and Span.as_doc methods  
						
						
						
					 
					
						2017-10-09 22:00:15 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e938bce320 
							
						 
					 
					
						
						
							
							Adjust parsing transition system to allow preset sentence segments.  
						
						
						
					 
					
						2017-10-08 23:53:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							668a0ea640 
							
						 
					 
					
						
						
							
							Pass extensions into Underscore class  
						
						
						
					 
					
						2017-10-07 18:56:01 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							2480f8f521 
							
						 
					 
					
						
						
							
							Add missing return in Doc.from_disk() (closes #1330)
						
						
						
					 
					
						2017-09-18 15:32:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							03b5b9727a 
							
						 
					 
					
						
						
							
							Fix Doc.vector for empty doc objects  
						
						
						
					 
					
						2017-08-22 19:52:19 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0551b7b03a 
							
						 
					 
					
						
						
							
							Fix doc.vector  
						
						
						
					 
					
						2017-08-22 19:46:52 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8b7ac77c23 
							
						 
					 
					
						
						
							
							Allow span label to be string in Doc.char_span  
						
						
						
					 
					
						2017-08-19 16:18:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							80236116a6 
							
						 
					 
					
						
						
							
							Add Doc.char_span method, to get a span by character offset  
						
						
						
					 
					
						2017-08-19 12:21:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a6a2159969 
							
						 
					 
					
						
						
							
							Add slot for text categories to Doc  
						
						
						
					 
					
						2017-07-22 00:34:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2a3bd5ee90 
							
						 
					 
					
						
						
							
							Fix fetching of noun chunk iterator  
						
						
						
					 
					
						2017-06-04 15:53:05 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							92ae36f84e 
							
						 
					 
					
						
						
							
							Improve way noun chunks iterator is looked up  
						
						
						
					 
					
						2017-06-04 21:53:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							675f448313 
							
						 
					 
					
						
						
							
							Fix vector linkage on Doc  
						
						
						
					 
					
						2017-06-04 14:25:30 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							459a1e8470 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2017-06-03 11:31:18 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							5109bba910 
							
						 
					 
					
						
						
							
							Port over fix from #1070
						
						
						
					 
					
						2017-06-03 11:31:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							498ad85309 
							
						 
					 
					
						
						
							
							Try using tensor for vector/similarity methods
						
						
						
					 
					
						2017-05-30 23:35:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4ddff020c3 
							
						 
					 
					
						
						
							
							Fix compile error  
						
						
						
					 
					
						2017-05-28 23:30:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6d3caeadd2 
							
						 
					 
					
						
						
							
							Fix type check for long  
						
						
						
					 
					
						2017-05-28 23:22:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7996d21717 
							
						 
					 
					
						
						
							
							Fixes for new StringStore  
						
						
						
					 
					
						2017-05-28 11:09:27 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fe11564b8e 
							
						 
					 
					
						
						
							
							Finish stringstore change. Also xfail vectors tests  
						
						
						
					 
					
						2017-05-28 15:10:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							84e66ca6d4 
							
						 
					 
					
						
						
							
							WIP on stringstore change. 27 failures  
						
						
						
					 
					
						2017-05-28 14:06:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							66088851dc 
							
						 
					 
					
						
						
							
							Add Doc.to_disk() and Doc.from_disk() methods  
						
						
						
					 
					
						2017-05-24 11:58:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d44b1eafc4 
							
						 
					 
					
						
						
							
							Fix conflict artefacts  
						
						
						
					 
					
						2017-05-23 18:47:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d68dd1f251 
							
						 
					 
					
						
						
							
							Add SENT_START attribute, for custom sentence boundary detection  
						
						
						
					 
					
						2017-05-23 18:37:58 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							23f9a3ccc8 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc  
						
						
						
					 
					
						2017-05-19 18:47:39 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							8455cb1327 
							
						 
					 
					
						
						
							
							Update docstring for Doc.__getitem__  
						
						
						
					 
					
						2017-05-19 00:30:51 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b687ad109d 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc class  
						
						
						
					 
					
						2017-05-18 23:59:44 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b87066ff10 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc class  
						
						
						
					 
					
						2017-05-18 22:17:41 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9d85cda8e4 
							
						 
					 
					
						
						
							
							Fix models error message and use about.__docs_models__ (see #1051)
						
						
						
					 
					
						2017-05-13 13:05:47 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6b942763f0 
							
						 
					 
					
						
						
							
							Tidy up imports  
						
						
						
					 
					
						2017-05-13 13:04:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b9dea345e5 
							
						 
					 
					
						
						
							
							Remove old import  
						
						
						
					 
					
						2017-05-13 12:32:11 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							293ee359c5 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-05-13 12:32:06 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ee1d35bdb0 
							
						 
					 
					
						
						
							
							Fix merge conflict  
						
						
						
					 
					
						2017-05-13 03:20:19 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b2540d2379 
							
						 
					 
					
						
						
							
							Merge Kengz's tree_print patch  
						
						
						
					 
					
						2017-05-13 03:18:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4efb391994 
							
						 
					 
					
						
						
							
							Fix serializer  
						
						
						
					 
					
						2017-05-09 18:45:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1166b0c491 
							
						 
					 
					
						
						
							
							Implement Doc.to_bytes and Doc.from_bytes methods  
						
						
						
					 
					
						2017-05-09 18:11:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e167b7bb6 
							
						 
					 
					
						
						
							
							Strip serializer from code  
						
						
						
					 
					
						2017-05-09 17:28:50 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0739ae7b76 
							
						 
					 
					
						
						
							
							Tidy up and fix formatting and imports  
						
						
						
					 
					
						2017-04-15 13:05:15 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e71a1f4bd0 
							
						 
					 
					
						
						
							
							Fix download commands in error messages (see #946)
						
						
						
					 
					
						2017-04-01 10:20:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							51882ee2b8 
							
						 
					 
					
						
						
							
							Fix check for setting ent_id in merge  
						
						
						
					 
					
						2017-03-31 19:32:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9720103428 
							
						 
					 
					
						
						
							
							Improve attribute handling in doc.merge(). Still unsatisfying
						
						
						
					 
					
						2017-03-31 13:59:58 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0fefdfcbda 
							
						 
					 
					
						
						
							
							Merge pull request #935 from ericzhao28/master
						
						... 
						
						
						
						Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
						
					 
					
						2017-03-30 02:51:24 +02:00 
						 
				 
			
				
					
						
							
							
								Eric Zhao 
							
						 
					 
					
						
						
						
						
							
						
						
							aafdf6ffb8 
							
						 
					 
					
						
						
							
							Add option to use label kwarg to determine ent_type in doc.merge
						
						
						
					 
					
						2017-03-28 23:35:03 -07:00 
						 
				 
			
				
					
						
							
							
								Roman Inflianskas 
							
						 
					 
					
						
						
						
						
							
						
						
							66e1109b53 
							
						 
					 
					
						
						
							
							Add support for Universal Dependencies v2.0  
						
						
						
					 
					
						2017-03-03 13:17:34 +01:00 
						 
				 
			
				
					
						
							
							
								Matvey Ezhov 
							
						 
					 
					
						
						
						
						
							
						
						
							32a22291bc 
							
						 
					 
					
						
						
							
							Small Doc.count_by documentation update  
						
						... 
						
						
						
						Current example doesn't work 
						
					 
					
						2017-01-31 19:18:45 +03:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6c665b81df 
							
						 
					 
					
						
						
							
							Fix redundant == TAG in from_array conditional  
						
						
						
					 
					
						2017-01-31 00:46:21 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							44e2b0100d 
							
						 
					 
					
						
						
							
							Support TAG attribute in doc.from_array  
						
						
						
					 
					
						2017-01-10 22:47:07 +01:00 
						 
				 
			
				
					
						
							
							
								kengz 
							
						 
					 
					
						
						
						
						
							
						
						
							73a38bd4d1 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/master'  
						
						
						
					 
					
						2016-12-30 12:19:59 -05:00 
						 
				 
			
				
					
						
							
							
								kengz 
							
						 
					 
					
						
						
						
						
							
						
						
							da44183ae1 
							
						 
					 
					
						
						
							
							move parse_tree logic to a new tokens/printers.py file  
						
						
						
					 
					
						2016-12-30 12:19:18 -05:00 
						 
				 
			
				
					
						
							
							
								Pokey Rule 
							
						 
					 
					
						
						
						
						
							
						
						
							3e3bda142d 
							
						 
					 
					
						
						
							
							Add noun_chunks to Span  
						
						
						
					 
					
						2016-11-24 10:47:20 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1fb09c3dc1 
							
						 
					 
					
						
						
							
							Fix morphology tagger  
						
						
						
					 
					
						2016-11-04 19:19:09 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f292f7f0e6 
							
						 
					 
					
						
						
							
							Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy.
						
						
						
					 
					
						2016-11-02 23:48:43 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e7af6b937f 
							
						 
					 
					
						
						
							
							Fix syntax error while fixing doc strings  
						
						
						
					 
					
						2016-11-01 13:27:32 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b86f8af0c1 
							
						 
					 
					
						
						
							
							Fix doc strings  
						
						
						
					 
					
						2016-11-01 12:25:36 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4ca31b4d87 
							
						 
					 
					
						
						
							
							Fix clobbering of 'missing' named ent values after assigning ents.  
						
						
						
					 
					
						2016-10-26 13:13:56 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							15c9b59f0e 
							
						 
					 
					
						
						
							
							Fix Issue #461: O tag was being clobbered by doc.ents.__set__
						
						
						
					 
					
						2016-10-23 15:50:26 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2c3a67b693 
							
						 
					 
					
						
						
							
							Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function.
						
						
						
					 
					
						2016-10-23 14:49:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3588a18fb8 
							
						 
					 
					
						
						
							
							Fix hook names in doc  
						
						
						
					 
					
						2016-10-19 21:15:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d5742b773 
							
						 
					 
					
						
						
							
							Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.  
						
						
						
					 
					
						2016-10-19 20:54:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9b60186266 
							
						 
					 
					
						
						
							
							Fix doc class  
						
						
						
					 
					
						2016-10-17 15:23:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b67697a97b 
							
						 
					 
					
						
						
							
							Improve API for doc.merge() and span.merge(), to use keyword arguments.  
						
						
						
					 
					
						2016-10-17 14:02:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fbb7f3f15c 
							
						 
					 
					
						
						
							
							Add user_data attribute to Doc object.  
						
						
						
					 
					
						2016-10-17 11:43:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62230dd13a 
							
						 
					 
					
						
						
							
							Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring  
						
						
						
					 
					
						2016-10-17 02:42:51 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							311a985fe0 
							
						 
					 
					
						
						
							
							Add input error handling in Doc  
						
						
						
					 
					
						2016-10-16 18:16:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							06322ba99d 
							
						 
					 
					
						
						
							
							Add words and spaces keyword arguments to Doc.  
						
						
						
					 
					
						2016-10-16 18:13:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6736977d82 
							
						 
					 
					
						
						
							
							Revert "Changes to Doc and Token for new string store scheme"  
						
						... 
						
						
						
						This reverts commit 99de44d864 
						
					 
					
						2016-09-30 20:11:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							99de44d864 
							
						 
					 
					
						
						
							
							Changes to Doc and Token for new string store scheme  
						
						
						
					 
					
						2016-09-30 20:00:21 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d3dc5718b2 
							
						 
					 
					
						
						
							
							Fix syntax error in Doc  
						
						
						
					 
					
						2016-09-28 11:39:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1b520e7bab 
							
						 
					 
					
						
						
							
							Improve docstrings for Doc object  
						
						
						
					 
					
						2016-09-28 11:15:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc4a7ad794 
							
						 
					 
					
						
						
							
							Test and fix Issue #411: IndexError when .sents property is used on empty string.
						
						
						
					 
					
						2016-09-27 18:49:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							15e42a1ba9 
							
						 
					 
					
						
						
							
							Allow entities to be set by Span, or by 4-tuple (with entity ID)  
						
						
						
					 
					
						2016-09-24 01:17:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2735b6247b 
							
						 
					 
					
						
						
							
							Fix orths_and_spaces in Doc.__init__  
						
						
						
					 
					
						2016-09-21 14:52:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cdc10e9a1c 
							
						 
					 
					
						
						
							
							* Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work.
						
						
						
					 
					
						2016-05-20 10:14:06 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d86c30f0b 
							
						 
					 
					
						
						
							
							* Fix Issue #367: Missing has_vector property on Doc and Span objects
						
						
						
					 
					
						2016-05-09 12:36:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							76021cb853 
							
						 
					 
					
						
						
							
							* Fix bug in Doc.text, introduced by a862edc
						
						
						
					 
					
						2016-05-04 11:02:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							29a114e645 
							
						 
					 
					
						
						
							
							* Don't assign 0-valued tags in Doc.from_array  
						
						
						
					 
					
						2016-05-02 16:07:50 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							508fd1f6dc 
							
						 
					 
					
						
						
							
							* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.  
						
						
						
					 
					
						2016-05-02 14:25:10 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							872695759d 
							
						 
					 
					
						
						
							
							Merge pull request #306 from wbwseeker/german_noun_chunks
						
						... 
						
						
						
						add German noun chunk functionality 
						
					 
					
						2016-04-08 00:54:24 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ad119c074f 
							
						 
					 
					
						
						
							
							* Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics.  
						
						
						
					 
					
						2016-03-29 13:02:42 +11:00 
						 
				 
			
				
					
						
							
							
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							d65ef41d08 
							
						 
					 
					
						
						
							
							make error messages language independent  
						
						
						
					 
					
						2016-03-24 11:47:09 +01:00 
						 
				 
			
				
					
						
							
							
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							5e2e8e951a 
							
						 
					 
					
						
						
							
							add baseclass DocIterator for iterators over documents  
						
						... 
						
						
						
						add classes for English and German noun chunks
the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model 
						
					 
					
						2016-03-16 15:53:35 +01:00 
						 
				 
			
				
					
						
							
							
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							03fb498dbe 
							
						 
					 
					
						
						
							
							introduce lang field for LexemeC to hold language id  
						
						... 
						
						
						
						put noun_chunk logic into iterators.py for each language separately 
						
					 
					
						2016-03-10 13:01:34 +01:00 
						 
				 
			
				
					
						
							
							
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							d9312bc9ea 
							
						 
					 
					
						
						
							
							add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators  
						
						
						
					 
					
						2016-03-09 16:18:48 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							af8514cb0c 
							
						 
					 
					
						
						
							
							* Refine the way the is_parsed attribute is set by from_array  
						
						
						
					 
					
						2016-02-06 14:44:35 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6bb007d16e 
							
						 
					 
					
						
						
							
							* Make set_parse nogil  
						
						
						
					 
					
						2016-01-30 20:27:52 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f24833d607 
							
						 
					 
					
						
						
							
							* Fix merge for coordinations  
						
						
						
					 
					
						2016-01-18 16:03:19 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc8f26584a 
							
						 
					 
					
						
						
							
							* Don't consider NPs connected to parse via conj relation as noun chunks. Change motivated by the nested noun chunks identified in Issue #203, but might be problematic. Also allow root NPs to be considered noun chunks.
						
						
						
					 
					
						2016-01-16 17:52:40 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							54a98eaf19 
							
						 
					 
					
						
						
							
							* Fix typo text_wth_ws --> text_with_ws. Reroute .string attribute to text_with_ws, to deprecate .string in future  
						
						
						
					 
					
						2016-01-16 17:13:50 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a9b612abdf 
							
						 
					 
					
						
						
							
							* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient  
						
						
						
					 
					
						2015-11-07 09:01:12 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							56499d89ef 
							
						 
					 
					
						
						
							
							* Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient  
						
						
						
					 
					
						2015-11-07 08:55:34 +11:00 
						 
				 
			
				
					
						
							
							
								Andreas Grivas 
							
						 
					 
					
						
						
						
						
							
						
						
							562db6d2d0 
							
						 
					 
					
						
						
							
							* merge add lex last - add index finder funcs  
						
						
						
					 
					
						2015-11-07 07:57:04 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							68f479e821 
							
						 
					 
					
						
						
							
							* Rename Doc.data to Doc.c  
						
						
						
					 
					
						2015-11-04 00:15:14 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9482d616bc 
							
						 
					 
					
						
						
							
							* Rename spans.pyx to span.pyx  
						
						
						
					 
					
						2015-11-03 23:51:05 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							116da5990a 
							
						 
					 
					
						
						
							
							* Clean up setting of tag in doc.from_bytes  
						
						
						
					 
					
						2015-11-03 23:48:57 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1e99fcd413 
							
						 
					 
					
						
						
							
							* Rename .repvec to .vector in C API  
						
						
						
					 
					
						2015-11-03 23:47:59 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e37437ba8 
							
						 
					 
					
						
						
							
							* Fix assign_tag in doc.merge  
						
						
						
					 
					
						2015-11-03 19:07:02 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							833eb35c57 
							
						 
					 
					
						
						
							
							* Fix tag assignment in doc.from_array  
						
						
						
					 
					
						2015-11-03 18:45:54 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							09664177d7 
							
						 
					 
					
						
						
							
							* Fix tag handling in doc.merge, and assign sent_start when setting heads.  
						
						
						
					 
					
						2015-11-03 18:15:52 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							604ceac4c6 
							
						 
					 
					
						
						
							
							* Fix morphological assignment in doc.merge()  
						
						
						
					 
					
						2015-11-03 17:57:51 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5e040855a5 
							
						 
					 
					
						
						
							
							* Ensure morphological features and lemmas are loaded in from_array, re Issue #152
						
						
						
					 
					
						2015-11-03 17:56:50 +11:00 
						 
				 
			
				
					
						
							
							
								Andreas Grivas 
							
						 
					 
					
						
						
						
						
							
						
						
							d418f00eb1 
							
						 
					 
					
						
						
							
							fixed error when printing unicode  
						
						
						
					 
					
						2015-11-02 20:23:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							52fc338001 
							
						 
					 
					
						
						
							
							* Set is_parsed and is_tagged attrs when loading annotations into Doc, re Issue #152
						
						
						
					 
					
						2015-10-28 10:43:22 +11:00 
						 
				 
			
				
					
						
							
							
								Andreas Grivas 
							
						 
					 
					
						
						
						
						
							
						
						
							93ada458e2 
							
						 
					 
					
						
						
							
							added __repr__ that prints text in ipython for doc, token, and span objects  
						
						
						
					 
					
						2015-10-21 14:11:46 +03:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							135062d23c 
							
						 
					 
					
						
						
							
							* Fix error with merged text when merged region did not have trailing whitespace  
						
						
						
					 
					
						2015-10-19 15:47:04 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a7e6c5ac8f 
							
						 
					 
					
						
						
							
							* Fix Issue #122: Incorrect calculation of children after Doc.merge()
						
						
						
					 
					
						2015-10-18 17:17:27 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							94bafc1417 
							
						 
					 
					
						
						
							
							* Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS  
						
						
						
					 
					
						2015-10-10 17:57:29 +11:00 
						 
				 
			
				
					
						
							
							
								Yubing (Tom) Dong 
							
						 
					 
					
						
						
						
						
							
						
						
							0f601b8b75 
							
						 
					 
					
						
						
							
							Update docstring of Doc.__getitem__  
						
						
						
					 
					
						2015-10-07 01:27:28 -07:00 
						 
				 
			
				
					
						
							
							
								Yubing (Tom) Dong 
							
						 
					 
					
						
						
						
						
							
						
						
							3fd3bc79aa 
							
						 
					 
					
						
						
							
							Refactor to remove duplicate slicing logic  
						
						
						
					 
					
						2015-10-07 01:25:35 -07:00 
						 
				 
			
				
					
						
							
							
								Yubing (Tom) Dong 
							
						 
					 
					
						
						
						
						
							
						
						
							2fc33e8024 
							
						 
					 
					
						
						
							
							Allow step=1 when slicing a Doc  
						
						
						
					 
					
						2015-10-06 00:57:05 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ab694b0364 
							
						 
					 
					
						
						
							
							* Fix open-bounded slice indices.  
						
						
						
					 
					
						2015-09-29 23:03:09 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f7283a5067 
							
						 
					 
					
						
						
							
							* Fix vectors bugs for OOV words  
						
						
						
					 
					
						2015-09-22 02:10:25 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f32927efbf 
							
						 
					 
					
						
						
							
							* Raise exceptions if attempt to access parse, but data is not installed. This partly but not fully addresses Issue #97. Still need exceptions on the various Token attributes that access the parse tree, e.g. token.head, token.lefts, token.rights, etc. Exceptions should be centralized, too.
						
						
						
					 
					
						2015-09-21 18:35:40 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							77856c4fcd 
							
						 
					 
					
						
						
							
							* Try giving Doc and Span objects vector and vector_norm attributes, and .similarity functions. Turns out to be bad idea.  
						
						
						
					 
					
						2015-09-17 11:50:11 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							60c26b2dfa 
							
						 
					 
					
						
						
							
							* Fix slicing when start or stop is None  
						
						
						
					 
					
						2015-09-15 14:43:10 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							65dc0d1dfb 
							
						 
					 
					
						
						
							
							* Extend word vectors support, with .similarity() function, vector_norm property, and rename repvec to vector. Keep repvec name as well for now for backwards compatibility.  
						
						
						
					 
					
						2015-09-14 17:49:58 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c08f10083c 
							
						 
					 
					
						
						
							
							* Add text and text_with_ws attributes.
						
						
						
					 
					
						2015-09-13 10:27:42 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e7bfe8449 
							
						 
					 
					
						
						
							
							* Fix space at end of merged token  
						
						
						
					 
					
						2015-09-10 14:45:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							31ccf494e6 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of https://github.com/honnibal/spaCy into develop
						
						
						
					 
					
						2015-09-09 14:33:38 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							07686470a9 
							
						 
					 
					
						
						
							
							* Don't consider a coordinated NP a base chunk  
						
						
						
					 
					
						2015-09-09 14:32:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0e24d099a1 
							
						 
					 
					
						
						
							
							* Fix L/R edge bug, by ensuring l_edge and r_edge are preset, and fixing the way the edge update in del_arc. Bugs keep arising here because the edges are absolute positions, where everything else is relative. I'm also not 100% convinced that del_arc is handled correctly. Do we need to update the parents?  
						
						
						
					 
					
						2015-09-09 03:40:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							86c888667f 
							
						 
					 
					
						
						
							
							* Merge in changes from de branch  
						
						
						
					 
					
						2015-09-06 19:49:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d2fc104a26 
							
						 
					 
					
						
						
							
							* Begin merge of Gazetteer and DE branches  
						
						
						
					 
					
						2015-09-06 19:45:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fd1eeb3102 
							
						 
					 
					
						
						
							
							* Add POS attribute support in get_attr  
						
						
						
					 
					
						2015-09-06 04:13:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c2307fa9ee 
							
						 
					 
					
						
						
							
							* More work on language-generic parsing  
						
						
						
					 
					
						2015-08-28 02:02:33 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6f1743692a 
							
						 
					 
					
						
						
							
							* Work on language-independent refactoring  
						
						
						
					 
					
						2015-08-23 20:49:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b0f5c39084 
							
						 
					 
					
						
						
							
							* Fix handling of exclusion entities  
						
						
						
					 
					
						2015-08-06 17:28:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							10d869d102 
							
						 
					 
					
						
						
							
							* Don't allow conjunction between NPs in base NP chunks  
						
						
						
					 
					
						2015-08-06 16:31:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9c1724ecae 
							
						 
					 
					
						
						
							
							* Gazetteer stuff working, now need to wire up to API  
						
						
						
					 
					
						2015-08-06 00:35:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eb7138c761 
							
						 
					 
					
						
						
							
							* Add attr relation in base NP detection  
						
						
						
					 
					
						2015-08-01 00:34:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4988356cf0 
							
						 
					 
					
						
						
							
							* Fix dependency type bug from merged tokens  
						
						
						
					 
					
						2015-08-01 00:33:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							78a9068319 
							
						 
					 
					
						
						
							
							* Fix spacy attr on merged tokens  
						
						
						
					 
					
						2015-07-30 04:25:58 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							430e2edb96 
							
						 
					 
					
						
						
							
							* Fix noun_chunks issue  
						
						
						
					 
					
						2015-07-30 03:51:50 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							74d8cb3980 
							
						 
					 
					
						
						
							
							* Add noun_chunks iterator, and fix left/right child setting in Doc.merge  
						
						
						
					 
					
						2015-07-30 02:29:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b5132bed7d 
							
						 
					 
					
						
						
							
							* Set left and right children when loading parse from byte string  
						
						
						
					 
					
						2015-07-28 21:03:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							aa7a964a4f 
							
						 
					 
					
						
						
							
							* Add a type declaration for doc.from_array  
						
						
						
					 
					
						2015-07-27 22:57:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2060935cdb 
							
						 
					 
					
						
						
							
							* Remove explicit bytes type in doc.from_bytes, to accept bytearray  
						
						
						
					 
					
						2015-07-24 04:54:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0bb839d299 
							
						 
					 
					
						
						
							
							* Fix string coercion for Python 3  
						
						
						
					 
					
						2015-07-24 03:49:30 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a0e36e8efc 
							
						 
					 
					
						
						
							
							* Add working to/from bytes API to Doc  
						
						
						
					 
					
						2015-07-23 01:14:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4d61239eac 
							
						 
					 
					
						
						
							
							* Reorganize the serialization functions on Doc  
						
						
						
					 
					
						2015-07-22 04:53:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8743a8c084 
							
						 
					 
					
						
						
							
							* Update Doc serialization for new Packer interface  
						
						
						
					 
					
						2015-07-20 01:38:04 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							317cbbc015 
							
						 
					 
					
						
						
							
							* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.  
						
						
						
					 
					
						2015-07-19 15:18:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6b13e7227c 
							
						 
					 
					
						
						
							
							* Remove duplicate get_lex_attr method from doc.pyx  
						
						
						
					 
					
						2015-07-18 22:46:07 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ced59ab9ea 
							
						 
					 
					
						
						
							
							* Make minor efficiency improvement in Doc.__iter__  
						
						
						
					 
					
						2015-07-18 04:10:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cf0c788892 
							
						 
					 
					
						
						
							
							* Tests passing on round-trip pack/unpack on basic example  
						
						
						
					 
					
						2015-07-17 21:20:48 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dfdf19f6a9 
							
						 
					 
					
						
						
							
							* Draft a from_orth method for Doc  
						
						
						
					 
					
						2015-07-17 16:39:54 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							db9dfd2e23 
							
						 
					 
					
						
						
							
							* Major refactor of serialization. Nearly complete now.  
						
						
						
					 
					
						2015-07-17 01:27:54 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a6f401580d 
							
						 
					 
					
						
						
							
							* Add from_array function to Doc.  
						
						
						
					 
					
						2015-07-16 17:46:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e2133d990e 
							
						 
					 
					
						
						
							
							* Move serialization functionality out into a Serializer object  
						
						
						
					 
					
						2015-07-16 11:21:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							01fab6bb90 
							
						 
					 
					
						
						
							
							* Improve de/serialize functions  
						
						
						
					 
					
						2015-07-16 01:26:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0e07c1ed2a 
							
						 
					 
					
						
						
							
							* Draft de/serialization functions in doc.pyx  
						
						
						
					 
					
						2015-07-16 01:16:33 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9d956b07e9 
							
						 
					 
					
						
						
							
							* Fix import of attrs in doc.pyx, and update the get_token_attr function.  
						
						
						
					 
					
						2015-07-16 01:15:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							935ac53ee3 
							
						 
					 
					
						
						
							
							* Extend count_by method  
						
						
						
					 
					
						2015-07-14 03:20:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							81aa4e6dcc 
							
						 
					 
					
						
						
							
							* Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API  
						
						
						
					 
					
						2015-07-14 00:10:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8214b74eec 
							
						 
					 
					
						
						
							
							* Restore _py_tokens cache, to handle orphan tokens.  
						
						
						
					 
					
						2015-07-13 22:28:10 +02:00