ines 
							
						 
					 
					
						
						
						
						
							
						
						
							62b4b527d7 
							
						 
					 
					
						
						
							
							Don't raise error if set_extension has getter and setter ( closes   #2177 )  
						
						
						
						
						Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests. 
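A minimal sketch of the sentinel idea mentioned above, assuming a hypothetical validator named `validate_extension_args` (not spaCy's actual internals). Comparing against a private `_unset` object lets `default=None` count as an explicitly supplied default, and a setter without a getter is rejected:

```python
# Hypothetical sketch; names are illustrative, not spaCy's internals.
_unset = object()  # unique sentinel: distinguishes "not passed" from None


def validate_extension_args(default=_unset, getter=None, setter=None, method=None):
    """Return which kind of extension the arguments describe."""
    # A setter only makes sense alongside a getter.
    if setter is not None and getter is None:
        raise ValueError("A setter was given without a getter.")
    defined = {
        "default": default is not _unset,  # default=None still counts as set
        "getter": getter is not None,
        "method": method is not None,
    }
    kinds = [name for name, is_set in defined.items() if is_set]
    if len(kinds) != 1:
        raise ValueError(
            "Expected exactly one of default, getter or method, got %d." % len(kinds)
        )
    return kinds[0]


kind_default = validate_extension_args(default=None)
kind_property = validate_extension_args(
    getter=lambda obj: 1, setter=lambda obj, value: None
)
```

Here a getter combined with a setter is accepted, which matches the intent of the commit subject.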
						
					 
					
						2018-04-03 18:30:17 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ee3082ad29 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2018-04-03 18:29:53 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3141e04822 
							
						 
					 
					
						
						
							
							💫  New system for error messages and warnings ( #2163 )  
						
						
						
						
						* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None 
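The decorator idea described in the list above can be sketched roughly as follows. This is an illustration under assumptions: the `Errors` class and its `E001` message are made up for the example, and the real module may differ in detail. The code is prepended only when an attribute is read, so the stored strings stay untouched:

```python
def add_codes(err_cls):
    """Wrap a class of messages so each lookup is prefixed with its code."""

    class ErrorsWithCodes(object):
        def __getattribute__(self, code):
            # Look the message up on the wrapped class, then add the code.
            msg = getattr(err_cls, code)
            return "[{code}] {msg}".format(code=code, msg=msg)

    return ErrorsWithCodes()


@add_codes
class Errors(object):
    # Hypothetical message, used only for this example.
    E001 = "Something went wrong with '{name}'."


message = Errors.E001.format(name="ner")
```

Because the string is modified at retrieval time, nothing has to be done to tracebacks or to the exception classes themselves.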
						
					 
					
						2018-04-03 15:50:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							abf8b16d71 
							
						 
					 
					
						
						
							
							Add doc.retokenize() context manager ( #2172 )  
						
						
						
						
						This patch takes a step towards #1487  by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.
The idea is to do merging and splitting like this:
with doc.retokenize() as retokenizer:
    for start, end, label in matches:
        retokenizer.merge(doc[start : end], attrs={'ent_type': label})
The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.
A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.
The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.
We can later start making deprecation warnings on direct calls to doc.merge(),
to migrate people to use of the retokenize context manager. 
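The accumulate-then-apply behaviour described above can be illustrated with a toy context manager in plain Python. This is only a sketch: `ToyRetokenizer` and `retokenize` here are hypothetical stand-ins that work on a list of dicts, not spaCy's `Doc`, and the right-to-left application order is an assumption made so that earlier offsets stay valid.

```python
from contextlib import contextmanager


class ToyRetokenizer(object):
    """Hypothetical stand-in: collects merge requests, applies them later."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.merges = []  # list of (start, end, attrs)

    def merge(self, start, end, attrs=None):
        # Only record the request; nothing changes yet.
        self.merges.append((start, end, attrs or {}))

    def apply(self):
        # Apply right-to-left so indices of earlier spans remain valid.
        for start, end, attrs in sorted(self.merges, key=lambda m: m[0], reverse=True):
            merged = {"text": " ".join(t["text"] for t in self.tokens[start:end])}
            merged.update(attrs)
            self.tokens[start:end] = [merged]
        self.merges = []


@contextmanager
def retokenize(tokens):
    retokenizer = ToyRetokenizer(tokens)
    yield retokenizer
    retokenizer.apply()  # all merges happen together at the end of the block


tokens = [{"text": w} for w in "I like New York and San Francisco".split()]
with retokenize(tokens) as retokenizer:
    retokenizer.merge(2, 4, attrs={"ent_type": "GPE"})
    retokenizer.merge(5, 7, attrs={"ent_type": "GPE"})
texts = [t["text"] for t in tokens]
```

Both spans are addressed with offsets into the original token list, which is what batching the merges makes possible.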
						
					 
					
						2018-04-03 14:10:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0b375d50c8 
							
						 
					 
					
						
						
							
							Fix ent_iob tags in doc.merge to avoid inconsistent sequences  
						
						
						
					 
					
						2018-03-28 18:39:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e807f88410 
							
						 
					 
					
						
						
							
							Resolve merge when cherry-picking ent iob patches from develop  
						
						
						
					 
					
						2018-03-28 18:38:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							99fbc7db33 
							
						 
					 
					
						
						
							
							Improve error message when entity sequence is inconsistent  
						
						
						
					 
					
						2018-03-28 18:36:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9e83513004 
							
						 
					 
					
						
						
							
							Add position of invalid token to error message  
						
						
						
					 
					
						2018-03-27 23:56:59 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							693971dd8f 
							
						 
					 
					
						
						
							
							Improve error message if token text is empty string (see  #2101 )  
						
						
						
					 
					
						2018-03-27 22:25:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0c829e6605 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2018-03-27 22:20:59 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							de9fd091ac 
							
						 
					 
					
						
						
							
							Fix   #2014 : token.pos_ not writeable  
						
						
						
					 
					
						2018-03-27 21:21:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1f7229f40f 
							
						 
					 
					
						
						
							
							Revert "Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop"  
						
						
						
						
						This reverts commit c9ba3d3c2d92c26a35d4 
						
					 
					
						2018-03-27 19:23:02 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d2118792e7 
							
						 
					 
					
						
						
							
							Merge changes from master  
						
						
						
					 
					
						2018-03-27 13:38:41 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							63a267b34d 
							
						 
					 
					
						
						
							
							Fix   #2073 : Token.set_extension not working  
						
						
						
					 
					
						2018-03-27 13:36:20 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a3d0cb15d3 
							
						 
					 
					
						
						
							
							Fix ent_iob tags in doc.merge to avoid inconsistent sequences  
						
						
						
					 
					
						2018-03-26 07:16:06 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							514d89a3ae 
							
						 
					 
					
						
						
							
							Set missing label for non-specified entities when setting doc.ents  
						
						
						
					 
					
						2018-03-26 07:14:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							54d7a1c916 
							
						 
					 
					
						
						
							
							Improve error message when entity sequence is inconsistent  
						
						
						
					 
					
						2018-03-26 07:13:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8e08c378fe 
							
						 
					 
					
						
						
							
							Fix entity IOB and tag in span merging  
						
						
						
					 
					
						2018-03-25 22:16:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bede11b67c 
							
						 
					 
					
						
						
							
							Improve label management in parser and NER ( #2108 )  
						
						
						
						
						This patch does a few smallish things that tighten up the training workflow a little, and allow memory use during training to be reduced by letting the GoldCorpus stream data properly.
Previously, the parser and entity recognizer read and saved labels as lists, with extra labels noted separately. Lists were used because ordering is very important, to ensure that the label-to-class mapping is stable.
We now manage labels as nested dictionaries, first keyed by the action, and then keyed by the label. Values are frequencies. The trick is, how do we save new labels? We need to make sure we iterate over these in the same order they're added. Otherwise, we'll get different class IDs, and the model's predictions won't make sense.
To allow stable sorting, we map the new labels to negative values. If we have two new labels, they'll be noted as having "frequency" -1 and -2. The next new label will then have "frequency" -3. When we sort by (frequency, label), we then get a stable sort.
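A small sketch of the negative-frequency trick, under assumptions: the helper names `add_label` and `ordered_labels` are hypothetical, the labels are kept in a flat dict rather than nested by action, and the sort is taken to be descending by frequency so that observed labels come first and new labels follow in insertion order.

```python
def add_label(freqs, label):
    """Register a new label with the next negative pseudo-frequency."""
    if label not in freqs:
        n_new = sum(1 for freq in freqs.values() if freq < 0)
        freqs[label] = -(n_new + 1)


def ordered_labels(freqs):
    """Sort by descending frequency, breaking ties by label."""
    items = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in items]


freqs = {"nsubj": 120, "dobj": 80}  # counts seen in training data
add_label(freqs, "zeta")   # first new label  -> -1
add_label(freqs, "alpha")  # second new label -> -2
add_label(freqs, "zeta")   # already known, unchanged
order = ordered_labels(freqs)
```

Note that `alpha` sorts after `zeta` even though it is alphabetically earlier, so the class IDs follow the order in which the labels were added.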
Storing frequencies then allows us to make the next nice improvement. Previously we had to iterate over the whole training set, to pre-process it for the deprojectivisation. This led to storing the whole training set in memory. This was most of the required memory during training.
To prevent this, we now store the frequencies as we stream in the data, and deprojectivize as we go. Once we've built the frequencies, we can then apply a frequency cut-off when we decide how many classes to make.
Finally, to allow proper data streaming, we also have to have some way of shuffling the iterator. This is awkward if the training files have multiple documents in them. To solve this, the GoldCorpus class now writes the training data to disk in msgpack files, one per document. We can then shuffle the data by shuffling the paths.
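The shuffle-by-path idea can be sketched as below. This is an illustration only: it uses JSON files from the standard library in place of msgpack, and the helper names `write_docs` and `stream_shuffled` are hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def write_docs(docs, directory):
    """Write one file per document and return the list of paths."""
    paths = []
    for i, doc in enumerate(docs):
        path = Path(directory) / "{}.json".format(i)
        path.write_text(json.dumps(doc), encoding="utf8")
        paths.append(path)
    return paths


def stream_shuffled(paths, seed=None):
    """Shuffle the paths, then load documents one at a time."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    for path in paths:
        yield json.loads(path.read_text(encoding="utf8"))


docs = [{"id": i, "words": ["doc", str(i)]} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_docs(docs, tmp)
    streamed = list(stream_shuffled(paths, seed=0))
```

Only the list of paths has to be held in memory for shuffling; each document is read when the iterator reaches it.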
This is a squash merge, as I made a lot of very small commits. Individual commit messages below.
* Simplify label management for TransitionSystem and its subclasses
* Fix serialization for new label handling format in parser
* Simplify and improve GoldCorpus class. Reduce memory use, write to temp dir
* Set actions in transition system
* Require thinc 6.11.1.dev4
* Fix error in parser init
* Add unicode declaration
* Fix unicode declaration
* Update textcat test
* Try to get model training on less memory
* Print json loc for now
* Try rapidjson to reduce memory use
* Remove rapidjson requirement
* Try rapidjson for reduced mem usage
* Handle None heads when projectivising
* Stream json docs
* Fix train script
* Handle projectivity in GoldParse
* Fix projectivity handling
* Add minibatch_by_words util from ud_train
* Minibatch by number of words in spacy.cli.train
* Move minibatch_by_words util to spacy.util
* Fix label handling
* More hacking at label management in parser
* Fix encoding in msgpack serialization in GoldParse
* Adjust batch sizes in parser training
* Fix minibatch_by_words
* Add merge_subtokens function to pipeline.pyx
* Register merge_subtokens factory
* Restore use of msgpack tmp directory
* Use minibatch-by-words in train
* Handle retokenization in scorer
* Change back-off approach for missing labels. Use 'dep' label
* Update NER for new label management
* Set NER tags for over-segmented words
* Fix label alignment in gold
* Fix label back-off for infrequent labels
* Fix int type in labels dict key
* Fix int type in labels dict key
* Update feature definition for 8 feature set
* Update ud-train script for new label stuff
* Fix json streamer
* Print the line number if conll eval fails
* Update children and sentence boundaries after deprojectivisation
* Export set_children_from_heads from doc.pxd
* Render parses during UD training
* Remove print statement
* Require thinc 6.11.1.dev6. Try adding wheel as install_requires
* Set different dev version, to flush pip cache
* Update thinc version
* Update GoldCorpus docs
* Remove print statements
* Fix formatting and links [ci skip] 
						
					 
					
						2018-03-19 02:58:08 +01:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							fbf48b3f9f 
							
						 
					 
					
						
						
							
							lemma property to return hash instead of unicode  
						
						
						
					 
					
						2018-03-14 17:03:00 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a1be01185c 
							
						 
					 
					
						
						
							
							Fix array out of bounds error in Span  
						
						
						
					 
					
						2018-02-28 12:27:09 +01:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							8df9e52829 
							
						 
					 
					
						
						
							
							lemma property to return hash instead of unicode  
						
						
						
					 
					
						2018-02-27 19:50:01 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cf0e320f2b 
							
						 
					 
					
						
						
							
							Add doc.is_sentenced attribute, re  #1959  
						
						
						
					 
					
						2018-02-18 14:16:55 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1e5aeb4eec 
							
						 
					 
					
						
						
							
							Merge pull request  #1987  from thomasopsomer/span-sent  
						
						
						
						
						Make span.sent work when only manual / custom sbd 
						
					 
					
						2018-02-18 14:05:37 +01:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							deab391cbf 
							
						 
					 
					
						
						
							
							correct check on sent_start & raise if no boundaries  
						
						
						
					 
					
						2018-02-15 16:58:30 +01:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							b902731313 
							
						 
					 
					
						
						
							
							Find span sentence when only sentence boundaries (no parser)  
						
						
						
					 
					
						2018-02-14 22:18:54 +01:00 
						 
				 
			
				
					
						
							
							
								4altinok 
							
						 
					 
					
						
						
						
						
							
						
						
							ca8728035d 
							
						 
					 
					
						
						
							
							added new lex feat to token  
						
						
						
					 
					
						2018-02-11 18:55:48 +01:00 
						 
				 
			
				
					
						
							
							
								Thomas Opsomer 
							
						 
					 
					
						
						
						
						
							
						
						
							515e25910e 
							
						 
					 
					
						
						
							
							fix sent_start in serialization  
						
						
						
					 
					
						2018-01-28 19:50:42 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							56164ab688 
							
						 
					 
					
						
						
							
							Set l_edge and r_edge correctly for non-projective parses.  Fixes   #1799  
						
						
						
					 
					
						2018-01-22 20:18:04 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ccb51a9f36 
							
						 
					 
					
						
						
							
							Make .similarity() return 1.0 if all orth attrs match  
						
						
						
					 
					
						2018-01-15 16:29:48 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b904d81e9a 
							
						 
					 
					
						
						
							
							Fix rich comparison against None objects.  Closes   #1757  
						
						
						
					 
					
						2018-01-15 15:51:25 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ab7c45b12d 
							
						 
					 
					
						
						
							
							Fix error message and handling of doc.sents  
						
						
						
					 
					
						2018-01-15 15:21:11 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							465a6f6452 
							
						 
					 
					
						
						
							
							Add missing Span.vocab property.  Closes   #1633  
						
						
						
					 
					
						2018-01-14 15:06:30 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0cb090e526 
							
						 
					 
					
						
						
							
							Fix infinite recursion in token.sent_start.  Closes   #1640  
						
						
						
					 
					
						2018-01-14 15:02:15 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5cbe913b6f 
							
						 
					 
					
						
						
							
							Don't raise deprecation warning in property.  Closes   #1813 ,  #1712  
						
						
						
					 
					
						2018-01-14 14:55:58 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e10e9ad2c5 
							
						 
					 
					
						
						
							
							Improve efficiency of Doc.to_array  
						
						
						
					 
					
						2017-11-23 12:33:27 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fa62427300 
							
						 
					 
					
						
						
							
							Remove lookup-based lemmatization  
						
						
						
					 
					
						2017-11-23 12:32:22 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fb26b2cb12 
							
						 
					 
					
						
						
							
							Use lookup lemmatizer if lemma unset  
						
						
						
					 
					
						2017-11-23 12:31:58 +00:00 
						 
				 
			
				
					
						
							
							
								Burton DeWilde 
							
						 
					 
					
						
						
						
						
							
						
						
							a5c6869b2d 
							
						 
					 
					
						
						
							
							Fix bug where span.orth_ != span.text (see  #1612 )  
						
						
						
					 
					
						2017-11-20 12:05:43 -06:00 
						 
				 
			
				
					
						
							
							
								Motoki Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							a52e195a0a 
							
						 
					 
					
						
						
							
							Fixes Issue  #1207  where noun_chunks of Span gives an error.  
						
						
						
						
						Make sure to reference `self.doc` when getting the noun chunks.
Same fix as 9750a0128c 
						
					 
					
						2017-11-17 17:16:20 -08:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1c218397f6 
							
						 
					 
					
						
						
							
							Ensure path in Doc.to_disk/from_disk (resolves #1521)  
						
						
						
						
						Also add Doc serialization tests with both Path and string path options 
						
					 
					
						2017-11-09 02:29:03 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							144a93c2a5 
							
						 
					 
					
						
						
							
							Back-off to tensor for similarity if no vectors  
						
						
						
					 
					
						2017-11-03 20:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62ed58935a 
							
						 
					 
					
						
						
							
							Add Doc.extend_tensor() method  
						
						
						
					 
					
						2017-11-03 11:20:31 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9659391944 
							
						 
					 
					
						
						
							
							Update deprecated methods and add warnings  
						
						
						
					 
					
						2017-11-01 16:49:42 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							705a4e3e4a 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-11-01 16:44:08 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e0ebee81c 
							
						 
					 
					
						
						
							
							Add Token.is_sent_start property, so can deprecate Token.sent_start  
						
						
						
					 
					
						2017-11-01 13:27:14 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7e7116cdf7 
							
						 
					 
					
						
						
							
							Fix Doc.to_array when only one string attr provided  
						
						
						
					 
					
						2017-11-01 13:26:43 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							301fb2bb60 
							
						 
					 
					
						
						
							
							Implement Span.n_lefts and Span.n_rights  
						
						
						
					 
					
						2017-11-01 13:25:12 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							86eba61fae 
							
						 
					 
					
						
						
							
							Fix token.vector when vectors are missing  
						
						
						
					 
					
						2017-11-01 00:47:35 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d96e72f656 
							
						 
					 
					
						
						
							
							Tidy up rest  
						
						
						
					 
					
						2017-10-27 21:07:59 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d2df81d907 
							
						 
					 
					
						
						
							
							Fix not implemented Span getters  
						
						
						
					 
					
						2017-10-27 18:09:28 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							544a407b93 
							
						 
					 
					
						
						
							
							Tidy up Doc, Token and Span and add missing docs  
						
						
						
					 
					
						2017-10-27 17:07:26 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6a0483b7aa 
							
						 
					 
					
						
						
							
							Tidy up and document Doc, Token and Span  
						
						
						
					 
					
						2017-10-27 15:41:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1a559d4c95 
							
						 
					 
					
						
						
							
							Remove old, unused file  
						
						
						
					 
					
						2017-10-27 15:34:35 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ea4a41c8fb 
							
						 
					 
					
						
						
							
							Tidy up util and helpers  
						
						
						
					 
					
						2017-10-27 14:39:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b66b8f028b 
							
						 
					 
					
						
						
							
							Fix   #1375  -- out-of-bounds on token.nbor()  
						
						
						
					 
					
						2017-10-24 12:10:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ccd2ab1a62 
							
						 
					 
					
						
						
							
							Merge pull request  #1443  from ramananbalakrishnan/develop-get-lca-matrix  
						
						
						
						
						Add LCA matrix for spans and docs 
						
					 
					
						2017-10-24 11:22:46 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fdf25d10ba 
							
						 
					 
					
						
						
							
							Merge pull request  #1440  from ramananbalakrishnan/develop  
						
						
						
						
						Support single value for attribute list in doc.to_array 
						
					 
					
						2017-10-24 10:23:12 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							a31f048b4d 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-10-23 10:38:06 +02:00 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d2fe56a577 
							
						 
					 
					
						
						
							
							Add LCA matrix for spans and docs  
						
						
						
					 
					
						2017-10-20 23:58:00 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0726946563 
							
						 
					 
					
						
						
							
							cleanup to_array implementation using fixes on master  
						
						
						
					 
					
						2017-10-20 17:09:37 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b3ab124fc5 
							
						 
					 
					
						
						
							
							Support strings for attribute list in doc.to_array  
						
						
						
					 
					
						2017-10-20 11:46:57 +05:30 
						 
				 
			
				
					
						
							
							
								Ramanan Balakrishnan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7b9b1be44c 
							
						 
					 
					
						
						
							
							Support single value for attribute list in doc.to_array  
						
						
						
					 
					
						2017-10-19 17:00:41 +05:30 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							394633efce 
							
						 
					 
					
						
						
							
							Make doc pickling support hooks  
						
						
						
					 
					
						2017-10-17 19:44:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cdb0c426d8 
							
						 
					 
					
						
						
							
							Improve deserialization of user_data, esp. for Underscore  
						
						
						
					 
					
						2017-10-17 19:29:20 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							32a8564c79 
							
						 
					 
					
						
						
							
							Fix doc pickling  
						
						
						
					 
					
						2017-10-17 18:20:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							92c1eb2d6f 
							
						 
					 
					
						
						
							
							Fix Doc pickling. This also removes need for Binder class  
						
						
						
					 
					
						2017-10-17 16:11:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a002264fec 
							
						 
					 
					
						
						
							
							Remove caching of Token in Doc, as caused cycle.  
						
						
						
					 
					
						2017-10-16 19:34:21 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							59c216196c 
							
						 
					 
					
						
						
							
							Allow weakrefs on Doc objects  
						
						
						
					 
					
						2017-10-16 19:22:11 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e0ff145a8b 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/dot-underscore  
						
						
						
					 
					
						2017-10-11 11:57:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3b527fa52b 
							
						 
					 
					
						
						
							
							Call morphology.assign_untagged when pushing token to Doc  
						
						
						
					 
					
						2017-10-11 03:23:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e0a9b02b67 
							
						 
					 
					
						
						
							
							Merge Span._ and Span.as_doc methods  
						
						
						
					 
					
						2017-10-09 22:00:15 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3fc4fe61d2 
							
						 
					 
					
						
						
							
							Fix typo  
						
						
						
					 
					
						2017-10-10 04:15:14 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							59c4f27499 
							
						 
					 
					
						
						
							
							Add get, set and has methods to Underscore  
						
						
						
					 
					
						2017-10-10 04:14:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							51d18937af 
							
						 
					 
					
						
						
							
							Partially apply doc/span/token into method  
						
						
						
						
						We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this:
```
def partial(unbound_method, constant_arg):
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
```
						
					 
					
						2017-10-10 02:21:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e938bce320 
							
						 
					 
					
						
						
							
							Adjust parsing transition system to allow preset sentence segments.  
						
						
						
					 
					
						2017-10-08 23:53:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							080afd4924 
							
						 
					 
					
						
						
							
							Add ternary value setting to Token.sent_start  
						
						
						
					 
					
						2017-10-08 23:51:58 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7ae67ec6a1 
							
						 
					 
					
						
						
							
							Add Span.as_doc method  
						
						
						
					 
					
						2017-10-08 23:50:20 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							668a0ea640 
							
						 
					 
					
						
						
							
							Pass extensions into Underscore class  
						
						
						
					 
					
						2017-10-07 18:56:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1289129fd9 
							
						 
					 
					
						
						
							
							Add Underscore class  
						
						
						
					 
					
						2017-10-07 18:00:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9bfd585a11 
							
						 
					 
					
						
						
							
							Fix parameter name in .pxd file  
						
						
						
					 
					
						2017-09-26 07:28:50 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							2480f8f521 
							
						 
					 
					
						
						
							
							Add missing return in Doc.from_disk() ( closes   #1330 )  
						
						
						
					 
					
						2017-09-18 15:32:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							03b5b9727a 
							
						 
					 
					
						
						
							
							Fix Doc.vector for empty doc objects  
						
						
						
					 
					
						2017-08-22 19:52:19 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0551b7b03a 
							
						 
					 
					
						
						
							
							Fix doc.vector  
						
						
						
					 
					
						2017-08-22 19:46:52 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d55d6e1cfa 
							
						 
					 
					
						
						
							
							Fix comparison of Token from different docs.  Closes   #1257  
						
						
						
					 
					
						2017-08-19 16:39:32 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dea229c634 
							
						 
					 
					
						
						
							
							Fix Span.to_array method  
						
						
						
					 
					
						2017-08-19 16:24:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8b7ac77c23 
							
						 
					 
					
						
						
							
							Allow span label to be string in Doc.char_span  
						
						
						
					 
					
						2017-08-19 16:18:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							80236116a6 
							
						 
					 
					
						
						
							
							Add Doc.char_span method, to get a span by character offset  
						
						
						
					 
					
						2017-08-19 12:21:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							482bba1722 
							
						 
					 
					
						
						
							
							Add Span.to_array method  
						
						
						
					 
					
						2017-08-19 12:20:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a6a2159969 
							
						 
					 
					
						
						
							
							Add slot for text categories to Doc  
						
						
						
					 
					
						2017-07-22 00:34:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2a3bd5ee90 
							
						 
					 
					
						
						
							
							Fix fetching of noun chunk iterator  
						
						
						
					 
					
						2017-06-04 15:53:05 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							92ae36f84e 
							
						 
					 
					
						
						
							
							Improve way noun chunks iterator is looked up  
						
						
						
					 
					
						2017-06-04 21:53:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							675f448313 
							
						 
					 
					
						
						
							
							Fix vector linkage on Doc  
						
						
						
					 
					
						2017-06-04 14:25:30 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f4662e9218 
							
						 
					 
					
						
						
							
							Fix vector linkage for token  
						
						
						
					 
					
						2017-06-04 14:19:58 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							459a1e8470 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2017-06-03 11:31:18 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							5109bba910 
							
						 
					 
					
						
						
							
							Port over fix from  #1070  
						
						
						
					 
					
						2017-06-03 11:31:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							498ad85309 
							
						 
					 
					
						
						
							
							Try using tensor for vector/similarity methods  
						
						
						
					 
					
						2017-05-30 23:35:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4ddff020c3 
							
						 
					 
					
						
						
							
							Fix compile error  
						
						
						
					 
					
						2017-05-28 23:30:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6d3caeadd2 
							
						 
					 
					
						
						
							
							Fix type check for long  
						
						
						
					 
					
						2017-05-28 23:22:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7996d21717 
							
						 
					 
					
						
						
							
							Fixes for new StringStore  
						
						
						
					 
					
						2017-05-28 11:09:27 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fe11564b8e 
							
						 
					 
					
						
						
							
							Finish stringstore change. Also xfail vectors tests  
						
						
						
					 
					
						2017-05-28 15:10:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							84e66ca6d4 
							
						 
					 
					
						
						
							
							WIP on stringstore change. 27 failures  
						
						
						
					 
					
						2017-05-28 14:06:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							39293ab2ee 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop  
						
						
						
					 
					
						2017-05-28 11:46:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2445707f3c 
							
						 
					 
					
						
						
							
							Re-delegate vectors to vocab  
						
						
						
					 
					
						2017-05-28 11:46:10 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							66088851dc 
							
						 
					 
					
						
						
							
							Add Doc.to_disk() and Doc.from_disk() methods  
						
						
						
					 
					
						2017-05-24 11:58:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d44b1eafc4 
							
						 
					 
					
						
						
							
							Fix conflict artefacts  
						
						
						
					 
					
						2017-05-23 18:47:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							01e59e4e6e 
							
						 
					 
					
						
						
							
							* Add Token.sent_start property, re Issue  #235  
						
						
						
					 
					
						2017-05-23 18:41:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d68dd1f251 
							
						 
					 
					
						
						
							
							Add SENT_START attribute, for custom sentence boundary detection  
						
						
						
					 
					
						2017-05-23 18:37:58 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							7ed8a92ed1 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Token  
						
						
						
					 
					
						2017-05-20 15:13:33 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							a804045597 
							
						 
					 
					
						
						
							
							Use is_ancestor instead of deprecated is_ancestor_of  
						
						
						
					 
					
						2017-05-19 20:23:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e9e62b01b0 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Token  
						
						
						
					 
					
						2017-05-19 18:47:56 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							62ceec4fc6 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Span  
						
						
						
					 
					
						2017-05-19 18:47:46 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							23f9a3ccc8 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc  
						
						
						
					 
					
						2017-05-19 18:47:39 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0791f0aae6 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Span class  
						
						
						
					 
					
						2017-05-19 00:31:31 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							8455cb1327 
							
						 
					 
					
						
						
							
							Update docstring for Doc.__getitem__  
						
						
						
					 
					
						2017-05-19 00:30:51 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b687ad109d 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc class  
						
						
						
					 
					
						2017-05-18 23:59:44 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							593361ee3c 
							
						 
					 
					
						
						
							
							Update docstrings for Span class  
						
						
						
					 
					
						2017-05-18 22:17:41 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b87066ff10 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Doc class  
						
						
						
					 
					
						2017-05-18 22:17:41 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4b9d69f428 
							
						 
					 
					
						
						
							
							Merge branch 'v2' into develop  
						
						
						
						
						* Move v2 parser into nn_parser.pyx
* New TokenVectorEncoder class in pipeline.pyx
* New spacy/_ml.py module
Currently the two parsers live side-by-side, until we figure out how to
organize them. 
						
					 
					
						2017-05-14 01:10:23 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9d85cda8e4 
							
						 
					 
					
						
						
							
							Fix models error message and use about.__docs_models__ (see  #1051 )  
						
						
						
					 
					
						2017-05-13 13:05:47 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6b942763f0 
							
						 
					 
					
						
						
							
							Tidy up imports  
						
						
						
					 
					
						2017-05-13 13:04:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6129016e15 
							
						 
					 
					
						
						
							
							Replace deepcopy  
						
						
						
					 
					
						2017-05-13 12:32:37 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							df68bf45ce 
							
						 
					 
					
						
						
							
							Set defaults for light and flat kwargs  
						
						
						
					 
					
						2017-05-13 12:32:23 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b9dea345e5 
							
						 
					 
					
						
						
							
							Remove old import  
						
						
						
					 
					
						2017-05-13 12:32:11 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							293ee359c5 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-05-13 12:32:06 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ee1d35bdb0 
							
						 
					 
					
						
						
							
							Fix merge conflict  
						
						
						
					 
					
						2017-05-13 03:20:19 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b2540d2379 
							
						 
					 
					
						
						
							
							Merge Kengz's tree_print patch  
						
						
						
					 
					
						2017-05-13 03:18:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4efb391994 
							
						 
					 
					
						
						
							
							Fix serializer  
						
						
						
					 
					
						2017-05-09 18:45:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1166b0c491 
							
						 
					 
					
						
						
							
							Implement Doc.to_bytes and Doc.from_bytes methods  
						
						
						
					 
					
						2017-05-09 18:11:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e167b7bb6 
							
						 
					 
					
						
						
							
							Strip serializer from code  
						
						
						
					 
					
						2017-05-09 17:28:50 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62ecdea9f2 
							
						 
					 
					
						
						
							
							Add binder class for document serialization  
						
						
						
					 
					
						2017-05-09 17:21:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6782eedf9b 
							
						 
					 
					
						
						
							
							Tmp GPU code  
						
						
						
					 
					
						2017-05-07 11:04:24 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4d98511db7 
							
						 
					 
					
						
						
							
							Make Span hashable.  Closes   #1019  
						
						
						
					 
					
						2017-04-26 19:01:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6a4221a6de 
							
						 
					 
					
						
						
							
							Allow lemma to be set from Python. Re  #973  
						
						
						
					 
					
						2017-04-16 18:07:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0739ae7b76 
							
						 
					 
					
						
						
							
							Tidy up and fix formatting and imports  
						
						
						
					 
					
						2017-04-15 13:05:15 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3b667a24d4 
							
						 
					 
					
						
						
							
							Remove whitespace  
						
						
						
					 
					
						2017-04-01 10:21:08 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e71a1f4bd0 
							
						 
					 
					
						
						
							
							Fix download commands in error messages (see  #946 )  
						
						
						
					 
					
						2017-04-01 10:20:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							51882ee2b8 
							
						 
					 
					
						
						
							
							Fix check for setting ent_id in merge  
						
						
						
					 
					
						2017-03-31 19:32:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc3900e5b2 
							
						 
					 
					
						
						
							
							Allow ent_id to be set in Token  
						
						
						
					 
					
						2017-03-31 14:00:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9720103428 
							
						 
					 
					
						
						
							
							Improve attribute handling in doc.merge(). Still unsatisfying  
						
						
						
					 
					
						2017-03-31 13:59:58 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0fefdfcbda 
							
						 
					 
					
						
						
							
							Merge pull request  #935  from ericzhao28/master  
						
						
						
						
						Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862 ) 
						
					 
					
						2017-03-30 02:51:24 +02:00 
						 
				 
			
				
					
						
							
							
								Eric Zhao 
							
						 
					 
					
						
						
						
						
							
						
						
							aafdf6ffb8 
							
						 
					 
					
						
						
							
							Add option to use label karg to determine ent_type in doc.merge  
						
						
						
					 
					
						2017-03-28 23:35:03 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							28bb546939 
							
						 
					 
					
						
						
							
							Merge pull request  #883  from ericzhao28/master  
						
						
						
						
						Add `lower_` and `upper_` properties to `Span` class 
						
					 
					
						2017-03-16 23:35:47 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							66c1f194f9 
							
						 
					 
					
						
						
							
							Use consistent unicode declarations  
						
						
						
					 
					
						2017-03-12 13:07:28 +01:00 
						 
				 
			
				
					
						
							
							
								Em 
							
						 
					 
					
						
						
						
						
							
						
						
							9c809efc25 
							
						 
					 
					
						
						
							
							Removed mapStr  
						
						
						
					 
					
						2017-03-11 16:23:26 -08:00 
						 
				 
			
				
					
						
							
							
								Em 
							
						 
					 
					
						
						
						
						
							
						
						
							426d17167f 
							
						 
					 
					
						
						
							
							Added string manipulation for spans  
						
						
						
					 
					
						2017-03-10 16:50:02 -08:00 
						 
				 
			
				
					
						
							
							
								Roman Inflianskas 
							
						 
					 
					
						
						
						
						
							
						
						
							66e1109b53 
							
						 
					 
					
						
						
							
							Add support for Universal Dependencies v2.0  
						
						
						
					 
					
						2017-03-03 13:17:34 +01:00 
						 
				 
			
				
					
						
							
							
								Matvey Ezhov 
							
						 
					 
					
						
						
						
						
							
						
						
							32a22291bc 
							
						 
					 
					
						
						
							
							Small Doc.count_by documentation update  
						
						
						
						
						Current example doesn't work 
						
					 
					
						2017-01-31 19:18:45 +03:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6c665b81df 
							
						 
					 
					
						
						
							
							Fix redundant == TAG in from_array conditional  
						
						
						
					 
					
						2017-01-31 00:46:21 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e7f8e13cf3 
							
						 
					 
					
						
						
							
							Make Token hashable.  Fixes   #743  
						
						
						
					 
					
						2017-01-16 13:27:57 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							12cd27b821 
							
						 
					 
					
						
						
							
							Amend 8ae8b443f: Handle comparison with None tokens.  
						
						
						
					 
					
						2017-01-11 13:03:32 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							44e2b0100d 
							
						 
					 
					
						
						
							
							Support TAG attribute in doc.from_array  
						
						
						
					 
					
						2017-01-10 22:47:07 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8ae8b443f1 
							
						 
					 
					
						
						
							
							Add richcmp method to Token.  Closes   #631  
						
						
						
					 
					
						2017-01-09 19:30:31 +01:00 
						 
				 
			
				
					
						
							
							
								kengz 
							
						 
					 
					
						
						
						
						
							
						
						
							73a38bd4d1 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/master'  
						
						
						
					 
					
						2016-12-30 12:19:59 -05:00 
						 
				 
			
				
					
						
							
							
								kengz 
							
						 
					 
					
						
						
						
						
							
						
						
							da44183ae1 
							
						 
					 
					
						
						
							
							move parse_tree logic to a new tokens/printers.py file  
						
						
						
					 
					
						2016-12-30 12:19:18 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							404019ad2f 
							
						 
					 
					
						
						
							
							Fix issue  #672 : ent_iob_ was a string, not unicode, due to missing unicode_literals statement.  
						
						
						
					 
					
						2016-12-18 22:33:53 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f6e356aada 
							
						 
					 
					
						
						
							
							Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue  #667  
						
						
						
					 
					
						2016-12-02 11:05:50 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							87613edf8f 
							
						 
					 
					
						
						
							
							Add set_struct_attr staticmethod to token  
						
						
						
					 
					
						2016-11-25 12:41:47 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fb69aa648f 
							
						 
					 
					
						
						
							
							Merge branch 'master' of ssh://github.com/explosion/spaCy  
						
						
						
					 
					
						2016-11-25 11:35:44 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9a03a3f85e 
							
						 
					 
					
						
						
							
							Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr.  
						
						
						
					 
					
						2016-11-25 11:35:17 +01:00 
						 
				 
			
				
					
						
							
							
								Pokey Rule 
							
						 
					 
					
						
						
						
						
							
						
						
							3e3bda142d 
							
						 
					 
					
						
						
							
							Add noun_chunks to Span  
						
						
						
					 
					
						2016-11-24 10:47:20 +00:00 
						 
				 
			
				
					
						
							
							
								tiago 
							
						 
					 
					
						
						
						
						
							
						
						
							b38cfd0ef9 
							
						 
					 
					
						
						
							
							now span.merge returns token like it says on documentation  
						
						
						
					 
					
						2016-11-09 14:58:19 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1fb09c3dc1 
							
						 
					 
					
						
						
							
							Fix morphology tagger  
						
						
						
					 
					
						2016-11-04 19:19:09 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							293c79c09a 
							
						 
					 
					
						
						
							
							Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly.
						
						
						
					 
					
						2016-11-04 00:29:07 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f292f7f0e6 
							
						 
					 
					
						
						
							
							Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy.
						
						
						
					 
					
						2016-11-02 23:48:43 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							05a8b752a2 
							
						 
					 
					
						
						
							
							Fix Issue #600: Missing setters for Token attribute.
						
						
						
					 
					
						2016-11-02 23:28:59 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							11664b9f20 
							
						 
					 
					
						
						
							
							Fix variable error in token  
						
						
						
					 
					
						2016-11-01 13:28:00 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8c4d1b46ce 
							
						 
					 
					
						
						
							
							Fix variable error in Span  
						
						
						
					 
					
						2016-11-01 13:27:44 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e7af6b937f 
							
						 
					 
					
						
						
							
							Fix syntax error while fixing doc strings  
						
						
						
					 
					
						2016-11-01 13:27:32 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b86f8af0c1 
							
						 
					 
					
						
						
							
							Fix doc strings  
						
						
						
					 
					
						2016-11-01 12:25:36 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4ca31b4d87 
							
						 
					 
					
						
						
							
							Fix clobbering of 'missing' named ent values after assigning ents.  
						
						
						
					 
					
						2016-10-26 13:13:56 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							15c9b59f0e 
							
						 
					 
					
						
						
							
							Fix Issue #461: O tag was being clobbered by doc.ents.__set__
						
						
						
					 
					
						2016-10-23 15:50:26 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2c3a67b693 
							
						 
					 
					
						
						
							
							Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function.
						
						
						
					 
					
						2016-10-23 14:49:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e80944276f 
							
						 
					 
					
						
						
							
							Fix Span.vector_norm  
						
						
						
					 
					
						2016-10-20 21:58:56 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3588a18fb8 
							
						 
					 
					
						
						
							
							Fix hook names in doc  
						
						
						
					 
					
						2016-10-19 21:15:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d5742b773 
							
						 
					 
					
						
						
							
							Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc.  
						
						
						
					 
					
						2016-10-19 20:54:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9b60186266 
							
						 
					 
					
						
						
							
							Fix doc class  
						
						
						
					 
					
						2016-10-17 15:23:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7fd98fc91c 
							
						 
					 
					
						
						
							
							Remove deprecation shim around str/bytes in Token.  
						
						
						
					 
					
						2016-10-17 14:02:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b67697a97b 
							
						 
					 
					
						
						
							
							Improve API for doc.merge() and span.merge(), to use keyword arguments.  
						
						
						
					 
					
						2016-10-17 14:02:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fbb7f3f15c 
							
						 
					 
					
						
						
							
							Add user_data attribute to Doc object.  
						
						
						
					 
					
						2016-10-17 11:43:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c1abc8f6ed 
							
						 
					 
					
						
						
							
							Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec  
						
						
						
					 
					
						2016-10-17 11:18:41 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							09ab447a18 
							
						 
					 
					
						
						
							
							Remove tensor property from token.  
						
						
						
					 
					
						2016-10-17 02:45:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5d10e2005c 
							
						 
					 
					
						
						
							
							Defer some attributes to Doc, via getters_for_tokens attribute.  
						
						
						
					 
					
						2016-10-17 02:44:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8829984efb 
							
						 
					 
					
						
						
							
							Remove tensor attribute from Span and Token.  
						
						
						
					 
					
						2016-10-17 02:44:04 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d15a88c66a 
							
						 
					 
					
						
						
							
							Defer some attributes to Doc via getters_for_spans  
						
						
						
					 
					
						2016-10-17 02:43:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62230dd13a 
							
						 
					 
					
						
						
							
							Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring  
						
						
						
					 
					
						2016-10-17 02:42:51 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ae11ea8240 
							
						 
					 
					
						
						
							
							Add getters_for_tokens and getters_for_spans attributes to Doc object.  
						
						
						
					 
					
						2016-10-17 02:42:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							311a985fe0 
							
						 
					 
					
						
						
							
							Add input error handling in Doc  
						
						
						
					 
					
						2016-10-16 18:16:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							06322ba99d 
							
						 
					 
					
						
						
							
							Add words and spaces keyword arguments to Doc.  
						
						
						
					 
					
						2016-10-16 18:13:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f3be9d0a9a 
							
						 
					 
					
						
						
							
							Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs  
						
						
						
					 
					
						2016-10-14 03:24:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ca32a1ab01 
							
						 
					 
					
						
						
							
							Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
						
						
						
						
						This reverts commit 8423e8627f 
						
					 
					
						2016-09-30 20:20:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6736977d82 
							
						 
					 
					
						
						
							
							Revert "Changes to Doc and Token for new string store scheme"  
						
						
						
						
						This reverts commit 99de44d864 
						
					 
					
						2016-09-30 20:11:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							99de44d864 
							
						 
					 
					
						
						
							
							Changes to Doc and Token for new string store scheme  
						
						
						
					 
					
						2016-09-30 20:00:21 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8423e8627f 
							
						 
					 
					
						
						
							
							Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
						
						
						
					 
					
						2016-09-30 10:14:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d3dc5718b2 
							
						 
					 
					
						
						
							
							Fix syntax error in Doc  
						
						
						
					 
					
						2016-09-28 11:39:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1b520e7bab 
							
						 
					 
					
						
						
							
							Improve docstrings for Doc object  
						
						
						
					 
					
						2016-09-28 11:15:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc4a7ad794 
							
						 
					 
					
						
						
							
							Test and fix Issue #411: IndexError when .sents property is used on empty string.
						
						
						
					 
					
						2016-09-27 18:49:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							15e42a1ba9 
							
						 
					 
					
						
						
							
							Allow entities to be set by Span, or by 4-tuple (with entity ID)  
						
						
						
					 
					
						2016-09-24 01:17:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e48df859b5 
							
						 
					 
					
						
						
							
							Fix typedef import in span.pyx  
						
						
						
					 
					
						2016-09-23 16:02:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4de13606fd 
							
						 
					 
					
						
						
							
							Fix token.pyx  
						
						
						
					 
					
						2016-09-23 15:07:07 +02:00