Ines Montani
e9022f7b33
Remove docstrings for deprecated arguments (see #2703)
2018-08-26 14:23:13 +02:00

Ines Montani
e7b075565d
💫 Rule-based NER component (#2513)

* Add helper function for reading in JSONL
* Add rule-based NER component
* Fix whitespace
* Add component to factories
* Add tests
* Add option to disable indent on json_dumps compat
  Otherwise, reading JSONL back in line by line won't work
* Fix error code

2018-07-18 19:43:16 +02:00

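The last two bullets explain why JSONL output must not be indented: each record has to occupy exactly one line. A minimal sketch using only the standard library `json` module. `write_jsonl` and `read_jsonl` are hypothetical helper names, not spaCy's own compat or util functions, and the example records are made up.

```python
import json
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            # indent=None keeps the whole record on one line; any indent would
            # spread it over several lines and break line-by-line reading.
            f.write(json.dumps(record, indent=None) + "\n")


def read_jsonl(path):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because `read_jsonl` is a generator, a large pattern file can be consumed without loading it into memory all at once.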
ines
f08c871adf
Fix typo in Language.from_disk
2018-06-29 14:32:16 +02:00

Ines Montani
862da5e793
Support pipeline factories via entry points (#2348)
2018-05-22 18:29:45 +02:00

Matthew Honnibal
2c4a6d66fa
Merge master into develop. Big merge, many conflicts -- need to review
2018-04-29 14:49:26 +02:00

Matthew Honnibal
a350be0601
Fix vector-name loading fix
2018-04-04 01:31:25 +02:00

Matthew Honnibal
81f4005f3d
Fix loading models with pretrained vectors
2018-04-03 23:11:48 +02:00

ines
e5f47cd82d
Update errors
2018-04-03 21:40:29 +02:00

Ines Montani
3141e04822
💫 New system for error messages and warnings (#2163)

* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
  An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None

2018-04-03 15:50:31 +02:00

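The decorator bullet describes adding the error code to the message only at the moment the attribute is read. A minimal sketch of that idea, assuming messages are stored as plain class attributes named after their codes. `add_codes`, `ErrorsWithCodes` and the two sample messages are illustrative, not a copy of `spacy.errors`.

```python
def add_codes(err_cls):
    """Wrap a class of message strings so attribute access prefixes the code."""

    class ErrorsWithCodes(object):
        def __getattribute__(self, code):
            # Look up the raw message on the original class and decorate it
            # only at retrieval time; the stored strings are never modified.
            msg = getattr(err_cls, code)
            return "[{code}] {msg}".format(code=code, msg=msg)

    return ErrorsWithCodes()


@add_codes
class Errors(object):
    # Hypothetical sample messages, not spaCy's actual error texts.
    E001 = "No component '{name}' found in pipeline."
    E002 = "Can't find factory for '{name}'."
```

A call site would then read `raise ValueError(Errors.E001.format(name="tagger"))`, and new codes only need a new class attribute.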
Matthew Honnibal
f3b7c5e537
Fix syntax error
2018-03-29 21:50:32 +02:00

Matthew Honnibal
23afa6429f
Add input length error, to address #1826
2018-03-29 21:45:26 +02:00

Matthew Honnibal
a7c5ae2beb
Avoid forcing a name on empty vectors, and remove print statement
2018-03-28 21:08:58 +02:00

Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors

This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:

{nlp.meta['lang']}_{nlp.meta['name']}.vectors

The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.

2018-03-28 16:02:59 +02:00

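The naming and migration rules above can be sketched on plain dicts. This is an illustration under assumptions: `default_vectors_name` and `upgrade_component_cfg` are hypothetical helpers, the meta values are made up, and the real check happens inside `from_disk` and `from_bytes`, as the commit describes.

```python
def default_vectors_name(meta):
    """Build the default vectors name: {lang}_{name}.vectors (hypothetical helper)."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_component_cfg(cfg, vectors_name):
    """Add 'pretrained_vectors' to a legacy cfg that only has 'pretrained_dims'."""
    cfg = dict(cfg)  # work on a copy so the caller's dict is left untouched
    if "pretrained_dims" in cfg and "pretrained_vectors" not in cfg:
        # Per the commit, the new key is added; the old one is not removed.
        cfg["pretrained_vectors"] = vectors_name
    return cfg
```

Giving each model's vectors a distinct name is what stops two loaded models from sharing one ID when Thinc looks up pre-trained vectors.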
Matthew Honnibal
1f7229f40f
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"

This reverts commit c9ba3d3c2d92c26a35d4

2018-03-27 19:23:02 +02:00

Matthew Honnibal
f57bfbccdc
Fix non-projective label filtering
2018-03-27 13:41:33 +02:00

Matthew Honnibal
dd54511c4f
Pass data as a function in begin_training methods
2018-03-27 09:39:59 +00:00

Matthew Honnibal
bede11b67c
Improve label management in parser and NER (#2108)

This patch does a few smallish things that tighten up the training workflow a little, and allow memory use during training to be reduced by letting the GoldCorpus stream data properly.

Previously, the parser and entity recognizer read and saved labels as lists, with extra labels noted separately. Lists were used because ordering is very important, to ensure that the label-to-class mapping is stable.

We now manage labels as nested dictionaries, first keyed by the action, and then keyed by the label. Values are frequencies. The trick is, how do we save new labels? We need to make sure we iterate over these in the same order they're added. Otherwise, we'll get different class IDs, and the model's predictions won't make sense.

To allow stable sorting, we map the new labels to negative values. If we have two new labels, they'll be noted as having "frequency" -1 and -2. The next new label will then have "frequency" -3. When we sort by (frequency, label), we then get a stable sort.

Storing frequencies then allows us to make the next nice improvement. Previously, we had to iterate over the whole training set to pre-process it for the deprojectivisation. This led to storing the whole training set in memory, which accounted for most of the required memory during training.

To prevent this, we now store the frequencies as we stream in the data, and deprojectivize as we go. Once we've built the frequencies, we can then apply a frequency cut-off when we decide how many classes to make.

Finally, to allow proper data streaming, we also have to have some way of shuffling the iterator. This is awkward if the training files have multiple documents in them. To solve this, the GoldCorpus class now writes the training data to disk in msgpack files, one per document. We can then shuffle the data by shuffling the paths.

This is a squash merge, as I made a lot of very small commits. Individual commit messages below.

* Simplify label management for TransitionSystem and its subclasses
* Fix serialization for new label handling format in parser
* Simplify and improve GoldCorpus class. Reduce memory use, write to temp dir
* Set actions in transition system
* Require thinc 6.11.1.dev4
* Fix error in parser init
* Add unicode declaration
* Fix unicode declaration
* Update textcat test
* Try to get model training on less memory
* Print json loc for now
* Try rapidjson to reduce memory use
* Remove rapidjson requirement
* Try rapidjson for reduced mem usage
* Handle None heads when projectivising
* Stream json docs
* Fix train script
* Handle projectivity in GoldParse
* Fix projectivity handling
* Add minibatch_by_words util from ud_train
* Minibatch by number of words in spacy.cli.train
* Move minibatch_by_words util to spacy.util
* Fix label handling
* More hacking at label management in parser
* Fix encoding in msgpack serialization in GoldParse
* Adjust batch sizes in parser training
* Fix minibatch_by_words
* Add merge_subtokens function to pipeline.pyx
* Register merge_subtokens factory
* Restore use of msgpack tmp directory
* Use minibatch-by-words in train
* Handle retokenization in scorer
* Change back-off approach for missing labels. Use 'dep' label
* Update NER for new label management
* Set NER tags for over-segmented words
* Fix label alignment in gold
* Fix label back-off for infrequent labels
* Fix int type in labels dict key
* Fix int type in labels dict key
* Update feature definition for 8 feature set
* Update ud-train script for new label stuff
* Fix json streamer
* Print the line number if conll eval fails
* Update children and sentence boundaries after deprojectivisation
* Export set_children_from_heads from doc.pxd
* Render parses during UD training
* Remove print statement
* Require thinc 6.11.1.dev6. Try adding wheel as install_requires
* Set different dev version, to flush pip cache
* Update thinc version
* Update GoldCorpus docs
* Remove print statements
* Fix formatting and links [ci skip]

2018-03-19 02:58:08 +01:00

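A minimal sketch of the negative "frequency" trick on a nested dict (action ID, then label, then frequency). The commit only says to sort by (frequency, label). Sorting in descending order is an assumption made here, because it puts observed labels first and keeps new labels in the order they were added. `add_label` and `ordered_labels` are hypothetical names, and the frequency cut-off is left out.

```python
def add_label(labels, action, label):
    """Register a new label for an action with the next negative "frequency"."""
    freqs = labels.setdefault(action, {})
    if label in freqs:
        return  # already known: keep its existing value
    # The first new label gets -1, the next -2, and so on, so the order of
    # addition is encoded in the value itself.
    lowest = min([f for f in freqs.values() if f < 0], default=0)
    freqs[label] = lowest - 1


def ordered_labels(labels, action):
    """Return one action's labels in a deterministic class-ID order."""
    # Descending (frequency, label): observed labels by frequency first,
    # then new labels -1, -2, -3 in the order they were added.
    pairs = sorted(((freq, label) for label, freq in labels[action].items()), reverse=True)
    return [label for _, label in pairs]
```

Because the order depends only on the stored values, saving and reloading the dict reproduces the same label-to-class mapping.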
ines
f3f8bfc367
Add built-in factories for merge_entities and merge_noun_chunks

Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).

2018-03-15 17:16:54 +01:00

ines
d854f69fe3
Add built-in factories for merge_entities and merge_noun_chunks

Allows adding those components to the pipeline out-of-the-box if they're defined in a model's meta.json. Also allows usage as nlp.add_pipe(nlp.create_pipe('merge_entities')).

2018-03-15 00:18:51 +01:00

Aaron Marquez
3765d84d57
Fix issue #1959
2018-02-15 12:51:49 -08:00

Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed

Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).

2018-02-13 20:49:43 +01:00

Motoki Wu
f4a7d1a423
Make sure to pass in **cfg to each component when training
2018-01-30 18:29:54 -08:00

ines
4046823699
Only check component in factories if string (see #1911)
2018-01-30 16:29:07 +01:00

ines
ce10d320c4
Fix component check in self.factories (see #1911)
2018-01-30 16:09:37 +01:00

ines
8901814248
Improve error handling if pipeline component is not callable (resolves #1911)

Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.

2018-01-30 15:43:03 +01:00

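The error handling described here can be sketched as a small validation function. `check_component` and the `BUILTIN_FACTORIES` set are hypothetical. The idea, taken from this message and the related #1911 commits, is to look a value up among factory names only when it is a string, and to suggest `nlp.create_pipe()` in that case.

```python
# Hypothetical subset of built-in factory names, for illustration only.
BUILTIN_FACTORIES = {"tagger", "parser", "ner", "textcat"}


def check_component(component, factories=BUILTIN_FACTORIES):
    """Raise a helpful ValueError if a pipeline component isn't callable."""
    if callable(component):
        return
    # Only strings can be factory names, so only check those against factories.
    if isinstance(component, str) and component in factories:
        raise ValueError(
            "'{0}' is the name of a built-in component, not a component. "
            "Did you mean nlp.add_pipe(nlp.create_pipe('{0}'))?".format(component)
        )
    raise ValueError("Pipeline component {!r} is not callable.".format(component))
```

Checking `callable()` first means any function or object with `__call__` passes without a factory lookup.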
ines
a31506e060
Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654)
2017-11-28 20:37:55 +01:00

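The message doesn't show the faulty line, but the correct behaviour for `after=` is easy to state: insert at the index of the named component plus one. A sketch on a plain list of `(name, component)` tuples, which is how `nlp.pipeline` is structured in spaCy v2. The standalone `add_pipe` function is hypothetical.

```python
def add_pipe(pipeline, name, component, before=None, after=None):
    """Insert (name, component) into a list of (name, component) tuples."""
    names = [n for n, _ in pipeline]
    if after is not None:
        # "After" means index + 1; inserting at the bare index would put the
        # new component *before* the named one, the classic off-by-one.
        pipeline.insert(names.index(after) + 1, (name, component))
    elif before is not None:
        pipeline.insert(names.index(before), (name, component))
    else:
        pipeline.append((name, component))
```

`names.index()` raises `ValueError` for an unknown name, so a typo in `after=` fails loudly instead of silently appending.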
Ines Montani
6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta

Fixed spaCy version string in default meta

2017-11-27 11:58:02 +00:00

Vadim Mazaev
59f03ab1d7
Fixed spacy version string in default meta
2017-11-26 23:02:07 +03:00

Matthew Honnibal
8fec7268eb
Move string cleanup under a setting flag
2017-11-23 12:19:18 +00:00

Matthew Honnibal
5949777b12
Fix misleading multi-threading docstring
2017-11-23 12:18:59 +00:00

Roman Domrachev
61d28d03e4
Try again to do selective cache removal
2017-11-15 19:11:12 +03:00

Roman Domrachev
505c6a2f2f
Completely clean up tokenizer cache

Tokenizer cache can have keys other than strings.
That modification can slow down the tokenizer and needs to be measured.

2017-11-15 17:55:48 +03:00

Roman Domrachev
a33d5a068d
Try to hold original data instead of restoring it
2017-11-14 22:40:03 +03:00

Roman Domrachev
91e2fa6561
Clean all caches
2017-11-14 21:15:04 +03:00

Roman Domrachev
86ca434c93
Merge github.com:explosion/spaCy
2017-11-14 17:46:22 +03:00

Roman Domrachev
a2745b0e84
StringStore now actually cleaned

Do not lose docs in ref tracking

2017-11-14 17:45:50 +03:00

Matthew Honnibal
dd1678eab3
Edit comment
2017-11-11 18:37:08 +01:00

Roman Domrachev
ee60a52ee7
Fix test imports and last batch cleanup
2017-11-11 11:32:16 +03:00

Roman Domrachev
4a6b094e09
Remove unused import
2017-11-11 03:13:05 +03:00

Roman Domrachev
3c600adf23
Try to fix StringStore clean up (see #1506)
2017-11-11 03:11:27 +03:00

Matthew Honnibal
45e0617e61
Allow Language.update to take unicode text and dict objects
2017-11-06 22:07:38 +01:00

Matthew Honnibal
5c85bf3791
Fix missing import
2017-11-06 15:06:27 +01:00

Matthew Honnibal
465adfee94
Remove unused resume_training method, and pass optimizer through
2017-11-06 14:26:00 +01:00

Matthew Honnibal
38109a0e4a
Register SentenceSegmenter in Language.factories
2017-11-05 18:45:57 +01:00

Matthew Honnibal
d185927998
Undo harmful pickling hacks on Language class
2017-11-04 23:07:03 +01:00

Matthew Honnibal
2bf21cbe29
Update model after optimising it instead of waiting
2017-11-03 20:20:01 +01:00

ines
5f661a1b3a
Remove tensorizer from pre-set pipe_names
2017-11-01 19:48:33 +01:00

ines
bfe17b7df1
Fix begin_training if get_gold_tuples is None
2017-11-01 13:14:31 +01:00

ines
37e62ab0e2
Update vector meta in meta.json
2017-11-01 01:25:09 +01:00

ines
8e02294241
Add vectors to Language.meta
2017-10-30 18:39:48 +01:00

ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00

ines
91899d337b
Tidy up language, lemmatizer and scorer
2017-10-27 14:40:14 +02:00

Ines Montani
4033e70c71
Merge pull request #1461 from explosion/feature/disable-pipes

💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples

2017-10-27 12:21:40 +02:00

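`Language.disable_pipes()` and the commits in this branch (returning `self` from `__enter__`, checking for unexpected pipe names in `restore()`) suggest a context manager along these lines. This is a standalone sketch over a list of `(name, component)` tuples, not spaCy's implementation. `DisabledPipes` here is a hypothetical class, and raising `ValueError` on unexpected names is an assumption about what that check does.

```python
class DisabledPipes(list):
    """Temporarily remove named components from a list of (name, component) pairs."""

    def __init__(self, pipeline, *names):
        self.pipeline = pipeline
        self.original = list(pipeline)
        disabled = [(name, proc) for name, proc in pipeline if name in names]
        # Mutate in place so every reference to the pipeline sees the change.
        pipeline[:] = [(name, proc) for name, proc in pipeline if name not in names]
        super().__init__(disabled)

    def __enter__(self):
        # Returning self allows `with DisabledPipes(...) as disabled:`.
        return self

    def __exit__(self, *exc_info):
        self.restore()

    def restore(self):
        """Put the original pipeline back, refusing if new names appeared."""
        original_names = {name for name, _ in self.original}
        unexpected = [name for name, _ in self.pipeline if name not in original_names]
        if unexpected:
            raise ValueError("Pipeline changed while disabled: {}".format(unexpected))
        self.pipeline[:] = self.original
        self[:] = []
```

Subclassing `list` means the returned object also lists what was disabled, and `restore()` can be called directly when a `with` block isn't convenient.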
ines
2d6ec99884
Set 'model' as default model name to prevent meta.json errors
2017-10-26 16:12:23 +02:00

Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00

Matthew Honnibal
b0f3ea2200
Fix names of pipeline components

NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective

2017-10-26 12:38:23 +02:00

ines
1a722dac31
Merge branch 'develop' into feature/disable-pipes
2017-10-25 15:18:18 +02:00

ines
6a00de4f77
Fix check of unexpected pipe names in restore()
2017-10-25 14:56:35 +02:00

ines
7f03932477
Return self on __enter__
2017-10-25 14:56:16 +02:00

Matthew Honnibal
e70f80f29e
Add Language.disable_pipes()
2017-10-25 13:46:41 +02:00

ines
3484174e48
Add Language.path
2017-10-25 11:57:43 +02:00

Matthew Honnibal
65bf5e85bd
Improve piping in language.pipe
2017-10-18 21:46:12 +02:00

Matthew Honnibal
e35a83d142
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-17 18:22:06 +02:00

Matthew Honnibal
1cc85a89ef
Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes().
2017-10-17 18:18:49 +02:00

Ines Montani
afa67de7ee
Merge pull request #1428 from roanuz/develop

Fix trailing whitespace and Language.from_disk overwrites

2017-10-17 16:29:15 +02:00

Anto Binish Kaspar
8f5b60c168
Fix Language.from_disk overwriting the meta.json file.
2017-10-17 17:15:32 +05:30

ines
8ca344712d
Add Language.has_pipe method
2017-10-17 11:20:07 +02:00

Matthew Honnibal
2bc06e4b22
Bump rolling buffer size to 10k
2017-10-16 19:38:29 +02:00

Matthew Honnibal
5c14f3f033
Create a rolling buffer for the StringStore in Language.pipe()
2017-10-16 19:22:40 +02:00

Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore

💫 Custom attributes via Doc._, Token._ and Span._

2017-10-11 18:35:56 +02:00

ines
9620c1a640
Add lemma_lookup to Language defaults
2017-10-11 13:26:05 +02:00

								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							67350fa496 
							
						 
					 
					
						
						
							
							Use better logic for auto-generating component name  
						
						... 
						
						
						
						Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component). 
						
					 
					
						2017-10-10 04:23:05 +02:00 
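
The fallback order described in the commit message above can be sketched in plain Python. This is a minimal, hypothetical helper: `component_name` and the example component names are illustrative and not taken from the spaCy source. It assumes a pipeline component may be a plain function, a class, or an instance of a callable class.

```python
def component_name(component):
    """Return a display name for a pipeline component (hypothetical helper).

    Tries __name__, then __class__.__name__, then falls back to repr().
    """
    # Plain functions, lambdas and classes carry their own __name__.
    name = getattr(component, "__name__", None)
    if name is not None:
        return name
    # Instances usually don't expose __name__, so use their class's name.
    cls_name = getattr(getattr(component, "__class__", None), "__name__", None)
    if cls_name is not None:
        return cls_name
    # Last resort: the object's repr.
    return repr(component)


# Hypothetical components, for illustration only.
def my_component(doc):
    return doc


class MyTagger:
    def __call__(self, doc):
        return doc


print(component_name(my_component))  # my_component
print(component_name(MyTagger()))    # MyTagger
```

Because every Python object has a `__class__` with a `__name__`, the `repr()` branch acts mainly as a safety net in this sketch.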
						 
				 
			
				
					
						
							
							
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00

ines
e43530269c
Update docstrings
2017-10-07 01:04:50 +02:00

ines
2586b61b15
Fix formatting, tidy up and remove unused imports
2017-10-07 00:26:05 +02:00

ines
212c8f0711
Implement new Language methods and pipeline API
2017-10-07 00:25:54 +02:00

Matthew Honnibal
96da86b3e5
Add support for verbose flag to Language
2017-10-03 09:14:57 -05:00

Matthew Honnibal
4ae9ea7684
Remove unused argument in Language
2017-09-26 05:41:35 -05:00

Matthew Honnibal
8716ffe57d
Serialize vocab last
2017-09-24 05:01:45 -05:00

Matthew Honnibal
5a7fd0fd36
Fix vector linkage
2017-09-22 20:11:52 -05:00

Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00

Matthew Honnibal
7dc61b3f43
Whitespace
2017-09-22 20:00:50 -05:00

Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00

Matthew Honnibal
b832f89ff8
Add resume_training function
2017-09-20 19:15:20 -05:00

Matthew Honnibal
c858927271
Copy vectors to GPU on begin training
2017-09-18 18:04:16 -05:00

Matthew Honnibal
43210abacc
Resolve fine-tuning conflict
2017-09-17 05:30:04 -05:00

Matthew Honnibal
e37a50a436
Pass documents to tensorizer, not 'features'
2017-09-16 12:46:36 -05:00

Matthew Honnibal
70da88a3a7
Update comment on Language.begin_training
2017-09-14 16:18:30 +02:00

Matthew Honnibal
78a5f842e9
Fix update when update_shared=False
2017-08-20 15:58:34 -05:00

Matthew Honnibal
8875590081
Add optimizer in Language.update if sgd=None
2017-08-20 14:42:07 +02:00

Matthew Honnibal
a3c51a0355
Fix creation of pipeline
2017-08-19 21:58:57 +02:00

Matthew Honnibal
97aabafb5f
Document as_tuples keyword arg of Language.pipe
2017-08-19 12:21:33 +02:00

Matthew Honnibal
11c31d285c
Restore changes from nn-beam-parser
2017-08-18 22:26:12 +02:00

Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad508e443e083
2017-08-14 13:00:23 +02:00

Matthew Honnibal
4363b4aa4a
Fix redundant tokvecs updates during update
2017-08-13 12:36:55 +02:00

Matthew Honnibal
0acce0521b
Fix Language.update for pipeline
2017-08-06 14:13:03 +02:00

Matthew Honnibal
0eec7c9e9b
Fix Language.evaluate
2017-08-06 02:18:31 +02:00

Matthew Honnibal
cc19ea0e7c
Add update_tensors flag to Language.update. Experimental, re #1182
2017-08-06 02:17:10 +02:00

Matthew Honnibal
2e00361522
Fix update when 0 docs
2017-08-01 22:10:17 +02:00

Matthew Honnibal
523b0df2c9
Update text classification model
2017-07-25 18:57:59 +02:00

Matthew Honnibal
d8aa721664
Compute Language.meta with a property
2017-07-23 00:50:18 +02:00

Matthew Honnibal
baa3d81c35
Add text categorizer to Language
2017-07-22 01:13:36 +02:00

Matthew Honnibal
836bfa2d0f
Add factory for experimental SimilarityHook component
2017-06-05 15:40:22 +02:00

Matthew Honnibal
2479cde446
Support disable keyword in Language.__init__
2017-06-05 13:13:07 +02:00

Matthew Honnibal
8f8f90b46b
Disable labeller if not parsing
2017-06-04 20:18:54 -05:00

Matthew Honnibal
939e8ed567
Add lookup properties for components in Language
2017-06-04 15:52:09 -05:00

Matthew Honnibal
92ae36f84e
Improve way noun chunks iterator is looked up
2017-06-04 21:53:39 +02:00

Matthew Honnibal
21eef90dbc
Support specifying which GPU
2017-06-03 16:10:23 -05:00

Matthew Honnibal
fea1144e6d
Set max batch size in evaluate
2017-06-03 13:31:33 -05:00

ines
a3e4f91f4a
Only load vocab if it exists
2017-06-01 14:38:35 +02:00

Matthew Honnibal
33e5ec737f
Fix to/from disk methods
2017-05-31 13:43:10 +02:00

Matthew Honnibal
1e6df0a2a1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-29 14:30:12 -05:00

ines
6145fe6a93
Catch all kwargs on Language
2017-05-29 20:43:48 +02:00

Matthew Honnibal
9c9ee24411
Fix broken lambda scoping in Python 2
2017-05-29 13:23:28 -05:00

Matthew Honnibal
aa4c33914b
Work on serialization
2017-05-29 08:40:45 -05:00

Matthew Honnibal
7b06bb896e
Fix for serialization
2017-05-29 13:42:55 +02:00

Matthew Honnibal
74235587ef
Fix to serialization
2017-05-29 13:40:31 +02:00

Matthew Honnibal
59f355d525
Fixes for serialization
2017-05-29 13:38:20 +02:00

Matthew Honnibal
ff26aa6c37
Work on to/from bytes/disk serialization methods
2017-05-29 11:45:45 +02:00

Matthew Honnibal
8a24c60c1e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-28 08:12:05 -05:00

Matthew Honnibal
bc97bc292c
Fix __call__ method
2017-05-28 08:11:58 -05:00

Matthew Honnibal
b082f76494
Randomize pipeline order during training
2017-05-27 18:32:21 -05:00

Matthew Honnibal
73a643d32a
Don't randomise pipeline for training, and don't update if no gradient
2017-05-27 08:20:13 -05:00

Matthew Honnibal
8af3100143
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-05-26 11:31:41 -05:00

ines
353f0ef8d7
Use disable argument (list) for serialization
2017-05-26 12:33:54 +02:00

Matthew Honnibal
dbf2a4cf57
Update all models on each epoch
2017-05-25 19:46:56 -05:00

Matthew Honnibal
82b11b0320
Remove print statement
2017-05-25 17:15:59 -05:00

Matthew Honnibal
f403c2cd5f
Add env opts for optimizer
2017-05-25 11:19:26 -05:00

Matthew Honnibal
8500d9b1da
Only train one task per iter, holding grads
2017-05-25 06:47:42 -05:00

Matthew Honnibal
e6cc927ab1
Rearrange multi-task learning
2017-05-24 20:10:54 -05:00

Matthew Honnibal
9adfe9e8fc
Don't hold gradient updates in language -- let the parser decide how to batch the updates.
2017-05-23 04:29:10 -05:00

Matthew Honnibal
3959d778ac
Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8
2017-05-23 03:06:53 -05:00

Matthew Honnibal
532afef4a8
Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44
2017-05-23 03:05:25 -05:00

Matthew Honnibal
bdaac7ab44
WIP on improving parser efficiency
2017-05-23 02:59:31 -05:00

ines
54f04a9fe0
Update API docs with changes in spacy.gold and spacy.language
2017-05-22 12:29:30 +02:00

Matthew Honnibal
9262fc4829
Fix syntax error
2017-05-22 05:14:59 -05:00

Matthew Honnibal
2a5eb9f61e
Make nonproj methods top-level functions, instead of class methods
2017-05-22 04:51:08 -05:00

Matthew Honnibal
5738d373d5
Add deprojectivize to pipeline
2017-05-22 04:51:08 -05:00

Matthew Honnibal
8d1e64be69
Add experimental NeuralLabeller
2017-05-22 04:51:08 -05:00

Matthew Honnibal
5db89053aa
Merge docstrings
2017-05-21 13:46:23 -05:00

Matthew Honnibal
432b3499b3
Fix memory leak
2017-05-21 13:38:46 -05:00

Matthew Honnibal
4c9202249d
Refactor training, to fix memory leak
2017-05-21 09:07:06 -05:00

ines
d82ae9a585
Change "function" to "callable" in docs
2017-05-21 13:17:40 +02:00

								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3b7c108246 
							
						 
					 
					
						
						
							
							Pass tokvecs through as a list, instead of concatenated. Also fix padding  
						
						
						
					 
					
						2017-05-20 13:23:32 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							66ea9aebe7 
							
						 
					 
					
						
						
							
							Remove the state argument from Language  
						
						
						
					 
					
						2017-05-19 13:25:42 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							2c8c9dc0c9 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Language  
						
						
						
					 
					
						2017-05-19 18:47:24 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d42bc16868 
							
						 
					 
					
						
						
							
							Update docstrings and API docs for Language class  
						
						
						
					 
					
						2017-05-18 23:57:38 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c2c825127a 
							
						 
					 
					
						
						
							
							Fix use_params and pipe methods  
						
						
						
					 
					
						2017-05-18 08:30:59 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2713041571 
							
						 
					 
					
						
						
							
							Fix GPU usage in Language  
						
						
						
					 
					
						2017-05-18 04:25:19 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							793430aa7a 
							
						 
					 
					
						
						
							
							Get spaCy train command working with neural network  
						
						... 
						
						
						
						* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab 
						
					 
					
						2017-05-17 12:04:50 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8cf097ca88 
							
						 
					 
					
						
						
							
							Redesign training to integrate NN components  
						
						... 
						
						
						
						* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly. 
						
					 
					
						2017-05-16 16:17:30 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5211645af3 
							
						 
					 
					
						
						
							
							Get data flowing through pipeline. Needs redesign  
						
						
						
					 
					
						2017-05-16 11:21:59 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a9edb3aa1d 
							
						 
					 
					
						
						
							
							Improve integration of NN parser, to support unified training API  
						
						
						
					 
					
						2017-05-15 21:53:27 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9e167b7bb6 
							
						 
					 
					
						
						
							
							Strip serializer from code  
						
						
						
					 
					
						2017-05-09 17:28:50 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ea5fa46475 
							
						 
					 
					
						
						
							
							Import LEX_ATTRS from lang.lex_attrs  
						
						
						
					 
					
						2017-05-09 00:58:10 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							6eb6306843 
							
						 
					 
					
						
						
							
							Fix language data imports  
						
						
						
					 
					
						2017-05-08 23:58:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d0e19267e8 
							
						 
					 
					
						
						
							
							Create directory if missing in save_to_directory  
						
						
						
					 
					
						2017-04-23 21:24:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4d2a659c52 
							
						 
					 
					
						
						
							
							Fix json dump for Python3  
						
						
						
					 
					
						2017-04-23 17:05:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ddd5194088 
							
						 
					 
					
						
						
							
							Update Language docs and docstrings  
						
						
						
					 
					
						2017-04-17 01:52:13 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							f62b740961 
							
						 
					 
					
						
						
							
							Use compat.json_dumps  
						
						
						
					 
					
						2017-04-17 01:46:14 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							8e83f8e2fa 
							
						 
					 
					
						
						
							
							Update docstrings  
						
						
						
					 
					
						2017-04-17 01:40:26 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e2299dc389 
							
						 
					 
					
						
						
							
							Ensure path in save_to_directory  
						
						
						
					 
					
2017-04-17 01:40:14 +02:00

Matthew Honnibal
4efd6fb9d6
Fix training
2017-04-16 15:28:27 -05:00

Matthew Honnibal
89a4f262fc
Fix training methods
2017-04-16 13:00:37 -05:00

ines
c05ec4b89a
Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00

ines
d24589aa72
Clean up imports, unused code, whitespace, docstrings
2017-04-15 12:05:47 +02:00

ines
561f2a3eb4
Use consistent formatting for docstrings
2017-04-15 11:59:21 +02:00

Matthew Honnibal
33ba5066eb
Refactor Language.end_training, making new save_to_directory method
2017-04-14 23:51:24 +02:00

oeg
010293fb2f
fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli
2017-04-06 17:33:15 +02:00

Matthew Honnibal
47a3ef06a6
Unhack deprojetivization, moving it into pipeline
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
2017-03-31 12:31:50 +02:00

Matthew Honnibal
83ba6c247c
Fix init of Language without model
2017-03-26 16:46:00 +02:00

Raphaël Bournhonesque
f332bf05be
Remove unused import statements
2017-03-21 21:08:54 +01:00

ines
9605cf39cc
Handle default path in Language classes
2017-03-18 12:58:45 +01:00

Matthew Honnibal
8843b84bd1
Merge remote-tracking branch 'origin/develop-downloads'
2017-03-16 12:00:42 -05:00

ines
618ce3b425
Add .meta to Language object
Allows getting the current model's meta data, e.g.:
nlp = spacy.load('my-model')
print(nlp.meta)
2017-03-16 17:14:56 +01:00

Matthew Honnibal
b382dc902c
Add morph rules in Language
2017-03-15 09:24:40 -05:00

Matthew Honnibal
f70be44746
Use lemmatizer in code, not from downloaded model.
2017-03-15 04:52:50 -05:00

Matthew Honnibal
f71eeef9bb
Pass path argument to end_training
2017-03-09 18:42:40 -06:00

Matthew Honnibal
cd33b39a04
Fix 2/3 problem for json save/load
2017-03-08 01:39:13 +01:00

Ines Montani
aa876884f0
Revert "Revert "Merge remote-tracking branch 'origin/master'""
This reverts commit fb9d3bb022
2017-01-09 13:28:13 +01:00

Matthew Honnibal
3679fb43a3
Fix loading of lemmatizer
2016-12-18 17:34:09 +01:00

Ines Montani
b11d8cd3db
Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data
2016-12-18 16:57:12 +01:00

Ines Montani
753068f1d5
Use base language data as default
2016-12-18 16:55:25 +01:00

Ines Montani
bcc1d50d09
Remove trailing whitespace
2016-12-18 16:54:52 +01:00

Matthew Honnibal
44f4f008bd
Wire up lemmatizer rules for English
2016-12-18 15:50:09 +01:00

Matthew Honnibal
296d33a4fc
Merge branch 'master' of ssh://github.com/explosion/spaCy
2016-11-26 12:36:18 +01:00

Matthew Honnibal
1f6c37c6f5
Fix create_tokenizer when nlp is None
2016-11-26 12:36:04 +01:00

Matthew Honnibal
c7889492f9
Fix model saving error for Python 3
2016-11-25 18:04:30 -06:00

Matthew Honnibal
159e8c46e1
Merge old training fixes with newer state
2016-11-25 09:16:36 -06:00

Matthew Honnibal
a2f55e7015
Pass cfg through loading, for training.
2016-11-25 09:01:20 -06:00

Matthew Honnibal
09f68bc641
Fix Issue #639: stop words in language class not used. This patch is messy, but it's better not to change too much until the language data loading can be properly refactored.
2016-11-24 00:13:55 +01:00

Matthew Honnibal
48e1dc29d4
Fix default path loading.
2016-11-23 23:48:55 +01:00

ExplodingCabbage
6c4f488e89
Fix syntax mistake
2016-11-23 15:12:45 +00:00

Matthew Honnibal
60eb2343ce
Only try to load vectors if they exist.
2016-11-23 13:50:24 +01:00

Matthew Honnibal
618ac36093
Fix use of path argument in Language.__init__. Needs to be keyword arg, not positional.
2016-11-23 13:26:34 +01:00

Mark Amery
fbe19680a6
Fix another bug related to Language.__init__'s path parameter
2016-11-20 20:31:34 +00:00

Mark Amery
b0a07c21a0
Fix path param of Language.__init__ always being ignored
There was an explicitly-declared `path` keyword argument, so 'path'
would never be present in `**overrides`. This line just overwrote
any manually-specified value the user might've passed to the `path`
parameter.
2016-11-20 16:29:57 +00:00

Matthew Honnibal
22647c2423
Check that patterns aren't null before compiling regex for tokenizer
2016-11-02 20:35:29 +01:00

Matthew Honnibal
f7fee6c24b
Check for class-defined make_docs method before assigning one provided as an argument
2016-11-02 19:57:13 +01:00

Matthew Honnibal
b86f8af0c1
Fix doc strings
2016-11-01 12:25:36 +01:00

Matthew Honnibal
cb49189477
Remove dead code
2016-10-26 13:11:07 +02:00

Matthew Honnibal
150e02d72e
Fix Issue #566
2016-10-23 20:19:01 +02:00

Matthew Honnibal
739213a8af
Fix create_pipeline keyword argument.
2016-10-23 14:24:16 +02:00

Matthew Honnibal
5ec32f5d97
Fix loading of GloVe vectors, to address Issue #541
2016-10-20 18:27:48 +02:00

Matthew Honnibal
d4aaf2752c
Fix issue #535: Pipeline elements added even when data not installed.
2016-10-19 19:55:19 +02:00

Matthew Honnibal
1b651db9c5
Fix parser creation in Language class.
2016-10-18 19:36:44 +02:00

Matthew Honnibal
45a6f9b9c7
Fix loading of tagger.
2016-10-18 19:33:04 +02:00

Matthew Honnibal
7d5212f131
Refactor defaults
2016-10-18 16:18:25 +02:00

Matthew Honnibal
f787cd29fe
Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.
2016-10-16 21:34:57 +02:00

Matthew Honnibal
ca51f3b77e
Use DependencyParser and EntityRecognizer in the Language class.
2016-10-16 17:58:12 +02:00

Matthew Honnibal
a81c5a7abf
Fix name of labels keyword to 'actions'.
2016-10-16 12:00:27 +02:00

Matthew Honnibal
8a6b35d266
Delay binding in MakeDoc
2016-10-16 11:41:55 +02:00

Matthew Honnibal
08e9134760
Change default value of path to True
2016-10-15 14:12:54 +02:00

Matthew Honnibal
6d8cb515ac
Break the tokenization stage out of the pipeline into a function 'make_doc'. This allows all pipeline methods to have the same signature.
2016-10-14 17:38:29 +02:00

Matthew Honnibal
41f88ce938
Fix dep model loading in parser
2016-10-12 20:26:38 +02:00

Matthew Honnibal
0e2bedc373
Fix default labels for parser and NER
2016-10-12 19:12:40 +02:00

Matthew Honnibal
847a4a4182
Refactor Language, dropping Language.blank() method.
2016-10-12 13:45:58 +02:00

Matthew Honnibal
ea23b64cc8
Refactor training, with new spacy.train module. Defaults still a little awkward.
2016-10-09 12:24:24 +02:00

Matthew Honnibal
eceeaefe53
Fix defaults for Parser and Entity, adding a blank= argument.
2016-09-30 19:56:06 +02:00

Matthew Honnibal
e382e48d9f
Temporarily patch handling of defaul templates for tagger. Need to move these to language_data.
2016-09-27 13:21:28 +02:00

Matthew Honnibal
b14b9b096b
Return None if /deps directory not present, instead of trying to load the parser.
2016-09-26 18:48:03 +02:00

Matthew Honnibal
0b2d7ae9d6
Fix Entity creation
2016-09-26 15:41:22 +02:00

Matthew Honnibal
2debc4e0a2
Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class.
2016-09-26 11:57:54 +02:00

Matthew Honnibal
722199acb8
Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey
2016-09-26 11:07:46 +02:00

Matthew Honnibal
7db956133e
Move tokenizer data for German into spacy.de.language_data
2016-09-25 15:37:33 +02:00

Matthew Honnibal
95aaea0d3f
Refactor so that the tokenizer data is read from Python data, rather than from disk
2016-09-25 14:49:53 +02:00

Matthew Honnibal
fd58f7655a
Python 3 compatible basestring
2016-09-24 22:16:43 +02:00

Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00

Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00

Matthew Honnibal
9dc8043a7e
Refactor Language to use new Defaults class, and work on revised data loading. We're getting rid of sputnik's weird file-system wrapper, and using pathlib.
2016-09-24 14:08:53 +02:00

Matthew Honnibal
4d7f5468bb
* Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded
2016-05-17 16:55:42 +02:00

Matthew Honnibal
0f957dd586
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-04-14 10:37:56 +02:00

Matthew Honnibal
61d20de35d
* Fix language.py docstring
2016-04-14 10:36:57 +02:00

Henning Peters
ff690f76ba
fix loading non-german models
2016-04-12 16:00:56 +02:00

Wolfgang Seeker
03fb498dbe
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00

Wolfgang Seeker
bc9c62e279
replace Language functions with corresponding orth functions
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00

Henning Peters
931c07a609
initial proposal for separate vector package
2016-03-04 11:09:06 +01:00

Matthew Honnibal
a95974ad3f
* Fix oov probability
2016-02-06 15:13:55 +01:00

Matthew Honnibal
1ef84a0557
* Merge master into rethinc2
2016-02-05 12:55:59 +01:00

Matthew Honnibal
249dccbe95
* Fix Language.pipe
2016-02-05 12:47:57 +01:00

Matthew Honnibal
af58f273b3
* Fix spacy.language.pipe
2016-02-05 12:20:29 +01:00

Matthew Honnibal
419edfab50
* Use generic flags for the new attributes until they're added
2016-02-04 15:50:54 +01:00

Matthew Honnibal
e5c96c969f
* Wire up new attributes
2016-02-04 13:04:58 +01:00

Matthew Honnibal
84b247ef83
* Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading.
2016-02-03 02:10:58 +01:00

Matthew Honnibal
fcfc17a164
Merge branch 'master' into rethinc2
2016-02-02 23:05:34 +01:00

Matthew Honnibal
59123443e2
* Check for presence/absence of the different models in Language.end_training
2016-02-02 22:49:55 +01:00

Matthew Honnibal
9e9d4c8706
* Fix stupid error in Language.batch
2016-02-01 09:49:32 +01:00

Matthew Honnibal
98fbdf2856
* Add Language.batch() method, to support multi-threaded jobs
2016-02-01 09:01:13 +01:00

Matthew Honnibal
c4a89d56bd
* Automatically register any entity types pre-set on the tokens, so that the NER works with user-given entity types.
2016-01-19 20:09:26 +01:00

Matthew Honnibal
bba0a5e078
* Handle string paths in default_vocab, default_parser, default_entity in Language class
						2016-01-18 22:37:24 +01:00 
						 
				 
			
				
					
						
							
							
Henning Peters
41ea14a56f
fix pickling
2016-01-16 13:23:11 +01:00

Henning Peters
235f094534
untangle data_path/via
2016-01-16 12:23:45 +01:00

Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00

Henning Peters
211913d689
add about.py, adapt setup.py
2016-01-15 18:57:01 +01:00

Henning Peters
f8a8f97d25
cleanup
2016-01-15 18:13:37 +01:00

Henning Peters
780cb847c9
add default_model to about
2016-01-15 18:07:15 +01:00

Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00

Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00

Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00

Matthew Honnibal
a6ba43ecaf
* Fix errors in packaging revision
2015-12-29 18:37:26 +01:00

Matthew Honnibal
aec130af56
Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00

Matthew Honnibal
f5dea1406d
* Fix silly mistake in Language.__init__
2015-12-28 18:48:57 +01:00

Matthew Honnibal
187960606f
* Fix pickle problems
2015-12-28 16:54:03 +01:00

Matthew Honnibal
8c7e149ec9
* Replace kwargs argument of Language.__init__ with explicit arguments, to fix pickle bug
2015-12-28 15:56:27 +01:00

Henning Peters
d8d348bb55
allow to specify version constraint within model name
2015-12-18 19:12:08 +01:00

Henning Peters
cfa187aaf0
fix tests
2015-12-18 10:58:02 +01:00

Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00

Henning Peters
345dda6f53
small fixes, add package build step
2015-12-07 06:50:26 +01:00

Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00

Matthew Honnibal
3c162dcac3
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
2015-11-07 03:24:30 +11:00

Matthew Honnibal
adc7bbd6cf
* Fix name of like_num in default_lex_attrs
2015-11-04 22:02:47 +11:00

Matthew Honnibal
e96faf29e7
* Rename like_number to like_num, to fix inconsistency re Issue #166
2015-11-04 22:01:44 +11:00

Matthew Honnibal
f18fd8c659
* Fix language.py for change in StringStore load API
2015-10-23 03:48:12 +11:00

Matthew Honnibal
2348a08481
* Load/dump strings with a json file, instead of the hacky strings file we were using.
2015-10-22 21:13:03 +11:00

Matthew Honnibal
9baf0abd59
* Save vocab after training.
2015-10-22 21:09:14 +11:00

Matthew Honnibal
20fd36a0f7
* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.
2015-10-13 13:44:41 +11:00

Matthew Honnibal
a6ced80c0c
* Fix Issue #116: Misleading handling of True value in Language.__init__.
2015-09-29 20:54:12 +10:00

Matthew Honnibal
27f988b167
* Remove the vectors option to Vocab, preferring to either load vectors from disk, or set them on the Lexeme objects.
2015-09-15 14:41:48 +10:00

Matthew Honnibal
e13e47e9e5
* Add English stop words
2015-09-14 17:48:51 +10:00

Matthew Honnibal
d9f1fc2112
* Add deprecation warning for unused load_vectors argument.
2015-09-09 14:31:09 +02:00

Matthew Honnibal
534e3dda3c
* More work on language independent parsing
2015-08-28 03:44:54 +02:00

Matthew Honnibal
c2307fa9ee
* More work on language-generic parsing
2015-08-28 02:02:33 +02:00

Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00

Matthew Honnibal
76996f4145
* Hack on generic Language class. Still needs work for morphology, defaults, etc
2015-08-26 19:16:09 +02:00

Matthew Honnibal
f2f699ac18
* Add language base class
2015-08-25 15:37:17 +02:00