Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							008e1ee1dd 
							
						 
					 
					
						
						
							
							Update pretrain command  
						
						
						
					 
					
						2018-11-29 12:36:43 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							61e435610e 
							
						 
					 
					
						
						
							
							💫  Feature/improve pretraining ( #2971 )  
						
						
						
						
						* Improve spacy pretrain script
* Implement BERT-style 'masked language model' objective. Much better
results.
* Improve logging.
* Add length cap for documents, to avoid memory errors.
* Require thinc 7.0.0.dev1
* Add argument for using pretrained vectors
* Fix defaults
* Fix syntax error
* Tweak pretraining script
* Fix data limits in spacy.gold
* Fix pretrain script 
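
The 'masked language model' objective mentioned above can be illustrated with a minimal sketch. This is not the code from the commit: the `MASK` symbol, the `mask_tokens` helper, and the 15% default rate are assumptions chosen for illustration. The idea is to hide a random subset of tokens and keep the originals as prediction targets.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder symbol, not taken from the commit


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns (masked, targets). targets[i] holds the original token where
    position i was masked, and None where the token was left unchanged.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model would be trained to recover this
        else:
            masked.append(tok)
            targets.append(None)  # no loss at unmasked positions
    return masked, targets


masked, targets = mask_tokens("the quick brown fox jumps".split(), mask_prob=0.5)
```

A model trained this way is only scored at the positions where `targets` is not `None`.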
						
					 
					
						2018-11-28 18:04:58 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ef0820827a 
							
						 
					 
					
						
						
							
							Update hyper-parameters after NER random search ( #2972 )  
						
						
						
						
						These experiments were completed a few weeks ago, but I didn't make the PR, pending model release.
* Token vector width: 128->96
* Hidden width: 128->64
* Embed size: 5000->2000
* Dropout: 0.2->0.1
* Updated optimizer defaults (unclear how important?)
This should improve speed, model size and load time, while keeping
similar or slightly better accuracy.
The tl;dr is we prefer to prevent over-fitting by reducing model size,
rather than using more dropout. 
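
To make the size of these changes concrete, the sketch below records the old and new values from the message above and prints each as a fraction of its previous value. The dictionary keys and the `describe_changes` helper are hypothetical names for illustration, not spaCy's own setting names.

```python
# Values come from the commit message; the key names are hypothetical.
OLD = {"token_vector_width": 128, "hidden_width": 128, "embed_size": 5000, "dropout": 0.2}
NEW = {"token_vector_width": 96, "hidden_width": 64, "embed_size": 2000, "dropout": 0.1}


def describe_changes(old, new):
    """Return one summary line per setting, with the new value as a share of the old."""
    lines = []
    for key in old:
        ratio = new[key] / old[key]
        lines.append("{}: {} -> {} ({:.0%} of previous)".format(key, old[key], new[key], ratio))
    return lines


for line in describe_changes(OLD, NEW):
    print(line)
```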
						
					 
					
						2018-11-27 18:49:52 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b4581435f6 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop  
						
						
						
					 
					
						2018-11-16 13:08:22 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							e2f75eb492 
							
						 
					 
					
						
						
							
							Fix message formatting  
						
						
						
					 
					
						2018-11-16 13:08:20 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2874b8efd8 
							
						 
					 
					
						
						
							
							Fix tok2vec loading in spacy train  
						
						
						
					 
					
						2018-11-15 23:34:54 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2ddd428834 
							
						 
					 
					
						
						
							
							Fix pretrain script  
						
						
						
					 
					
						2018-11-15 23:34:35 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f8afaa0c1c 
							
						 
					 
					
						
						
							
							Fix pretrain  
						
						
						
					 
					
						2018-11-15 22:46:53 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6af6950e46 
							
						 
					 
					
						
						
							
							Fix pretrain  
						
						
						
					 
					
						2018-11-15 22:45:36 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3e7b214e57 
							
						 
					 
					
						
						
							
							Make pretrain script work with stream from stdin  
						
						
						
					 
					
						2018-11-15 22:44:07 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8fdb9bc278 
							
						 
					 
					
						
						
							
							💫  Add experimental ULMFit/BERT/Elmo-like pretraining  ( #2931 )  
						
						
						
						
						* Add 'spacy pretrain' command
* Fix pretrain command for Python 2
* Fix pretrain command
* Fix pretrain command 
						
					 
					
						2018-11-15 22:17:16 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8f2a6367e9 
							
						 
					 
					
						
						
							
							Fix usage of PyTorch BiLSTM in ud_train  
						
						
						
					 
					
						2018-09-13 22:54:59 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							445b81ce3f 
							
						 
					 
					
						
						
							
							Support bilstm_depth argument in ud-train  
						
						
						
					 
					
						2018-09-13 19:30:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3eb9f3e2b8 
							
						 
					 
					
						
						
							
							Fix defaults for ud-train  
						
						
						
					 
					
						2018-09-13 18:05:48 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							59cf533879 
							
						 
					 
					
						
						
							
							Improve ud-train script. Make config optional  
						
						
						
					 
					
						2018-09-13 14:24:08 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							da7650e84b 
							
						 
					 
					
						
						
							
							Fix maximum doc length in ud_train script  
						
						
						
					 
					
						2018-09-13 14:10:25 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4d2d7d5866 
							
						 
					 
					
						
						
							
							Fix new feature flags  
						
						
						
					 
					
						2018-08-27 02:12:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9c33d4d1df 
							
						 
					 
					
						
						
							
							Add more hyper-parameters to spacy ud-train  
						
						
						
						
						* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.
* conv_depth: Depth of the convolutional layers. Defaults to 4. 
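
As a rough illustration of what "prefix, suffix and word shape" features look like, here is a minimal sketch. The helper names, the prefix length of 1, the suffix length of 3, and the simplified shape mapping are assumptions for this example and are not the library's implementation.

```python
def word_shape(text):
    """Simplified shape: letters become x/X, digits become d, other characters are kept."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def subword_features(text, prefix_len=1, suffix_len=3):
    """Collect the three subword features for one token (lengths are assumed defaults)."""
    return {
        "prefix": text[:prefix_len],
        "suffix": text[-suffix_len:],
        "shape": word_shape(text),
    }


print(subword_features("Apple2"))
print(subword_features("北京"))
```

For a short token such as the two-character example, the suffix is the whole word and the shape carries no case information, which may be part of why the message recommends disabling these features for Chinese and Japanese.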
						
					 
					
						2018-08-27 01:48:46 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							595c893791 
							
						 
					 
					
						
						
							
							Expose noise_level option in train CLI  
						
						
						
					 
					
						2018-08-16 00:41:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6ea981c839 
							
						 
					 
					
						
						
							
							Add converter for jsonl NER data  
						
						
						
					 
					
						2018-08-14 14:04:32 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							02c5c114d0 
							
						 
					 
					
						
						
							
							Fix usage of deprecated freqs.txt in init-model  
						
						
						
					 
					
						2018-08-14 13:19:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4336397ecb 
							
						 
					 
					
						
						
							
							Update develop from master  
						
						
						
					 
					
						2018-08-14 03:04:28 +02:00 
						 
				 
			
				
					
						
							
							
								Xiaoquan Kong 
							
						 
					 
					
						
						
						
						
							
						
						
							f0c9652ed1 
							
						 
					 
					
						
						
							
							New Feature: display more detail when Error E067 ( #2639 )  
						
						
						
						
						* Fix off-by-one error
* Add verbose option
* Update verbose option
* Update documents for verbose option 
						
					 
					
						2018-08-07 10:45:29 +02:00 
						 
				 
			
				
					
						
							
							
								Kaisa (Katarzyna) Korsak 
							
						 
					 
					
						
						
						
						
							
						
						
							e531a827db 
							
						 
					 
					
						
						
							
							Changed conllu2json to be able to extract NER tags ( #2594 )  
						
						
						
						
						* extract ner tags from conllu file if available
* fixed a bug in regex 
						
					 
					
						2018-07-25 22:21:31 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d84b13e02c 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						
						
					 
					
						2018-07-18 18:57:00 +02:00 
						 
				 
			
				
					
						
							
							
								Ole Henrik Skogstrøm 
							
						 
					 
					
						
						
						
						
							
						
						
							6e2930a4a2 
							
						 
					 
					
						
						
							
							Conll(u)-bio converter ( #2525 )  
						
						
						
						
						* Started simple conllxbiluo converter
* Fix missing BIO to BILUO conversion 
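
The BIO to BILUO conversion named above can be sketched as follows. This is a self-contained illustration that assumes well-formed BIO input; the `bio_to_biluo` function is a hypothetical helper, not the converter added in this commit.

```python
def bio_to_biluo(tags):
    """Convert BIO tags to BILUO.

    A single-token entity becomes U-, the last token of a longer
    entity becomes L-, and all other tags are kept as they are.
    """
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label  # does the entity extend to the next token?
        if prefix == "B":
            biluo.append(("B-" if continues else "U-") + label)
        else:  # prefix == "I"
            biluo.append(("I-" if continues else "L-") + label)
    return biluo


print(bio_to_biluo(["B-PER", "I-PER", "O", "B-LOC"]))
```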
						
					 
					
						2018-07-18 18:55:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8ae1bec8bf 
							
						 
					 
					
						
						
							
							Fix init_model  
						
						
						
					 
					
						2018-07-05 14:02:06 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dee8bdb900 
							
						 
					 
					
						
						
							
							Fix init-model for npz vectors  
						
						
						
					 
					
						2018-07-04 02:29:48 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							59d655e8d0 
							
						 
					 
					
						
						
							
							Fix model init from jsonl  
						
						
						
					 
					
						2018-07-04 01:30:40 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1e38bea6e9 
							
						 
					 
					
						
						
							
							Save vectors init  
						
						
						
					 
					
						2018-07-03 23:55:04 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6692833887 
							
						 
					 
					
						
						
							
							Fix init_model  
						
						
						
					 
					
						2018-07-03 23:24:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4a38a26cb5 
							
						 
					 
					
						
						
							
							Fix init_model  
						
						
						
					 
					
						2018-07-03 22:57:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							019d09e3c3 
							
						 
					 
					
						
						
							
							Fix init model  
						
						
						
					 
					
						2018-07-03 22:16:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2543f8c93a 
							
						 
					 
					
						
						
							
							Support .npz vectors in init-model command  
						
						
						
					 
					
						2018-07-03 21:42:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							86aad11939 
							
						 
					 
					
						
						
							
							Fix init_model arg  
						
						
						
					 
					
						2018-07-03 17:00:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eff42d36e3 
							
						 
					 
					
						
						
							
							Fix init model command  
						
						
						
					 
					
						2018-07-03 16:32:23 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6a89faf12e 
							
						 
					 
					
						
						
							
							Add support for jsonl-formatted lexical attributes to init-model command.  
						
						
						
					 
					
						2018-07-03 12:22:56 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c83fccfe2a 
							
						 
					 
					
						
						
							
							Fix output of best model  
						
						
						
					 
					
						2018-06-25 23:05:56 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							69c900f003 
							
						 
					 
					
						
						
							
							Fix init-model if no vectors provided  
						
						
						
					 
					
						2018-06-25 18:26:02 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							664f89327a 
							
						 
					 
					
						
						
							
							Fix init-model if no vectors provided  
						
						
						
					 
					
						2018-06-25 17:58:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c4698f5712 
							
						 
					 
					
						
						
							
							Don't collate model unless training succeeds  
						
						
						
					 
					
						2018-06-25 16:36:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							24dfbb8a28 
							
						 
					 
					
						
						
							
							Fix model collation  
						
						
						
					 
					
						2018-06-25 14:35:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							62237755a4 
							
						 
					 
					
						
						
							
							Import shutil  
						
						
						
					 
					
						2018-06-25 13:40:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a040fca99e 
							
						 
					 
					
						
						
							
							Import json into cli.train  
						
						
						
					 
					
						2018-06-25 11:50:37 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2c703d99c2 
							
						 
					 
					
						
						
							
							Fix collation of best models  
						
						
						
					 
					
						2018-06-25 01:21:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2c80b7c013 
							
						 
					 
					
						
						
							
							Collate best model after training  
						
						
						
					 
					
						2018-06-24 23:39:52 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							330c039106 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						
						
					 
					
						2018-05-26 18:30:52 +02:00 
						 
				 
			
				
					
						
							
							
								James Messinger 
							
						 
					 
					
						
						
						
						
							
						
						
							4515e96e90 
							
						 
					 
					
						
						
							
							Better formatting for spacy train CLI ( #2357 )  
						
						
						
						
						* Better formatting for `spacy train` CLI
Changed to use fixed-width space padding rather than tabs to align table headers and data.
### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```
### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```
* Added contributor file 
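
The alignment approach described in this message, padding each cell with spaces to a fixed column width instead of separating cells with tabs, can be sketched as below. The `format_row` and `format_table` helpers are hypothetical and only illustrate the technique; the sample values are the first columns of the table above.

```python
def format_row(values, widths):
    """Left-justify each value to its column width and join columns with two spaces."""
    return "  ".join(str(v).ljust(w) for v, w in zip(values, widths)).rstrip()


def format_table(headers, rows):
    """Size every column to its widest cell, then format the header and all rows."""
    widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]
    return [format_row(headers, widths)] + [format_row(r, widths) for r in rows]


headers = ["Itn.", "Dep Loss", "NER Loss", "UAS"]
rows = [
    [0, "4618.857", "2910.004", "76.172"],
    [1, "4671.972", "3764.812", "74.481"],
]
for line in format_table(headers, rows):
    print(line)
```

Because every column width is computed from the data, the headers stay aligned even when a value is longer than a tab stop, which is what broke the "Before" output.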
						
					 
					
						2018-05-25 13:08:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ce458c2428 
							
						 
					 
					
						
						
							
							Fix spacy requirement constraint in package template  
						
						
						
					 
					
						2018-05-22 20:50:46 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f3b4f6a4ec 
							
						 
					 
					
						
						
							
							Merge setup.py  
						
						
						
					 
					
						2018-05-20 23:21:00 +02:00