Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-25 05:01:02 +03:00)

	Revert changes to optimizer default hyper-params (WIP) (#3415)
While developing v2.1, I ran a bunch of hyper-parameter search experiments to find settings that performed well for spaCy's NER and parser. I ended up changing the default Adam settings from beta1=0.9, beta2=0.999, eps=1e-8 to beta1=0.8, beta2=0.8, eps=1e-5. This gave a small improvement in accuracy (about 0.4%).

Months later, I ran the models with Prodigy, which uses beam-search decoding even when the model has been trained with a greedy objective. The new models performed terribly. So, wtf? After a couple of days of debugging, I figured out that the new optimizer settings were causing the model to converge to solutions where the top-scoring class often had a score of around -80. The variance of the weights had gone up enormously. I guess I needed to update the L2 regularisation as well?

Anyway, let's just revert the change. If the optimizer is finding such extreme solutions, that seems bad, and not nearly worth the small improvement in accuracy.

Currently training a slate of models to verify that the accuracy change is minimal. Once the training is complete, we can merge this.

## Checklist

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
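
To make the role of the three Adam hyper-parameters concrete, here is a minimal sketch of a textbook scalar Adam update, run once with each setting on a very small gradient. It is an illustration only: `adam_step` and `first_step_size` are hypothetical helpers written for this note, not the optimizer code that spaCy uses, whose details may differ. The sketch only shows how `eps` damps the step when gradients are tiny; it does not attempt to reproduce the extreme scores described above.

```python
import math


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update to a scalar parameter.

    `state` holds the step count `t` and the running moments `m` and `v`.
    Hypothetical helper for illustration, not spaCy's implementation.
    """
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero-initialised moments.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def first_step_size(grad, **hyper):
    """Size of the very first update from a fresh optimizer state."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    return abs(adam_step(0.0, grad, state, **hyper))


tiny_grad = 1e-5
# Settings this commit restores.
standard = first_step_size(tiny_grad, beta1=0.9, beta2=0.999, eps=1e-8)
# Settings this commit reverts.
tuned = first_step_size(tiny_grad, beta1=0.8, beta2=0.8, eps=1e-5)
print(standard, tuned)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the step is `lr * |g| / (|g| + eps)`. With `eps=1e-5` and a gradient of the same magnitude, the step is about half of what `eps=1e-8` gives.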
parent 62afa64a8d
commit 61617c64d5

@@ -48,11 +48,11 @@ def cosine(vec1, vec2):
 
 def create_default_optimizer(ops, **cfg):
     learn_rate = util.env_opt("learn_rate", 0.001)
-    beta1 = util.env_opt("optimizer_B1", 0.8)
-    beta2 = util.env_opt("optimizer_B2", 0.8)
-    eps = util.env_opt("optimizer_eps", 0.00001)
+    beta1 = util.env_opt("optimizer_B1", 0.9)
+    beta2 = util.env_opt("optimizer_B2", 0.999)
+    eps = util.env_opt("optimizer_eps", 1e-8)
     L2 = util.env_opt("L2_penalty", 1e-6)
-    max_grad_norm = util.env_opt("grad_norm_clip", 5.0)
+    max_grad_norm = util.env_opt("grad_norm_clip", 1.0)
     optimizer = Adam(ops, learn_rate, L2=L2, beta1=beta1, beta2=beta2, eps=eps)
     optimizer.max_grad_norm = max_grad_norm
     optimizer.device = ops.device
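
The diff also lowers the default `grad_norm_clip` from 5.0 to 1.0. As a rough illustration of what that threshold means, here is a sketch of the common clip-by-L2-norm technique. `clip_by_norm` is a hypothetical helper written for this note, and it is an assumption that the optimizer applies `max_grad_norm` in this way.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale `grad` so that its L2 norm does not exceed `max_norm`.

    Hypothetical helper showing the usual clip-by-norm technique; the
    optimizer configured in the diff may implement clipping differently.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


grad = [3.0, 4.0]  # L2 norm is 5.0
clipped_at_5 = clip_by_norm(grad, 5.0)  # within the old limit, left as-is
clipped_at_1 = clip_by_norm(grad, 1.0)  # rescaled down to norm 1.0
print(clipped_at_5, clipped_at_1)
```

Under this assumption, a gradient of norm 5.0 passes through untouched at the old threshold but is scaled down by a factor of five at the new one, while keeping its direction.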