Paul O'Leary McCann 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2fd8d616e7 
							
						 
					 
					
						
						
							
							Add docs section for spacy.cli.train.train ( #9545 )  
						
						
						
						
						* Add section for spacy.cli.train.train
* Add link from training page to train function
* Ensure path in train helper
* Update docs
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2021-10-29 10:36:34 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5003a9c3c7 
							
						 
					 
					
						
						
							
							Move core training logic in CLI into standalone function ( #9398 )  
						
						
						
					 
					
						2021-10-11 10:56:14 +02:00 
						 
				 
			
				
					
						
							
							
								Kabir Khan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1dfffe5fb4 
							
						 
					 
					
						
						
							
							No output info message in train ( #8885 )  
						
						
						
						
						* Add info message that no output directory was provided in train
* Update train.py
* Fix logging 
						
					 
					
						2021-08-05 09:21:22 +02:00 
						 
				 
			
				
					
						
							
							
								Santiago Castro 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ee63b2b199 
							
						 
					 
					
						
						
							
							Fix typo in train_cli docstring  
						
						
						
					 
					
						2021-06-25 22:45:03 -07:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d0c3775712 
							
						 
					 
					
						
						
							
							Replace links to nightly docs [ci skip]  
						
						
						
					 
					
						2021-01-30 20:09:38 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d25b1606d6 
							
						 
					 
					
						
						
							
							Allow reading config from stdin in spacy train  
						
						
						
					 
					
						2020-12-08 18:01:40 +11:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							75a202ce65 
							
						 
					 
					
						
						
							
							TextCat updates and fixes ( #6263 )  
						
						
						
						
						* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos 
						
					 
					
						2020-10-18 14:50:41 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							db419f6b2f 
							
						 
					 
					
						
						
							
							Improve control of training progress and logging ( #6184 )  
						
						
						
						
						* Make logging and progress easier to control
* Update docs
* Cleanup errors
* Fix ConfigValidationError
* Pass stdout/stderr, not wasabi.Printer
* Fix type
* Upd logging example
* Fix logger example
* Fix type 
						
					 
					
						2020-10-03 14:57:46 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							44160cd52f 
							
						 
					 
					
						
						
							
							Tidy up [ci skip]  
						
						
						
					 
					
						2020-10-01 10:41:19 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a5debb356d 
							
						 
					 
					
						
						
							
							Tidy up and adjust logging [ci skip]  
						
						
						
					 
					
						2020-09-30 01:22:08 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							56a2f778c4 
							
						 
					 
					
						
						
							
							Add logging [ci skip]  
						
						
						
					 
					
						2020-09-30 01:08:55 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							0a1ee109db 
							
						 
					 
					
						
						
							
							Remove init from path  
						
						
						
					 
					
						2020-09-29 22:53:18 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							0250bcf6a3 
							
						 
					 
					
						
						
							
							Show validation error during init  
						
						
						
					 
					
						2020-09-29 22:29:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e70a00fa76 
							
						 
					 
					
						
						
							
							Remove unnecessary warning from train  
						
						
						
					 
					
						2020-09-29 16:47:54 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3f0d61232d 
							
						 
					 
					
						
						
							
							Remove outdated arg from train  
						
						
						
					 
					
						2020-09-29 16:47:44 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a139fe672b 
							
						 
					 
					
						
						
							
							Fix typos and refactor CLI logging  
						
						
						
					 
					
						2020-09-28 21:17:10 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							822ea4ef61 
							
						 
					 
					
						
						
							
							Refactor CLI  
						
						
						
					 
					
						2020-09-28 15:09:59 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							c22ecc66bb 
							
						 
					 
					
						
						
							
							Don't support init path for now  
						
						
						
					 
					
						2020-09-28 12:46:28 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a5f2cc0509 
							
						 
					 
					
						
						
							
							Tidy up and remove raw text (rehearsal) for now  
						
						
						
					 
					
						2020-09-28 12:30:13 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							e44a7519cd 
							
						 
					 
					
						
						
							
							Update CLI and add [initialize] block  
						
						
						
					 
					
						2020-09-28 11:56:14 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							2fdb7285a0 
							
						 
					 
					
						
						
							
							Update CLI  
						
						
						
					 
					
						2020-09-28 11:06:07 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							553bfea641 
							
						 
					 
					
						
						
							
							Fix commands  
						
						
						
					 
					
						2020-09-28 10:53:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b886f53c31 
							
						 
					 
					
						
						
							
							init-pipeline runs (maybe doesn't work)  
						
						
						
					 
					
						2020-09-28 03:42:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ed2aff2db3 
							
						 
					 
					
						
						
							
							Remove unused train code  
						
						
						
					 
					
						2020-09-28 03:12:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3a0a3b8db6 
							
						 
					 
					
						
						
							
							Don't hard-code for 'corpora' name  
						
						
						
					 
					
						2020-09-28 03:06:33 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a3e1791c9c 
							
						 
					 
					
						
						
							
							Upd train  
						
						
						
					 
					
						2020-09-28 01:08:30 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b5556093e2 
							
						 
					 
					
						
						
							
							Start updating train script  
						
						
						
					 
					
						2020-09-27 23:59:44 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							7e938ed63e 
							
						 
					 
					
						
						
							
							Update config resolution to use new Thinc  
						
						
						
					 
					
						2020-09-27 22:21:31 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							39b178999c 
							
						 
					 
					
						
						
							
							Tmp notes  
						
						
						
					 
					
						2020-09-27 20:13:38 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b4486d747d 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into fix/train-config-interpolation  
						
						
						
					 
					
						2020-09-26 15:32:14 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b2d07de786 
							
						 
					 
					
						
						
							
							Construct nlp from uninterpolated config before training  
						
						
						
					 
					
						2020-09-26 15:16:59 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							ca3c997062 
							
						 
					 
					
						
						
							
							Improve CLI config validation with latest Thinc  
						
						
						
					 
					
						2020-09-26 13:13:57 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							009ba14aaf 
							
						 
					 
					
						
						
							
							Fix pretraining in train script ( #6143 )  
						
						
						
						
						* update pretraining API in train CLI
* bump thinc to 8.0.0a35
* bump to 3.0.0a26
* doc fixes
* small doc fix 
						
					 
					
						2020-09-25 15:47:10 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							be56c0994b 
							
						 
					 
					
						
						
							
							Add [training.before_to_disk] callback  
						
						
						
					 
					
						2020-09-24 12:40:25 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							f69fea8b25 
							
						 
					 
					
						
						
							
							Improve error handling around non-number scores  
						
						
						
					 
					
						2020-09-24 11:29:07 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							ae51f580c1 
							
						 
					 
					
						
						
							
							Fix handling of score_weights  
						
						
						
					 
					
						2020-09-24 10:27:33 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6435458d51 
							
						 
					 
					
						
						
							
							simplify expression  
						
						
						
					 
					
						2020-09-23 12:12:38 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							20b0ec5dcf 
							
						 
					 
					
						
						
							
							avoid logging performance of frozen components  
						
						
						
					 
					
						2020-09-23 10:37:12 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e863b3dc14 
							
						 
					 
					
						
						
							
							Merge pull request  #6092  from adrianeboyd/bugfix/load-vocab-lookups-2  
						
						
						
					 
					
						2020-09-19 12:33:38 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							39872de1f6 
							
						 
					 
					
						
						
							
							Introducing the gpu_allocator ( #6091 )  
						
						
						
						
						* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string 
						
					 
					
						2020-09-19 01:17:02 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							eed4b785f5 
							
						 
					 
					
						
						
							
							Load vocab lookups tables at beginning of training  
						
						
						
						
						Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
``` 
						
					 
					
						2020-09-18 15:59:16 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							427dbecdd6 
							
						 
					 
					
						
						
							
							cleanup and formatting  
						
						
						
					 
					
						2020-09-17 11:48:04 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							0c35885751 
							
						 
					 
					
						
						
							
							generalize corpora, dot notation for dev and train corpus  
						
						
						
					 
					
						2020-09-17 11:38:59 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							51fa929f47 
							
						 
					 
					
						
						
							
							rewrite train_corpus to corpus.train in config  
						
						
						
					 
					
						2020-09-15 21:58:04 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3216a33149 
							
						 
					 
					
						
						
							
							positive_label config for textcat ( #6062 )  
						
						
						
						
						* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidental commit 
						
					 
					
						2020-09-14 17:08:00 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8e7557656f 
							
						 
					 
					
						
						
							
							Renaming gold & annotation_setter ( #6042 )  
						
						
						
						
						* version bump to 3.0.0a16
* rename "gold" folder to "training"
* rename 'annotation_setter' to 'set_extra_annotations'
* formatting 
						
					 
					
						2020-09-09 10:31:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ba5f4c9b32 
							
						 
					 
					
						
						
							
							Add words and seconds to train info  
						
						
						
					 
					
						2020-09-08 15:24:47 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							ab1bb421ed 
							
						 
					 
					
						
						
							
							Update docs links in codebase  
						
						
						
					 
					
						2020-09-04 12:58:50 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b5a0657fd6 
							
						 
					 
					
						
						
							
							"model" terminology consistency in docs  
						
						
						
					 
					
						2020-09-03 13:13:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							122cb02001 
							
						 
					 
					
						
						
							
							Fix averages  
						
						
						
					 
					
						2020-09-02 19:37:43 +02:00