Ines Montani
526b416118
Tidy up comments
2021-01-30 12:34:09 +11:00
						 
Ines Montani
30765674d0
Merge branch 'master' into develop
2021-01-30 12:20:28 +11:00
						 
Ines Montani
2609ba4e89
Support building wheel in spacy package
2021-01-30 11:54:02 +11:00
						 
Pamphile ROY
41ee75ac6d
Remove --no-cache-dir when downloading models

When `--no-cache-dir` is present, it prevents caching from functioning properly.
If users still want this, they can pass the option through `user_pip_args`.
Options like these should not be enforced, though. In my case, this prevented some Docker builds (using BuildKit caching) from caching models properly.

2021-01-29 15:37:44 +01:00
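
As a rough illustration of the reasoning above, here is a minimal sketch of how a download helper might assemble its `pip install` command. `build_pip_command` is hypothetical and is not spaCy's actual download code. The point is that no cache-disabling flag is added by default, so pip (and layer caching in tools like Docker BuildKit) can reuse downloaded files, while a user who wants `--no-cache-dir` can still opt in through extra arguments, analogous to `user_pip_args`.

```python
import sys
from typing import List, Optional


def build_pip_command(package_url: str, user_pip_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: build a pip install command without forcing --no-cache-dir."""
    cmd = [sys.executable, "-m", "pip", "install"]
    # Only user-supplied options are added; nothing disables pip's cache by default.
    if user_pip_args:
        cmd.extend(user_pip_args)
    cmd.append(package_url)
    return cmd


# Default: pip is free to reuse its cache.
print(build_pip_command("https://example.com/model.tar.gz"))
# Opt-in: the user explicitly disables the cache.
print(build_pip_command("https://example.com/model.tar.gz", ["--no-cache-dir"]))
```

Keeping the cache setting opt-out rather than forced leaves the decision with whoever controls the build environment.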
						 
Ines Montani
bbf080dfe5
Merge pull request #6645 from bittlingmayer/patch-3
2021-01-30 01:26:28 +11:00
						 
Adriane Boyd
bced6309e5
Add full exceptions with spaces
2021-01-29 14:27:22 +01:00
						 
Ines Montani
7886d59c56
Add check for remove_listener method
2021-01-29 23:47:30 +11:00
						 
Ines Montani
7694f76dd1
Update warning and mention replace_listeners
2021-01-29 23:46:01 +11:00
						 
Ines Montani
94232aea08
Improve E889
2021-01-29 23:39:23 +11:00
						 
Ines Montani
924396c20c
Merge branch 'feature/replace-listeners' of https://github.com/explosion/spaCy into feature/replace-listeners
2021-01-29 21:43:10 +11:00
						 
Ines Montani
2102082478
Make Tok2Vec.remove_listener return bool

Whether listener was removed

2021-01-29 21:41:38 +11:00
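
The change above makes removal report its outcome. A minimal, self-contained sketch of that contract follows, assuming listeners are tracked per downstream component name. `ListenerRegistry` and `listener_map` are hypothetical simplifications for illustration, not spaCy's `Tok2Vec` implementation.

```python
from typing import Dict, List


class ListenerRegistry:
    """Hypothetical stand-in for a tok2vec-style component that tracks listeners."""

    def __init__(self) -> None:
        self.listener_map: Dict[str, List[object]] = {}

    def add_listener(self, listener: object, component_name: str) -> None:
        listeners = self.listener_map.setdefault(component_name, [])
        if listener not in listeners:
            listeners.append(listener)

    def remove_listener(self, listener: object, component_name: str) -> bool:
        """Remove a listener and report whether anything was actually removed."""
        listeners = self.listener_map.get(component_name, [])
        if listener in listeners:
            listeners.remove(listener)
            # Drop the key once a component has no listeners left.
            if not listeners:
                del self.listener_map[component_name]
            return True
        return False
```

A boolean result lets a caller, such as a listener-replacement step, tell "removed" apart from "was never registered" without inspecting internal state.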
						 
Ines Montani
e766e8c56d
Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

2021-01-29 21:41:17 +11:00
						 
Ines Montani
bc089b693c
Update tests
2021-01-29 19:38:09 +11:00
						 
Ines Montani
325f47500d
Move replacement logic to Language.from_config
2021-01-29 19:37:04 +11:00
						 
Ines Montani
0f3e3eedc2
Add Tok2vec.remove_listener
2021-01-29 19:36:38 +11:00
						 
Ines Montani
99842387cb
Remove default value
2021-01-29 18:45:37 +11:00
						 
Ines Montani
44b5542d14
Change method order
2021-01-29 18:42:41 +11:00
						 
Ines Montani
8c15d1daec
Update and validate config first and exit early if paths don't exist
2021-01-29 18:24:47 +11:00
						 
Ines Montani
bbb94b37c6
Update error handling and docstring
2021-01-29 16:27:49 +11:00
						 
Ines Montani
01ecfbcc45
Merge branch 'develop' into feature/replace-listeners
2021-01-29 15:57:32 +11:00
						 
Ines Montani
911dfcccfc
Add option to replace listeners for sourced components
2021-01-29 15:57:04 +11:00
						 
Adriane Boyd
fcce3600ed
Forbid OP matching 2+ tokens in DependencyMatcher (#6824)

Instead of silently using only the first token in each matched span:
* Forbid `OP: ?/*/+` through `DependencyMatcher` validation
* As a fail-safe, add warning if a token match that's not exactly one
  token long is found by a token pattern.

2021-01-29 08:52:01 +08:00
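
The validation rule can be sketched as a plain function over pattern dictionaries. The keys `RIGHT_ID`, `RIGHT_ATTRS`, `LEFT_ID` and `REL_OP` follow the `DependencyMatcher` pattern format; `validate_dependency_pattern` itself is a hypothetical stand-in (spaCy performs its checks through its own pattern validation), shown only to make the rule concrete: any node whose `OP` does not guarantee exactly one token is rejected.

```python
from typing import Any, Dict, List

# Quantifiers that would let a single node match zero or several tokens.
FORBIDDEN_OPS = {"?", "*", "+"}


def validate_dependency_pattern(pattern: List[Dict[str, Any]]) -> None:
    """Hypothetical validator: reject quantifier OPs in dependency-matcher nodes."""
    for i, node in enumerate(pattern):
        op = node.get("RIGHT_ATTRS", {}).get("OP")
        if op in FORBIDDEN_OPS:
            raise ValueError(
                f"Node {i} ({node.get('RIGHT_ID')!r}) uses OP {op!r}; "
                "each node must match exactly one token."
            )


pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj", "OP": "+"}},
]
try:
    validate_dependency_pattern(pattern)
except ValueError as err:
    print(err)
```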
						 
Sofie Van Landeghem
24a697abb8
avoid empty aliases and improve UX and docs (#6840)
2021-01-29 08:51:40 +08:00
						 
Sofie Van Landeghem
837a4f53c2
Error handling in nlp.pipe (#6817)

* add error handler for pipe methods
* add unit tests
* remove pipe methods that are the same as their base class
* have Language keep track of a default error handler
* cleanup
* formatting
* small refactor
* add documentation

2021-01-29 08:51:21 +08:00
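
The idea behind the error handler can be sketched without spaCy: batches are processed inside a `try` block and any exception is passed to a handler that decides whether to re-raise or skip. The function names and the handler signature `(name, proc, docs, exc)` below are assumptions for this sketch, not a description of spaCy's exact API; raising remains the default so errors are never silently swallowed.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

ErrorHandler = Callable[[str, Callable[[str], str], List[str], Exception], None]


def raise_error(name: str, proc: Callable[[str], str], docs: List[str], e: Exception) -> None:
    """Default handler: re-raise so failures are never silently dropped."""
    raise e


def ignore_error(name: str, proc: Callable[[str], str], docs: List[str], e: Exception) -> None:
    """Alternative handler: skip the failing batch."""
    return None


def pipe(name: str, proc: Callable[[str], str], texts: Iterable[str],
         batch_size: int = 2, error_handler: ErrorHandler = raise_error) -> Iterator[str]:
    """Hypothetical batched pipe that routes exceptions to `error_handler`."""
    stream = iter(texts)
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        try:
            processed = [proc(text) for text in batch]
        except Exception as e:
            # The handler either re-raises or lets us drop this batch and continue.
            error_handler(name, proc, batch, e)
            continue
        yield from processed


def shout(text: str) -> str:
    if text == "bad":
        raise ValueError("cannot process this text")
    return text.upper()


print(list(pipe("shout", shout, ["a", "b", "bad", "c"], error_handler=ignore_error)))
```

With `ignore_error`, the whole batch containing the failing text is dropped, which is why `"c"` is missing from the output (`['A', 'B']`); with the default `raise_error`, the `ValueError` propagates to the caller.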
						 
Ines Montani
cc18f3f23c
Improve Example error handling for NER data (#6835)

* Improve Example error handling for NER data
* Fix conditional

2021-01-28 13:11:20 +11:00
						 
Ines Montani
78d6ff4dd4
Update quickstart recommendations
2021-01-28 11:14:49 +11:00
						 
Ines Montani
ec5f55aa5b
Update config generation defaults and transformers (#6832)
2021-01-27 23:56:33 +11:00
						 
Adriane Boyd
4096a79de7
Add alignment mode error and fix Doc.char_span docs (#6820)

* Raise an error on an unrecognized alignment mode rather than
  defaulting to `strict`
* Fix the `Doc.char_span` API doc alignment mode details

2021-01-27 23:40:42 +11:00
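
The three alignment modes can be illustrated on plain token character offsets. `align_char_span` is a hypothetical helper that returns a `(start_token, end_token)` pair or `None`; the mode names `strict`, `contract` and `expand` are the ones `Doc.char_span` accepts, but this is a simplified sketch of their behaviour, not spaCy's implementation. As in the commit, an unknown mode raises an error instead of falling back to `strict`.

```python
from typing import List, Optional, Tuple

Offsets = List[Tuple[int, int]]  # (start_char, end_char) for each token, in order


def align_char_span(offsets: Offsets, start: int, end: int,
                    alignment_mode: str = "strict") -> Optional[Tuple[int, int]]:
    """Map a character range onto token indices using the given alignment mode."""
    if alignment_mode not in ("strict", "contract", "expand"):
        # Fail loudly instead of silently defaulting to "strict".
        raise ValueError(f"Unrecognized alignment mode: {alignment_mode!r}")
    # Tokens that overlap the requested character range at all.
    overlapping = [i for i, (s, e) in enumerate(offsets) if s < end and e > start]
    if not overlapping:
        return None
    first, last = overlapping[0], overlapping[-1]
    if alignment_mode == "strict":
        # Both boundaries must coincide exactly with token boundaries.
        if offsets[first][0] != start or offsets[last][1] != end:
            return None
        return first, last + 1
    if alignment_mode == "expand":
        # Grow to include every partially covered token.
        return first, last + 1
    # "contract": keep only tokens that lie completely inside the range.
    inside = [i for i in overlapping if offsets[i][0] >= start and offsets[i][1] <= end]
    if not inside:
        return None
    return inside[0], inside[-1] + 1


# "Hello big world": tokens at characters 0-5, 6-9 and 10-15.
offsets = [(0, 5), (6, 9), (10, 15)]
print(align_char_span(offsets, 1, 9, "strict"))    # None: starts inside "Hello"
print(align_char_span(offsets, 1, 9, "expand"))    # (0, 2): "Hello big"
print(align_char_span(offsets, 1, 9, "contract"))  # (1, 2): "big"
```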
						 
Sofie Van Landeghem
6b68ad027b
Fix beam NER resizing (#6834)

* move label check to sub methods
* add tests

2021-01-27 23:39:14 +11:00
						 
Ines Montani
5ed51c9dd2
Merge pull request #6828 from explosion/master-tmp
2021-01-27 23:05:46 +11:00
						 
Adriane Boyd
d17afb4826
Add Spanish rule-based lemmatizer (#6833)

* Initial Spanish lemmatizer
* Handle merged verb+pron(s) multi-word tokens
* Use VERB for AUX rule lookup
* Add morph to lemma cache key
* Fix aux lookups, minor refactoring
* Improve verb+pron handling
* Move verb+pron handling into its own method
* Check for exceptions (primarily for se)
* Collect pronouns in the same (not reversed) order
* Only add modified possible lemmas

2021-01-27 19:21:35 +08:00
						 
Ines Montani
615dba9d99
Fix tokenizer exceptions
2021-01-27 22:11:42 +11:00
						 
Ines Montani
abb24fdc0f
Merge pull request #6827 from explosion/feature/add-labels-implicitly
2021-01-27 21:34:58 +11:00
						 
Ines Montani
80ba9eaf7d
Fix test
2021-01-27 21:29:02 +11:00
						 
Ines Montani
e3f8be9a94
Update language data
2021-01-27 13:29:22 +11:00
						 
Ines Montani
230e651ad6
Merge branch 'develop' into master-tmp
2021-01-27 13:26:29 +11:00
						 
Matthew Honnibal
05050210f3
Dont add labels implicitly for parser
2021-01-27 13:04:47 +11:00
						 
Matthew Honnibal
1d20e21f3e
Add labels implicitly for parser and ner
2021-01-27 12:54:47 +11:00
						 
Matthew Honnibal
68b1c2984d
Test labels are added implicitly
2021-01-27 12:52:29 +11:00
						 
Ines Montani
fabd3a3394
Tidy up code comments [ci skip]
2021-01-27 12:40:03 +11:00
						 
Dhruv Naik
e7db07a0b9
Fix Span.char_span bug (#6816)

* Create dhruvrnaik.md
* add test for issue #6815
* bugfix for issue #6815
* update dhruvrnaik.md
* add span.vector test for #6815

2021-01-26 15:50:37 +08:00
						 
Matthew Honnibal
e8674c5c42
Set version to v3.0.0rc5
2021-01-26 14:55:41 +11:00
						 
Adriane Boyd
71a6350744
Implement overwrite param for all custom lemmatizers (#6794)
2021-01-26 14:53:43 +11:00
						 
Adriane Boyd
2263bc7b28
Update develop from master for v3.0.0rc5 (#6811)

* Fix `spacy.util.minibatch` when the size iterator is finished (#6745)
* Skip 0-length matches (#6759)
  Add hack to prevent matcher from returning 0-length matches.
* support IS_SENT_START in PhraseMatcher (#6771)
  * support IS_SENT_START in PhraseMatcher
  * add unit test and friendlier error
  * use IDS.get instead
* ensure span.text works for an empty span (#6772)
* Remove unicode_literals

Co-authored-by: Santiago Castro <bryant@montevideo.com.uy>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

2021-01-26 14:52:45 +11:00
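
The minibatch fix listed above concerns a finite iterator of batch sizes. One way to handle that case is sketched below; the log does not say exactly how `spacy.util.minibatch` was changed, so treat this as an illustration of the failure mode rather than the library's code. When the size iterator is exhausted the generator returns cleanly, whereas letting `StopIteration` escape from inside a generator surfaces as a `RuntimeError` on Python 3.7+ (PEP 479).

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def minibatch(items: Iterable[T], size: Union[int, Iterable[int]]) -> Iterator[List[T]]:
    """Yield lists of items; `size` is a fixed int or an iterable of batch sizes."""
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    stream = iter(items)
    while True:
        try:
            batch_size = next(sizes)
        except StopIteration:
            # No more sizes: stop instead of leaking StopIteration out of the generator.
            return
        batch = list(itertools.islice(stream, int(batch_size)))
        if not batch:
            return
        yield batch


print(list(minibatch(range(5), 2)))              # [[0, 1], [2, 3], [4]]
print(list(minibatch(range(10), iter([2, 3]))))  # [[0, 1], [2, 3, 4]]
```

In the second call the remaining items are left unconsumed once the sizes run out; whether that is the desired behaviour depends on the caller.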
						 
Ines Montani
c0926c9088
WIP: Various small training changes (#6818)

* Allow output_path to be None during training
* Fix cat scoring (?)
* Improve error message for weighted None score
* Improve messages
  So we can call this in other places etc.
* Fix output path check
* Use latest wasabi
* Revert "Improve error message for weighted None score"
  This reverts commit 7059926763

2021-01-26 14:51:52 +11:00
						 
Matthew Honnibal
f049df1715
Revert "Set annotations in update" (#6810)

* Revert "Set annotations in update (#6767)"
  This reverts commit e680efc7cc

2021-01-25 22:18:45 +08:00
						 
Matthew Honnibal
42b117e561
Fix Doc.copy bugs (#6809)

* Dont let the Doc own LexemeC, to fix Doc.copy
* Copy doc.spans
* Copy doc.spans

2021-01-25 21:40:18 +08:00
						 
Adriane Boyd
0f2de39efb
Fix types for exclude args in info CLI (#6808)
2021-01-25 20:00:22 +08:00
						 
muratjumashev
2b19ebad59
Remove Kyrgyz chars fr. char_classes since Tatar ones already cover
2021-01-25 00:46:45 +06:00
						 
muratjumashev
87168eb81f
Add tests
2021-01-24 20:56:16 +06:00
						 
muratjumashev
53abf759ad
Fix punctuation
2021-01-24 20:54:22 +06:00
						 
Matthew Honnibal
ffc371350a
Avoid assuming encode.get_dim('nO') is set in tok2vec (#6800)
2021-01-24 14:37:33 +11:00
						 
muratjumashev
2a2646362b
Fix language subclass
2021-01-23 22:00:50 +06:00
						 
muratjumashev
fe3b5b8ff5
Add kyrgyz to char_classes
2021-01-23 21:53:41 +06:00
						 
muratjumashev
e30bbf5432
Add examples
2021-01-23 21:49:08 +06:00
						 
muratjumashev
2f385385a9
Remove comment
2021-01-23 21:36:28 +06:00
						 
muratjumashev
d53724ba1d
Add lex_attrs
2021-01-23 21:35:25 +06:00
						 
muratjumashev
4418ec2eee
Add punctuation
2021-01-23 21:31:31 +06:00
						 
muratjumashev
101d265778
Add stopwords
2021-01-23 21:25:28 +06:00
						 
KeshavG-lb
0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)

* Spacy Cli info method causing backward compatibility issues #6791
  Fix backward compatibility by setting a default value for `exclude` in the
  `info` method.
* Setting an empty list as a default argument is dangerous, so set the
  default to None and replace it with an empty list if it is None.
  Reference: https://nikos7am.com/posts/mutable-default-arguments/

2021-01-23 11:21:43 +01:00
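
The pitfall referenced in this commit is easy to reproduce in plain Python. Both functions below are hypothetical (they are not spaCy's `info` signature): the first shares one default list across calls, the second uses a `None` sentinel, as the commit describes.

```python
from typing import List, Optional


def info_buggy(exclude: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, and shared by every call.
    exclude.append("labels")
    return exclude


def info_fixed(exclude: Optional[List[str]] = None) -> List[str]:
    # A None sentinel means each call gets a fresh list.
    if exclude is None:
        exclude = []
    exclude.append("labels")
    return exclude
```

Calling `info_buggy()` twice returns `['labels']` and then `['labels', 'labels']`, because the second call sees the list mutated by the first; `info_fixed()` returns `['labels']` every time.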
						 
muratjumashev
28d06ab860
Add tokenizer_exceptions
2021-01-22 23:08:41 +06:00
						 
Luigi Coniglio
e83c818a78
DependencyMatcher improvements (fix #6678) (#6744)

* Adding contributor agreement for user werew
* [DependencyMatcher] Comment and clean code
* [DependencyMatcher] Use defaultdicts
* [DependencyMatcher] Simplify _retrieve_tree method
* [DependencyMatcher] Remove prepended underscores
* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop
* [DependencyMatcher] Remove _nodes attribute
* [DependencyMatcher] Use enumerate in _retrieve_tree method
* [DependencyMatcher] Clean unused vars and use camel_case naming
* [DependencyMatcher] Memoize node+operator map
* Add root property to Token
* [DependencyMatcher] Groups matches by root
* [DependencyMatcher] Remove unused _keys_to_token attribute
* [DependencyMatcher] Use a list to map tokens to matcher's keys
* [DependencyMatcher] Remove recursion
* [DependencyMatcher] Use a generator to retrieve matches
* [DependencyMatcher] Remove unused memory pool
* [DependencyMatcher] Hide private methods and attributes
* [DependencyMatcher] Improvements to the matches validation
* Apply suggestions from code review
  Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* [DependencyMatcher] Fix keys_to_position_maps
* Remove Token.root property
* [DependencyMatcher] Remove functools' lru_cache

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

2021-01-22 11:20:08 +11:00
						 
Sofie Van Landeghem
5ace559201
ensure span.text works for an empty span (#6772)
2021-01-21 23:18:46 +08:00
						 
Sofie Van Landeghem
d93cd3b7c0
remove artificially duplicated test [ci skip]
2021-01-21 10:53:16 +01:00
						 
Sofie Van Landeghem
fdf8c77630
support IS_SENT_START in PhraseMatcher (#6771)

* support IS_SENT_START in PhraseMatcher
* add unit test and friendlier error
* use IDS.get instead

2021-01-21 09:59:17 +01:00
						 
Sofie Van Landeghem
e680efc7cc
Set annotations in update (#6767)

* bump to 3.0.0rc4
* do set_annotations in component update calls
* update docs and remove set_annotations flag
* fix EL test

2021-01-20 11:49:25 +11:00
						 
Sofie Van Landeghem
57640aa838
warn when frozen components break listener pattern (#6766)

* warn when frozen components break listener pattern
* few notes in the documentation
* update arg name
* formatting
* cleanup
* specify listeners return type

2021-01-20 11:12:35 +11:00
						 
Matthew Honnibal
88acbfc050
Copy the Example objects (and their predicted Doc) in nlp.evaluate() and nlp.update() (#6765)

* Make copy of examples in nlp.update and nlp.evaluate
* Avoid circular import
* Fix evaluate

2021-01-19 16:47:44 +01:00
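
The motivation for copying can be shown with a toy object. `ToyExample` and `update` below are hypothetical stand-ins, not spaCy's `Example` or `Language.update`, and `copy.deepcopy` stands in for whatever copy mechanism the library uses. Because the update works on copies, annotations it writes do not leak back into the objects the caller passed in.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyExample:
    """Hypothetical stand-in: a predicted 'doc' plus reference annotations."""
    predicted: Dict[str, List[str]] = field(default_factory=dict)
    reference: Dict[str, List[str]] = field(default_factory=dict)


def update(examples: List[ToyExample]) -> List[ToyExample]:
    """Pretend training step that annotates its (copied) inputs."""
    # Copy first, so the writes below never touch the caller's objects.
    examples = [copy.deepcopy(eg) for eg in examples]
    for eg in examples:
        eg.predicted.setdefault("ents", []).append("ORG")
    return examples


original = ToyExample(predicted={"words": ["Apple"]}, reference={"ents": ["ORG"]})
updated = update([original])
print(original.predicted)    # {'words': ['Apple']}
print(updated[0].predicted)  # {'words': ['Apple'], 'ents': ['ORG']}
```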
						 
Sofie Van Landeghem
bfc212e68f
fix duplicate from merge [ci skip]
2021-01-19 12:14:35 +01:00
						 
Adriane Boyd
bc7d83d4be
Skip 0-length matches (#6759)

Add hack to prevent matcher from returning 0-length matches.

2021-01-19 07:38:11 +08:00
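
Matcher results are `(match_id, start, end)` token triples, so a zero-length match is one where `start == end`. The log only says a hack was added inside the matcher; the hypothetical post-filter below simply illustrates which results that change excludes.

```python
from typing import List, Tuple

Match = Tuple[int, int, int]  # (match_id, start_token, end_token)


def drop_empty_matches(matches: List[Match]) -> List[Match]:
    """Keep only matches that cover at least one token."""
    return [(match_id, start, end) for match_id, start, end in matches if end > start]


print(drop_empty_matches([(1, 0, 2), (1, 3, 3), (2, 4, 5)]))  # [(1, 0, 2), (2, 4, 5)]
```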
						 
Sofie Van Landeghem
c8761b0e6e
rewrite Maxout layer as separate layers to avoid shape inference trouble (#6760)
2021-01-19 07:37:17 +08:00
						 
Adriane Boyd
26c34ab8b0
Fix parser resizing for cupy (#6758)
2021-01-18 20:43:15 +01:00
						 
Matthew Honnibal
c2a18e4fa3
Update textcat ensemble model
2021-01-19 02:53:02 +11:00
						 
Ines Montani
e697609fef
Update docstrings and types [ci skip]
2021-01-18 22:31:26 +11:00
						 
Ines Montani
f4d547b73c
Fix error code
2021-01-18 11:43:45 +11:00
						 
Ines Montani
1090d3d675
Merge branch 'develop' into feature/spacy-legacy
2021-01-18 11:43:39 +11:00
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fed8f48965 
							
						 
					 
					
						
						
							
							raise NotImplementedError when noun_chunks iterator is not implemented ( #6711 )  
						
						
						
						
						* raise NotImplementedError when noun_chunks iterator is not implemented
* bring back, fix and document span.noun_chunks
* formatting
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> 
						
					 
					
						2021-01-17 19:56:05 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bf0cdae8d4 
							
						 
					 
					
						
						
							
							Add token_splitter component ( #6726 )  
						
						
						
						
						* Add long_token_splitter component
Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.
The `long_token_splitter` splits tokens that are at least
`long_token_length` characters long into smaller tokens of
`split_length` characters each.
Notes:
* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.
* Adjust API, add test
* Fix name in factory 
						
					 
					
						2021-01-17 19:54:41 +08:00 
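The splitting rule described in the commit body can be sketched on plain strings. This is a hypothetical helper, not spaCy's component: it takes a list of token texts instead of a `Doc`, reuses the parameter names from the commit message (`long_token_length`, `split_length`), and assumes both lengths are measured in characters. Like the component, it makes no attempt to keep any token annotation.

```python
from typing import List


def split_long_tokens(
    tokens: List[str], long_token_length: int, split_length: int
) -> List[str]:
    """Split each token of at least `long_token_length` characters into
    consecutive pieces of at most `split_length` characters.

    Hypothetical helper working on plain strings rather than spaCy Docs;
    no token annotation is carried over to the pieces.
    """
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    result: List[str] = []
    for token in tokens:
        if len(token) >= long_token_length:
            # Slice into fixed-size chunks; the last chunk may be shorter.
            result.extend(
                token[i : i + split_length] for i in range(0, len(token), split_length)
            )
        else:
            result.append(token)
    return result
```

For example, `split_long_tokens(["see", "https://example.com/a/b"], 10, 8)` returns `["see", "https://", "example.", "com/a/b"]`. Short tokens pass through unchanged, and joining the pieces gives back the original text.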
						 
				 
			
				
					
						
							
							
								Santiago Castro 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							28256522c8 
							
						 
					 
					
						
						
							
							Fix spacy.util.minibatch when the size iterator is finished ( #6745 )  
						
						
						
					 
					
						2021-01-17 19:48:43 +08:00 
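The `minibatch` fix concerns a finite iterator of batch sizes running out. Since PEP 479, a `StopIteration` that escapes inside a generator is re-raised as `RuntimeError`, so the exhausted size iterator has to be caught explicitly. The sketch below shows that pattern. It is an illustration of the idea, not necessarily the exact code in `spacy.util`.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def minibatch(items: Iterable[T], size: Union[int, Iterable[int]]) -> Iterator[List[T]]:
    """Yield lists of items whose lengths are drawn from `size`.

    `size` may be a fixed int or an iterable of batch sizes. When a finite
    size iterator is exhausted, the generator ends normally instead of
    leaking StopIteration (which PEP 479 turns into RuntimeError).
    """
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    remaining = iter(items)
    while True:
        try:
            batch_size = next(sizes)
        except StopIteration:
            # The size schedule is finished: stop batching cleanly.
            return
        batch = list(itertools.islice(remaining, int(batch_size)))
        if not batch:
            return
        yield batch
```

With a finite schedule, `list(minibatch(range(10), [2, 3]))` gives `[[0, 1], [2, 3, 4]]` and stops; with a fixed size, `list(minibatch(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`.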
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							185fc62f4d 
							
						 
					 
					
						
						
							
							Remove unused is_base_form for mk lemmatizer ( #6743 )  
						
						
						
						
						Remove unimplemented/incorrect is_base_form for Macedonian lemmatizer. 
						
					 
					
						2021-01-17 09:41:35 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							43a752a2a0 
							
						 
					 
					
						
						
							
							Fix assertion in default get oracle sequence usage ( #6738 )  
						
						
						
						
						Remove assertion for default debug value in 
`get_oracle_sequence_from_state`. 
						
					 
					
						2021-01-16 16:07:39 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a552db2819 
							
						 
					 
					
						
						
							
							Include available registry names in error  
						
						
						
					 
					
						2021-01-16 14:35:03 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f0c696b4aa 
							
						 
					 
					
						
						
							
							Fix failed merge of  #6694  patch  
						
						
						
					 
					
						2021-01-16 13:44:11 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d12be459f6 
							
						 
					 
					
						
						
							
							Raise RegistryError  
						
						
						
					 
					
						2021-01-16 12:57:13 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c8b4370865 
							
						 
					 
					
						
						
							
							Add all strings from source models ( #6736 )  
						
						
						
						
						Add all strings from the source model when adding a pipe from a source
model.
Minor:
* Skip `disable=["vocab", "tokenizer"]` when loading a source model from
the config, since this doesn't do anything and is misleading. 
						
					 
					
						2021-01-16 12:26:15 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9328dd5625 
							
						 
					 
					
						
						
							
							Handle unset token.morph in Morphologizer ( #6704 )  
						
						
						
						
						* Handle unset token.morph in Morphologizer
Handle unset `token.morph` in `Morphologizer.initialize` and
`Morphologizer.get_loss`. If both `token.morph` and `token.pos` are
unset, treat the annotation as missing rather than empty.
* Add token.has_morph() 
						
					 
					
						2021-01-15 17:20:10 +01:00 
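The Morphologizer change distinguishes *missing* annotation (both `token.morph` and `token.pos` unset) from an *empty* label. A minimal sketch of that decision, assuming a hypothetical helper that works on plain strings and builds a sorted `Feat=Value|...` label with the POS folded in as a `POS` feature (the label format is an assumption for illustration):

```python
from typing import Optional


def morph_gold_label(morph: str, pos: str) -> Optional[str]:
    """Combine a morph string (e.g. "Number=Sing") and a POS tag into one label.

    Returns None when both are unset, so the caller can treat the token as
    missing (skipped in the loss) rather than as an empty label.
    Hypothetical helper; the "Feat=Value|..." format is assumed.
    """
    if not morph and not pos:
        return None
    feats = dict(f.split("=", 1) for f in morph.split("|")) if morph else {}
    if pos:
        feats["POS"] = pos
    # Sort features so equivalent annotations map to the same label string.
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items()))
```

Returning `None` rather than `""` lets a loss function skip the token entirely instead of training the model to predict "no features".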
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7b3f0c6f1b 
							
						 
					 
					
						
						
							
							Questionable fix for parser training bug with misaligned sentences ( #6694 )  
						
						
						
						
						* Questionable fix for parser training bug with misaligned sentences
* Fix
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2021-01-15 14:18:24 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a203e3dbb8 
							
						 
					 
					
						
						
							
							Support spacy-legacy via the registry  
						
						
						
					 
					
						2021-01-15 21:42:40 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							f9e4ac1283 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2021-01-15 12:51:02 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b0b743597c 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2021-01-15 11:57:36 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e8a97a2bd6 
							
						 
					 
					
						
						
							
							Merge pull request  #6720  from adrianeboyd/feature/improved-init-training-config-validation  
						
						
						
					 
					
						2021-01-15 11:45:24 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							57369909c0 
							
						 
					 
					
						
						
							
							Merge pull request  #6727  from adrianeboyd/chore/update-develop-from-master-rc3  
						
						
						
					 
					
						2021-01-15 11:44:28 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							681a6195f7 
							
						 
					 
					
						
						
							
							Validate seed and gpu_allocator manually  
						
						
						
					 
					
						2021-01-14 16:57:57 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							0c936004d1 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3  
						
						
						
					 
					
						2021-01-14 11:49:58 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							92310a5e26 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/missing-dep  
						
						
						
					 
					
						2021-01-14 17:39:01 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e649242927 
							
						 
					 
					
						
						
							
							Prevent overlapping noun chunks for Spanish ( #6712 )  
						
						
						
						
						* Prevent overlapping noun chunks in Spanish noun chunk iterator
* Clean up similar code in Danish noun chunk iterator 
						
					 
					
						2021-01-14 17:33:31 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9957ed7897 
							
						 
					 
					
						
						
							
							Override language defaults for null token and URL match ( #6705 )  
						
						
						
						
						* Override language defaults for null token and URL match
When the serialized `token_match` or `url_match` is `None`, override the
language defaults to preserve `None` on deserialization.
* Fix fixtures in tests 
						
					 
					
						2021-01-14 17:31:29 +11:00 
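The `token_match` / `url_match` fix depends on telling "saved as `None`" apart from "not saved at all". The sketch below shows that distinction on plain dicts. The function name and the dict-based settings are hypothetical; spaCy's actual tokenizer serialization format is not shown in this log.

```python
from typing import Any, Dict


def restore_match_settings(
    serialized: Dict[str, Any], defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge serialized tokenizer settings over language defaults.

    A key saved with the value None overrides the default, so an explicitly
    disabled matcher stays disabled after loading. Only keys absent from
    `serialized` fall back to the defaults. Hypothetical helper.
    """
    settings = dict(defaults)
    for key in ("token_match", "url_match"):
        # Membership test, not truthiness: None is a deliberate value here.
        if key in serialized:
            settings[key] = serialized[key]
    return settings
```

Using `serialized.get(key) or defaults[key]` instead would silently bring the defaults back, which is the behavior the commit avoids.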
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f277bfdf0f 
							
						 
					 
					
						
						
							
							Add SpanGroup and Graph container types to represent arbitrary annotations ( #6696 )  
						
						
						
						
						* Draft out initial Spans data structure
* Initial span group commit
* Basic span group support on Doc
* Basic test for span group
* Compile span_group.pyx
* Draft addition of SpanGroup to DocBin
* Add deserialization for SpanGroup
* Add tests for serializing SpanGroup
* Fix serialization of SpanGroup
* Add EdgeC and GraphC structs
* Add draft Graph data structure
* Compile graph
* More work on Graph
* Update GraphC
* Upd graph
* Fix walk functions
* Let Graph take nodes and edges on construction
* Fix walking and getting
* Add graph tests
* Fix import
* Add module with the SpanGroups dict thingy
* Update test
* Rename 'span_groups' attribute
* Try to fix c++11 compilation
* Fix test
* Update DocBin
* Try to fix compilation
* Try to fix graph
* Improve SpanGroup docstrings
* Add doc.spans to documentation
* Fix serialization
* Tidy up and add docs
* Update docs [ci skip]
* Add SpanGroup.has_overlap
* WIP updated Graph API
* Start testing new Graph API
* Update Graph tests
* Update Graph
* Add docstring
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2021-01-14 17:30:41 +11:00 
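One of the additions above is `SpanGroup.has_overlap`. The idea can be sketched over `(start, end)` token offsets with an exclusive end. This works on plain tuples and is not spaCy's implementation. After sorting by start, any overlap in the group must also show up between some pair of neighbours, so one linear pass is enough.

```python
from typing import List, Tuple


def has_overlap(spans: List[Tuple[int, int]]) -> bool:
    """Return True if any two (start, end) token spans overlap (end exclusive)."""
    ordered = sorted(spans)
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        # Sorted by start, so it suffices to compare each span with its predecessor.
        if start < prev_end:
            return True
    return False
```

Touching spans such as `[(0, 2), (2, 4)]` do not overlap, while `[(3, 5), (0, 4)]` does. Input order does not matter because the function sorts first.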
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							54e8e3c208 
							
						 
					 
					
						
						
							
							Update model-related dependencies ( #6725 )  
						
						
						
						
						* Update pymorphy2 error messages for Russian and Ukrainian
* Add pymorphy2 to pex
* Update spacy-pkuseg version for pex 
						
					 
					
						2021-01-14 17:29:44 +11:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							fec9b81aa2 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/develop' into feature/missing-dep  
						
						
						
					 
					
						2021-01-13 17:46:12 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							ed53bb979d 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2021-01-13 14:20:05 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							86a4e316b8 
							
						 
					 
					
						
						
							
							fix sent_starts  
						
						
						
					 
					
						2021-01-13 13:47:25 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							31a92b28ae 
							
						 
					 
					
						
						
							
							Merge pull request  #6715  from adrianeboyd/feature/before-after-init-callbacks  
						
						
						
						
						Add initialize.before_init and after_init callbacks 
						
					 
					
						2021-01-13 12:17:00 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							97d5a7ba99 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop  
						
						
						
					 
					
						2021-01-13 12:03:02 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							8d6448ccf7 
							
						 
					 
					
						
						
							
							Add config resolver test  
						
						
						
					 
					
						2021-01-13 12:02:59 +11:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							232e953b14 
							
						 
					 
					
						
						
							
							pytest.approx with absolute eps  
						
						
						
					 
					
						2021-01-12 20:32:57 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							5b598bd1d5 
							
						 
					 
					
						
						
							
							formatting  
						
						
						
					 
					
						2021-01-12 17:28:41 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							a581d82f33 
							
						 
					 
					
						
						
							
							introduce token.has_head and refer to MISSING_DEP_ (WIP)  
						
						
						
					 
					
						2021-01-12 17:17:06 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							5fb8b7037a 
							
						 
					 
					
						
						
							
							Expand initialize/training config validation  
						
						
						
						
						Validate both `[initialize]` and `[training]` in `debug data` and
`nlp.initialize()` with separate config validation error blocks that
indicate which block of the config is being validated. 
						
					 
					
						2021-01-12 17:17:00 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							a45d89f09a 
							
						 
					 
					
						
						
							
							Add initialize.before_init and after_init callbacks  
						
						
						
						
						Add `initialize.before_init` and `initialize.after_init` callbacks to
the config. The `initialize.before_init` callback is a place to
implement one-time tokenizer customizations that are then saved with the
model. 
						
					 
					
						2021-01-12 13:07:44 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ad43cbb042 
							
						 
					 
					
						
						
							
							Sync missing and misaligned values in Tagger loss ( #6689 )  
						
						
						
						
						Use `None` for both missing and misaligned annotation in
`Tagger.get_loss`, reverting to the default missing value in the loss
function. 
						
					 
					
						2021-01-10 11:30:37 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c04bab6bae 
							
						 
					 
					
						
						
							
							Fix train loop to avoid swallowing tracebacks ( #6693 )  
						
						
						
						
						* Avoid swallowing tracebacks in train loop
* Format
* Handle first 
						
					 
					
						2021-01-09 08:25:47 +08:00 
						 
				 
			
				
					
						
							
							
								Alex Combessie 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9cc880014c 
							
						 
					 
					
						
						
							
							Remove questionable French stopwords ( #6310 )  
						
						
						
						
						* Remove questionable French stopwords
* Create alexcombessie.md 
						
					 
					
						2021-01-08 11:36:22 +11:00 
						 
				 
			
				
					
						
							
							
								Cristiana S Parada 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7a0222f260 
							
						 
					 
					
						
						
							
							Update stop_words.py in Portuguese (a,o,e) ( #6345 )  
						
						
						
						
						* Update stop_words.py
Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"
* Create cristianasp.md
* zero edit to push CI
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> 
						
					 
					
						2021-01-08 11:35:38 +11:00 
						 
				 
			
				
					
						
							
							
								Lorena Ciutacu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f11002f1f1 
							
						 
					 
					
						
						
							
							add new Romanian stopwords ( #6621 )  
						
						
						
						
						* add contributor agreement
* update ro stopwords list
* add new stopwords 
						
					 
					
						2021-01-08 11:34:47 +11:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							dd12c6c8fd 
							
						 
					 
					
						
						
							
							allow missing information in deps and heads annotations  
						
						
						
					 
					
						2021-01-07 19:10:32 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							1abeca90a6 
							
						 
					 
					
						
						
							
							refer to _parser_internals.nonproj.DELIMITER  
						
						
						
					 
					
						2021-01-07 18:58:13 +01:00 
						 
				 
			
				
					
						
							
							
								Yohei Tamura 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							411c842a71 
							
						 
					 
					
						
						
							
convert tuple to list to fix a type mismatch ( #6625 )
						
						
						
					 
					
						2021-01-07 16:42:12 +11:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							75d9019343 
							
						 
					 
					
						
						
							
							Fix types of Tok2Vec encoding architectures ( #6442 )  
						
						
						
						
						* fix TorchBiLSTMEncoder documentation
* ensure the types of the encoding Tok2vec layers are correct
* update references from v1 to v2 for the new architectures 
						
					 
					
						2021-01-07 16:39:27 +11:00 
						 
				 
			
				
					
						
							
							
								ophelielacroix 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e3222fdec9 
							
						 
					 
					
						
						
							
							Add (noun chunks) syntax iterators for Danish ( #6246 )  
						
						
						
						
						* add syntax iterators for danish
* add test noun chunks for danish syntax iterators
* add contributor agreement
* update da syntax iterators to remove nested chunks
* add tests for da noun chunks
* Fix test
* add missing import
* fix example
* Prevent overlapping noun chunks
Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2021-01-07 16:33:00 +11:00 
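The overlap fix for the Danish (and later Spanish) noun chunk iterators tracks the end of the previously emitted chunk. Here is a minimal sketch on `(start, end)` tuples with an exclusive end, assuming the candidates arrive in left-to-right order. The function name is hypothetical, and spaCy's iterators work on `Doc`/`Token` objects rather than tuples.

```python
from typing import Iterable, Iterator, Tuple


def non_overlapping(candidates: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield candidate (start, end) chunks in order, skipping any chunk that
    starts before the previously yielded chunk ended (end exclusive)."""
    prev_end = 0
    for start, end in candidates:
        if start < prev_end:
            # Overlaps the last emitted chunk: drop it.
            continue
        prev_end = end
        yield (start, end)
```

With candidates `[(0, 2), (1, 3), (3, 5)]`, the middle chunk is dropped because it starts inside the first one, leaving `[(0, 2), (3, 5)]`.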
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8c1a23209f 
							
						 
					 
					
						
						
							
							Getting scores out of beam_parser ( #6684 )  
						
						
						
						
						* clean up of ner tests
* beam_parser tests
* implement get_beam_parses and scored_parses for the dep parser
* we don't have to add the parse if there are no arcs 
						
					 
					
						2021-01-07 16:28:27 +11:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3983bc6b1e 
							
						 
					 
					
						
						
							
							Fix Transformer width in TextCatEnsemble ( #6431 )  
						
						
						
						
						* add convenience method to determine tok2vec width in a model
* fix transformer tok2vec dimensions in TextCatEnsemble architecture
* init function should not be nested to avoid pickle issues 
						
					 
					
						2021-01-06 12:44:04 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							402dbc5bae 
							
						 
					 
					
						
						
							
							Getting scores out of beam_ner ( #6575 )  
						
						
						
						
						* small fixes and formatting
* bring test_issue4313 up-to-date, currently fails
* formatting
* add get_beam_parses method back
* add scored_ents function
* delete tag map 
						
					 
					
						2021-01-06 12:02:32 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6f7e7d88b9 
							
						 
					 
					
						
						
							
							remove cause without apostrophe from norm exceptions ( #6636 )  
						
						
						
					 
					
						2021-01-06 12:30:30 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bf9096437e 
							
						 
					 
					
						
						
							
							Set default lemmas in retokenizer ( #6667 )  
						
						
						
						
						Instead of unsetting lemmas on retokenized tokens, set the default
lemmas to:
* merge: concatenate any existing lemmas with `SPACY` preserved
* split: use the new `ORTH` values if lemmas were previously set,
  otherwise leave unset 
						
					 
					
						2021-01-06 12:29:44 +08:00 
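The retokenizer's new default lemmas can be illustrated with two hypothetical helpers on plain values. Tokens are `(lemma, has_trailing_space)` pairs, and an unset lemma is `None`. Two details are assumptions not stated in the commit: a merge where no lemma was set at all stays unset, and unset lemmas inside a partially annotated span contribute nothing.

```python
from typing import List, Optional, Sequence, Tuple


def merged_lemma(tokens: Sequence[Tuple[Optional[str], bool]]) -> Optional[str]:
    """Concatenate existing lemmas of the merged tokens, preserving the
    original spacing between them (but not after the last token)."""
    if all(lemma is None for lemma, _ in tokens):
        return None  # assumption: nothing to concatenate, so leave unset
    parts: List[str] = []
    for i, (lemma, space) in enumerate(tokens):
        parts.append(lemma or "")
        if space and i < len(tokens) - 1:
            parts.append(" ")
    return "".join(parts)


def split_lemmas(new_orths: Sequence[str], had_lemma: bool) -> List[Optional[str]]:
    """Use the new ORTH values as lemmas if the original token had a lemma."""
    return list(new_orths) if had_lemma else [None] * len(new_orths)
```

For example, merging `[("New", True), ("York", False)]` yields `"New York"`, and splitting a lemmatized token into `["do", "n't"]` gives those strings as the lemmas.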
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0041dfbc7f 
							
						 
					 
					
						
						
							
							Use special matcher for exceptions with spaces ( #6668 )  
						
						
						
						
						Use the special cases phrase matcher for exceptions that include space
characters so that exceptions including spaces are supported. 
						
					 
					
						2021-01-06 12:05:10 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							afc5714d32 
							
						 
					 
					
						
						
							
							multi-label textcat component ( #6474 )  
						
						
						
						
						* multi-label textcat component
* formatting
* fix comment
* cleanup
* fix from #6481 
* random edit to push the tests
* add explicit error when textcat is called with multi-label gold data
* fix error nr
* small fix 
						
					 
					
						2021-01-06 13:07:14 +11:00 
						 
				 
			
				
					
						
							
							
								Bruno 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1a77607036 
							
						 
					 
					
						
						
							
							spaCy v3 is not saving the best version in training loop ( #6629 )  
						
						
						
						
* Save best only if it is the best, and also respect the average config
* Create bratao.md
* Update loop.py
* Remove average check
* Keep before_to_disk 
						
					 
					
						2021-01-06 12:51:30 +11:00 
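The training-loop fix writes the "best" model only when the current evaluation score beats every earlier one. A hypothetical sketch of that bookkeeping returns the evaluation steps at which a best checkpoint would be saved. It assumes higher scores are better and that a tie does not count as an improvement.

```python
from typing import Iterable, List, Optional


def steps_to_save_as_best(scores: Iterable[float]) -> List[int]:
    """Return the indices of evaluations that set a new best score."""
    best: Optional[float] = None
    saved: List[int] = []
    for step, score in enumerate(scores):
        # Strict improvement only: equal scores do not trigger a new save.
        if best is None or score > best:
            best = score
            saved.append(step)
    return saved
```

For scores `[0.5, 0.4, 0.7, 0.7, 0.6]`, only steps `0` and `2` would overwrite the best checkpoint.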
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							29b59086f9 
							
						 
					 
					
						
						
							
							Prevent 0-length mem alloc ( #6653 )  
						
						
						
						
						* prevent 0-length mem alloc by adding asserts
* fix lexeme mem allocation 
						
					 
					
						2021-01-06 12:50:17 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6f83abb971 
							
						 
					 
					
						
						
							
							Merge pull request  #6647  from svlandeg/feature/init_config_overwrite  
						
						
						
					 
					
						2021-01-05 14:59:04 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							81f018fb67 
							
						 
					 
					
						
						
							
							Merge pull request  #6671  from explosion/chore/tidy-autoformat  
						
						
						
						
						Tidy up and auto-format 
						
					 
					
						2021-01-05 14:45:31 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							224a3590e9 
							
						 
					 
					
						
						
							
							Merge pull request  #6654  from svlandeg/chore/tests-cleanup  
						
						
						
						
						Unskipping tests 
						
					 
					
						2021-01-05 13:53:40 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a9e845426f 
							
						 
					 
					
						
						
							
							Use --force for consistency and add docs  
						
						
						
					 
					
						2021-01-05 13:49:59 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c4993f16d0 
							
						 
					 
					
						
						
							
							Merge pull request  #6651  from svlandeg/bugfix/cli_info  
						
						
						
					 
					
						2021-01-05 13:44:26 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							991669c934 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2021-01-05 13:41:53 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b57be94c78 
							
						 
					 
					
						
						
							
							Fix memory issues in Language.evaluate ( #6386 )  
						
						
						
						
						* Fix memory issues in Language.evaluate
Reset annotation in predicted docs before evaluating and store all data
in `examples`.
* Minor refactor to docs generator init
* Fix generator expression
* Fix final generator check
* Refactor pipeline loop
* Handle examples generator in Language.evaluate
* Add test with generator
* Use make_doc 
						
					 
					
						2020-12-31 10:45:50 +11:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							a6a68da673 
							
						 
					 
					
						
						
							
							unskipping tests with python >= 3.6  
						
						
						
					 
					
						2020-12-30 18:46:43 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							d5ff0fecf8 
							
						 
					 
					
						
						
							
							add docs  
						
						
						
					 
					
						2020-12-30 14:01:13 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							c74ab6a313 
							
						 
					 
					
						
						
							
							fix imports  
						
						
						
					 
					
						2020-12-30 12:40:12 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							712a78b74a 
							
						 
					 
					
						
						
							
							add simple unit test  
						
						
						
					 
					
						2020-12-30 12:35:26 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							4347e6d39b 
							
						 
					 
					
						
						
							
							fixes for CLI info command  
						
						
						
					 
					
						2020-12-30 12:05:58 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							62b4fe118f 
							
						 
					 
					
						
						
							
							prevent overwriting existing config file  
						
						
						
					 
					
						2020-12-29 15:40:22 +01:00 
						 
				 
			
				
					
						
							
							
								Adam Bittlingmayer 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f2fe60bacf 
							
						 
					 
					
						
						
							
							Update tokenizer_exceptions.py  
						
						
						
						
						See https://github.com/explosion/spaCy/pull/6643  
						
					 
					
						2020-12-29 16:05:11 +04:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5ca57d8221 
							
						 
					 
					
						
						
							
							Add logger warning when serializing user hooks ( #6595 )  
						
						
						
						
						Add a warning that user hooks are lost on serialization.
Add a `user_hooks` exclude to skip the warning with pickle. 
						
					 
					
						2020-12-29 11:54:32 +01:00 
						 
				 
			
				
					
						
							
							
								Yosi 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cf52510631 
							
						 
					 
					
						
						
							
							Add Amharic አማርኛ Language support ( #6583 )  
						
						
						
						
						* Add Amharic to space
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com> 
						
					 
					
						2020-12-22 16:50:34 +01:00 
						 
				 
			
				
					
						
							
							
								Tim Gates 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							292c1d6a73 
							
						 
					 
					
						
						
							
							docs: fix simple typo, speficied -> specified ( #6611 )  
						
						
						
						
						There is a small typo in spacy/cli/info.py.
Should read `specified` rather than `speficied`. 
						
					 
					
						2020-12-22 09:14:10 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cabd4ae5b1 
							
						 
					 
					
						
						
							
							Use logger.warning instead of logger.warn ( #6596 )  
						
						
						
						
						Use `logger.warning` instead of deprecated `logger.warn`. 
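The change is a rename at each call site. A minimal stdlib illustration, where the logger name `example_package` and the helper `log_skipped_file` are made up for the example:

```python
import logging

# Hypothetical module-level logger; libraries usually name it after the package.
logger = logging.getLogger("example_package")


def log_skipped_file(path: str) -> None:
    # Logger.warning is the supported method. Logger.warn is a deprecated
    # alias, so calling it triggers a DeprecationWarning.
    logger.warning("Skipping file: %s", path)
```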
						
					 
					
						2020-12-21 08:25:10 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							282a3b49ea 
							
						 
					 
					
						
						
							
							Fix parser resizing when there is no upper layer ( #6460 )  
						
						
						
						
						* allow resizing of the parser model even when upper=False
* update from spacy.TransitionBasedParser.v1 to v2
* bugfix 
						
					 
					
						2020-12-18 18:56:57 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0a923a7915 
							
						 
					 
					
						
						
							
							Tagger robustness ( #6580 )  
						
						
						
						
						* require labels in taggers
* ensure tagger works with incomplete data 
						
					 
					
						2020-12-18 18:51:47 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e10295c9fd 
							
						 
					 
					
						
						
							
							Fix memory leak when adding empty morph ( #6581 )  
						
						
						
						
						Fix lookup of empty morph in the morphology table, which fixes a memory
leak where a new morphology tag was allocated each time the empty morph
tag was added. 
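The leak follows a common interning bug: if the lookup key for one particular value never matches what was stored, every add misses and allocates again. A pure-Python sketch of the fixed behaviour, assuming a dict-backed table and `"_"` as the canonical empty-morph key; `MorphTable` is hypothetical and not spaCy's Cython `Morphology` class:

```python
from typing import Dict, List

EMPTY_MORPH = "_"  # assumed canonical key for "no morphological features"


class MorphTable:
    """Hypothetical interning table: one stored entry per distinct feature string."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._entries: List[str] = []  # stands in for allocated tag structs

    def add(self, features: str) -> int:
        # Normalise the empty morph to one key *before* the lookup. If the
        # lookup used a different key than the one stored, every call would
        # miss and allocate a fresh entry, which is the leak pattern.
        key = features or EMPTY_MORPH
        if key not in self._ids:
            self._ids[key] = len(self._entries)
            self._entries.append(key)
        return self._ids[key]

    def size(self) -> int:
        return len(self._entries)
```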
						
					 
					
						2020-12-18 18:51:01 +08:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e9b0963827 
							
						 
					 
					
						
						
							
							Merge pull request  #6333  from adrianeboyd/chore/python39  
						
						
						
					 
					
						2020-12-17 22:11:57 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							47c1ec678b 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into pr/6333  
						
						
						
					 
					
						2020-12-17 10:19:28 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3f90bffa27 
							
						 
					 
					
						
						
							
							Merge pull request  #6571  from adrianeboyd/bugfix/debug-data-missing-vectors  
						
						
						
						
						Fix alignment and vector checks in debug data 
						
					 
					
						2020-12-17 10:10:47 +11:00 
						 
				 
			
				
					
						
							
							
								Thomas Bird 
							
						 
					 
					
						
						
						
						
							
						
						
							cbb8c66da3 
							
						 
					 
					
						
						
							
							prevent the root logger from initialising  
						
						
						
					 
					
						2020-12-15 19:50:34 +00:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1ddf2f39c7 
							
						 
					 
					
						
						
							
							Switch converters to generator functions ( #6547 )  
						
						
						
						
						* Switch converters to generator functions
To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.
* Update tests 
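The memory saving comes from yielding records one at a time instead of building a full list. A minimal sketch of the pattern, assuming a hypothetical line-per-record JSON input; `convert_lines` is illustrative and not one of spaCy's converters:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def convert_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Hypothetical converter: one JSON record per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Each record is produced on demand, so only the current one is held
        # in memory rather than a list covering the whole corpus.
        yield json.loads(line)
```

Callers that really need everything at once can still wrap the result in `list(...)`; the tests were presumably updated for exactly that kind of change.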
						
					 
					
						2020-12-15 16:47:16 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							20e18cc246 
							
						 
					 
					
						
						
							
							Fix alignment and vector checks in debug data  
						
						
						
						
						* Update token alignment check to use Example alignment
* Update missing vector check further related to changes in v3 
						
					 
					
						2020-12-15 09:43:14 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8656a08777 
							
						 
					 
					
						
						
							
							Add beam_parser and beam_ner components for v3 ( #6369 )  
						
						
						
						
						* Get basic beam tests working
* Get basic beam tests working
* Compile _beam_utils
* Remove prints
* Test beam density
* Beam parser seems to train
* Draft beam NER
* Upd beam
* Add hypothesis as dev dependency
* Implement missing is-gold-parse method
* Implement early update
* Fix state hashing
* Fix test
* Fix test
* Default to non-beam in parser constructor
* Improve oracle for beam
* Start refactoring beam
* Update test
* Refactor beam
* Update nn
* Refactor beam and weight by cost
* Update ner beam settings
* Update test
* Add __init__.pxd
* Upd test
* Fix test
* Upd test
* Fix test
* Remove ring buffer history from StateC
* WIP change arc-eager transitions
* Add state tests
* Support ternary sent start values
* Fix arc eager
* Fix NER
* Pass oracle cut size for beam
* Fix ner test
* Fix beam
* Improve StateC.clone
* Improve StateClass.borrow
* Work directly with StateC, not StateClass
* Remove print statements
* Fix state copy
* Improve state class
* Refactor parser oracles
* Fix arc eager oracle
* Fix arc eager oracle
* Use a vector to implement the stack
* Refactor state data structure
* Fix alignment of sent start
* Add get_aligned_sent_starts method
* Add test for ae oracle when bad sentence starts
* Fix sentence segment handling
* Avoid Reduce that inserts illegal sentence
* Update preset SBD test
* Fix test
* Remove prints
* Fix sent starts in Example
* Improve python API of StateClass
* Tweak comments and debug output of arc eager
* Upd test
* Fix state test
* Fix state test 
						
					 
					
						2020-12-13 09:08:32 +08:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							513c4e332a 
							
						 
					 
					
						
						
							
							Include custom code via spacy package command ( #6531 )  
						
						
						
					 
					
						2020-12-10 20:36:46 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							7b277661f6 
							
						 
					 
					
						
						
							
							Set version to v2.3.5  
						
						
						
					 
					
						2020-12-10 13:32:10 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2a6043fabb 
							
						 
					 
					
						
						
							
							Merge pull request  #6530  from explosion/feature/init-config-cpu-gpu  
						
						
						
					 
					
						2020-12-10 09:38:46 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							9d32e839d3 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/init-config-cpu-gpu  
						
						
						
					 
					
						2020-12-10 08:50:53 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							6ee6e41234 
							
						 
					 
					
						
						
							
							Update docstring for Language.evaluate  
						
						
						
					 
					
						2020-12-09 10:21:39 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							fa8fa474a3 
							
						 
					 
					
						
						
							
							Add nlp.batch_size setting  
						
						
						
						
						Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`. 
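The usual shape of an object-level default is "explicit argument wins, otherwise use the attribute". A sketch under that assumption, with a hypothetical `BatchingPipeline` that yields the batches themselves so the effect is visible (the real `Language.pipe` yields `Doc` objects); the default of 1000 is just this sketch's choice:

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class BatchingPipeline:
    """Hypothetical stand-in for a pipeline object with a default batch size."""

    def __init__(self, batch_size: int = 1000) -> None:
        self.batch_size = batch_size

    def batches(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # An explicit argument wins; otherwise fall back to the object default.
        size = self.batch_size if batch_size is None else batch_size
        it = iter(texts)
        while True:
            batch = list(islice(it, size))
            if not batch:
                return
            yield batch
```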
						
					 
					
						2020-12-09 09:13:26 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f2571b5ec4 
							
						 
					 
					
						
						
							
							Merge pull request  #6444  from adrianeboyd/chore/update-develop-from-master  
						
						
						
					 
					
						2020-12-09 13:09:58 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							90171f2031 
							
						 
					 
					
						
						
							
							Merge pull request  #6528  from svlandeg/feature/pipe_fill_config  
						
						
						
					 
					
						2020-12-09 12:01:22 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dfaef27f90 
							
						 
					 
					
						
						
							
							Merge pull request  #6503  from adrianeboyd/feature/lemmatizer-rule-warning-pos  
						
						
						
						
						Warn on empty POS for the rule-based lemmatizer 
						
					 
					
						2020-12-09 11:34:16 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							271923eaea 
							
						 
					 
					
						
						
							
							Fix retokenizer  
						
						
						
					 
					
						2020-12-09 11:29:55 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							b85bd63eca 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2020-12-09 11:24:01 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							febf71af28 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2020-12-09 11:23:07 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							1da1568110 
							
						 
					 
					
						
						
							
							Remove tag map  
						
						
						
					 
					
						2020-12-09 11:13:49 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							1980203229 
							
						 
					 
					
						
						
							
							Merge branch 'master' into pr/6444  
						
						
						
					 
					
						2020-12-09 11:09:40 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							05a2812ae0 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into pr/6444  
						
						
						
					 
					
						2020-12-09 11:04:03 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							758ad6c3cd 
							
						 
					 
					
						
						
							
							Make CPU the default for init config  
						
						
						
					 
					
						2020-12-09 11:00:51 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							5d605d539d 
							
						 
					 
					
						
						
							
							Remove output_file from init_config helper  
						
						
						
					 
					
						2020-12-09 10:57:55 +11:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cfc72c2995 
							
						 
					 
					
						
						
							
							Bugfix multi-label textcat reproducibility ( #6481 )  
						
						... 
						
						
						
						* add test for multi-label textcat reproducibility
* remove positive_label
* fix lengths dtype
* fix comments
* remove comment that we should not have forgotten :-) 
						
					 
					
						2020-12-09 06:29:15 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							de108ed3e8 
							
						 
					 
					
						
						
							
							Add specific error when StaticVectors can't read the vectors data ( #6450 )  
						
						
						
					 
					
						2020-12-09 06:16:07 +08:00 
						 
				 
			
				
					
						
							
							
								Koichi Yasuoka 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0afb54ac93 
							
						 
					 
					
						
						
							
							JapaneseTokenizer.pipe added ( #6515 )  
						
						
						
						
						* JapaneseTokenizer.pipe added
For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.
* DummyTokenizer.pipe added instead 
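Putting `pipe` on the shared base class gives every custom tokenizer a default implementation at once. A sketch of that idea, assuming `pipe` simply calls the tokenizer once per text; `WhitespaceTokenizer` is a hypothetical subclass standing in for something like `JapaneseTokenizer`, and returning lists of strings instead of `Doc` objects is a simplification:

```python
from typing import Iterable, Iterator, List


class DummyTokenizer:
    """Sketch of a base class whose pipe() calls the tokenizer per text."""

    def __call__(self, text: str) -> List[str]:
        raise NotImplementedError

    def pipe(self, texts: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
        # batch_size is accepted for API compatibility but unused here:
        # each text is tokenized independently.
        for text in texts:
            yield self(text)


class WhitespaceTokenizer(DummyTokenizer):
    """Hypothetical subclass; inherits pipe() without defining it."""

    def __call__(self, text: str) -> List[str]:
        return text.split()
```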
						
					 
					
						2020-12-08 20:02:23 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							8f8a7f1733 
							
						 
					 
					
						
						
							
							returning config in init_config  
						
						
						
					 
					
						2020-12-08 17:37:20 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8921364579 
							
						 
					 
					
						
						
							
							Merge pull request  #6521  from explosion/feature/config-stdin  
						
						
						
						
						Allow reading config from stdin in spacy train 
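A common way to support this in a CLI is to treat the path `-` as "read from standard input". A sketch assuming that convention; `read_config_text` is a hypothetical helper, and the optional `stdin` parameter exists only so the stream can be swapped out in tests:

```python
import sys
from typing import Optional, TextIO


def read_config_text(path: str, stdin: Optional[TextIO] = None) -> str:
    """Read config text from a file, or from standard input when path is "-"."""
    if path == "-":
        # "-" is the usual CLI convention for stdin; the stream can be
        # injected for testing and defaults to sys.stdin.
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    with open(path, encoding="utf8") as f:
        return f.read()
```

This lets a config be piped in from another command instead of being written to disk first.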
						
					 
					
						2020-12-08 22:07:43 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							6c7a930ee8 
							
						 
					 
					
						
						
							
							Fix variable  
						
						
						
					 
					
						2020-12-08 20:44:59 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							94a5a9814f 
							
						 
					 
					
						
						
							
							Update argument handling and documentation  
						
						
						
					 
					
						2020-12-08 20:41:18 +11:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							6c221d4841 
							
						 
					 
					
						
						
							
							Fix subsequent pipe detection in EntityRuler  
						
						
						
						
						Fix subsequent pipe detection to detect the position of the current
object by comparing the component itself rather than from the factory
name. 
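The difference matters when a pipeline contains two components from the same factory: a lookup by factory name always finds the first one. A sketch of the identity-based lookup, assuming the pipeline is a list of `(name, component)` pairs; `later_components` is a hypothetical helper:

```python
from typing import Any, List, Tuple


def later_components(pipeline: List[Tuple[str, Any]], current: Any) -> List[str]:
    """Names of the components after `current`, located by object identity."""
    for i, (_name, component) in enumerate(pipeline):
        # Compare the component object itself, not its factory name: two
        # instances created from the same factory share a factory name.
        if component is current:
            return [name for name, _ in pipeline[i + 1:]]
    raise ValueError("component not found in pipeline")
```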
						
					 
					
						2020-12-08 10:01:30 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							5ceac425ee 
							
						 
					 
					
						
						
							
							Remove non-working --use-chars from train CLI  
						
						
						
						
						Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it. 
						
					 
					
						2020-12-08 08:30:00 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d25b1606d6 
							
						 
					 
					
						
						
							
							Allow reading config from stdin in spacy train  
						
						
						
					 
					
						2020-12-08 18:01:40 +11:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6cfa66ed1c 
							
						 
					 
					
						
						
							
							Make training.loop return nlp object and path ( #6520 )  
						
						
						
					 
					
						2020-12-08 14:55:55 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2c27093c5f 
							
						 
					 
					
						
						
							
							require_cpu functionality ( #6336 )  
						
						
						
						
						* add require_cpu from Thinc 8.0.0rc2
* add docs
* fix test if cupy is not installed 
						
					 
					
						2020-12-08 14:42:40 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f98a04434a 
							
						 
					 
					
						
						
							
							pretrain architectures ( #6451 )  
						
						
						
						
						* define new architectures for the pretraining objective
* add loss function as attr of the model
* cleanup
* cleanup
* shorten name
* fix typo
* remove unused error 
						
					 
					
						2020-12-08 14:41:03 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							29b058ebdc 
							
						 
					 
					
						
						
							
							Fix spacy when retokenizing cases with affixes ( #6475 )  
						
						
						
						
						Preserve `token.spacy` corresponding to the span end token in the
original doc rather than adjusting for the current offset.
* If not modifying in place, this checks in the original document
(`doc.c` rather than `tokens`).
* If modifying in place, the document has not been modified past the
current span start position so the value at the current span end
position is valid. 
						
					 
					
						2020-12-08 14:25:56 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4448680750 
							
						 
					 
					
						
						
							
							Fix alignment for 1-to-1 tokens and lowercasing ( #6476 )  
						
						
						
						
						* When checking for token alignments, check not only that the tokens are
identical but that the character positions are both at the start of a
token.
  It's possible for the tokens to be identical even though the two
tokens aren't aligned one-to-one in a case like `["a'", "''"]` vs.
`["a", "''", "'"]`, where the middle tokens are identical but should not
be aligned on the token level at character position 2 since it's the
start of one token but the middle of another.
* Use the lowercased version of the token texts to create the
character-to-token alignment because lowercasing can change the string
length (e.g., for `İ`, see the not-a-bug bug report:
https://bugs.python.org/issue34723)
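The length change is easy to reproduce in plain Python: `"\u0130"` (capital I with dot above) is one code point, but its lowercase form is two. A sketch of a character-to-token map built from the lowercased token texts, so its offsets line up with the lowercased string; `char_to_token` is a hypothetical helper, not spaCy's alignment code:

```python
from typing import List


def char_to_token(tokens: List[str]) -> List[int]:
    """Map each character offset in the lowercased, concatenated text to a token index."""
    mapping: List[int] = []
    for i, token in enumerate(tokens):
        # Measure the *lowercased* token: str.lower() can change the length,
        # e.g. "\u0130" lowercases to "i" plus a combining dot above.
        mapping.extend([i] * len(token.lower()))
    return mapping
```

Building the map from the original token lengths instead would leave it one entry short for `["\u0130", "a"]`, so every later offset would point at the wrong token.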
						
					 
					
						2020-12-08 14:25:16 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e931d3f72b 
							
						 
					 
					
						
						
							
							Move max_length to nlp.make_doc() ( #6512 )  
						
						
						
						
						Move max_length check to `nlp.make_doc()` so that it's also checked
for `nlp.pipe()`. 
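The point of moving the check is that every entry point already creates its docs through `make_doc`. A sketch under that assumption with a hypothetical `LengthCheckedPipeline`; whitespace splitting stands in for real tokenization, and the default of 1,000,000 characters mirrors `nlp.max_length`:

```python
from typing import Iterable, Iterator, List


class LengthCheckedPipeline:
    """Hypothetical pipeline in which every entry point goes through make_doc."""

    def __init__(self, max_length: int = 1_000_000) -> None:
        self.max_length = max_length

    def make_doc(self, text: str) -> List[str]:
        # Checking here, rather than only in __call__, means pipe() is
        # covered too, since both paths create their docs via make_doc.
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds max_length={self.max_length}"
            )
        return text.split()  # stands in for real tokenization

    def __call__(self, text: str) -> List[str]:
        return self.make_doc(text)

    def pipe(self, texts: Iterable[str]) -> Iterator[List[str]]:
        for text in texts:
            yield self.make_doc(text)
```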
						
					 
					
						2020-12-08 14:24:02 +08:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ee2ec52f48 
							
						 
					 
					
						
						
							
							Merge pull request  #6409  from svlandeg/feature/trf-docs  
						
						
						
					 
					
						2020-12-08 06:32:10 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							82e88f0e3b 
							
						 
					 
					
						
						
							
							Merge pull request  #6379  from svlandeg/fix/labels-constructor  
						
						
						
					 
					
						2020-12-08 06:29:56 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							d70950605c 
							
						 
					 
					
						
						
							
							Warn on empty POS for the rule-based lemmatizer  
						
						
						
						
						Add a warning to the rule-based lemmatizer for any tokens without POS
annotation. 
						
					 
					
						2020-12-04 11:46:15 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							78085fab1f 
							
						 
					 
					
						
						
							
							Check for spacy-nightly package in download ( #6502 )  
						
						
						
						
						Also check for spacy-nightly in download so that `--no-deps` isn't set
for normal nightly installs. 
						
					 
					
						2020-12-04 09:40:03 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							63f83e7034 
							
						 
					 
					
						
						
							
							Merge pull request  #6470  from adrianeboyd/feature/license-in-package  
						
						
						
					 
					
						2020-12-04 03:55:54 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d6c616a125 
							
						 
					 
					
						
						
							
							Fixes in test suite ( #6457 )  
						
						
						
						
						* fix slow test for textcat readers
* cleanup test_issue5551
* add explicit score weight
* cleanup 
						
					 
					
						2020-12-02 12:57:08 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							31ec9a906e 
							
						 
					 
					
						
						
							
							Clean up 3rd party license info ( #6478 )  
						
						
						
						
						Move scikit-learn license from `Scorer` to
`licenses/3rd_party_licenses.txt`. 
						
					 
					
						2020-12-02 10:15:23 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							591cd48aa8 
							
						 
					 
					
						
						
							
							Remove config.cfg from MANIFEST  
						
						
						
					 
					
						2020-12-01 12:58:02 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							b0dd13e0ba 
							
						 
					 
					
						
						
							
							Support LICENSE in spacy package  
						
						
						
						
						If present, include the file `input_dir/LICENSE` at the top level of the
packaged model. 
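An optional-file copy like this is short with `pathlib` and `shutil`. A sketch assuming a flat output directory; `copy_license` is a hypothetical helper, and the real `spacy package` command lays out the package differently:

```python
import shutil
from pathlib import Path


def copy_license(input_dir: Path, package_dir: Path) -> bool:
    """Copy input_dir/LICENSE to the top level of package_dir if it exists."""
    license_path = input_dir / "LICENSE"
    if not license_path.exists():
        # The file is optional: packaging proceeds without it.
        return False
    package_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(license_path, package_dir / "LICENSE")
    return True
```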
						
					 
					
						2020-11-30 13:43:58 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							53c0fb7431 
							
						 
					 
					
						
						
							
							Only set NORM on Token in retokenizer ( #6464 )  
						
						
						
						
						* Only set NORM on Token in retokenizer
Instead of setting `NORM` on both the token and lexeme, set `NORM` only
on the token.
The retokenizer tries to set all possible attributes with
`Token/Lexeme.set_struct_attr` so that it doesn't have to enumerate
which attributes are available for each. `NORM` is the only attribute
that's stored on both and for most cases it doesn't make sense to set
the global norms based on an individual retokenization. For lexeme-only
attributes like `IS_STOP` there's no way to avoid the global side
effects, but I think that `NORM` would be better only on the token.
* Fix test 
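The distinction is between a per-occurrence override and a shared, vocabulary-wide value. A sketch of that split, assuming a hypothetical `Vocab` that stores lexeme norms (lowercase by default, an assumption) and a `Token` whose own norm, when set, takes precedence; neither class is spaCy's:

```python
from typing import Dict, Optional


class Vocab:
    """Hypothetical shared store: one lexeme-level norm per word string."""

    def __init__(self) -> None:
        self.lex_norms: Dict[str, str] = {}

    def norm(self, text: str) -> str:
        # Default norm is the lowercase form (an assumption for this sketch).
        return self.lex_norms.get(text, text.lower())


class Token:
    def __init__(self, vocab: Vocab, text: str) -> None:
        self.vocab = vocab
        self.text = text
        self._norm: Optional[str] = None  # token-level override

    @property
    def norm(self) -> str:
        return self._norm if self._norm is not None else self.vocab.norm(self.text)

    @norm.setter
    def norm(self, value: str) -> None:
        # Only this token changes; the shared vocab entry is left alone.
        self._norm = value
```

Writing the new norm into `vocab.lex_norms` instead would change it for every other occurrence of the same string, which is the global side effect the commit avoids.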
						
					 
					
						2020-11-30 09:35:42 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							03ae77e603 
							
						 
					 
					
						
						
							
							Add SPACY as a Matcher attribute ( #6463 )  
						
						
						
					 
					
						2020-11-30 09:34:50 +08:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							079f6ea474 
							
						 
					 
					
						
						
							
							avoid resolving the full config ( #6465 )  
						
						
						
					 
					
						2020-11-30 09:34:29 +08:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							9beba7164f 
							
						 
					 
					
						
						
							
							Make jinja2 top-level import  
						
						
						
						
						No problem anymore since it's now an official dependency 
						
					 
					
						2020-11-27 15:17:14 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							26296ab223 
							
						 
					 
					
						
						
							
							Add error message if DocBin zlib decompress fails ( #6394 )  
						
						
						
						
						Add a better error message if DocBin zlib decompress fails, indicating
that the data is not in `DocBin` format. 
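`zlib.decompress` raises `zlib.error` with a low-level message such as "incorrect header check" when given data that was never zlib-compressed. A sketch of wrapping it in a more helpful error; `decompress_docbin_bytes` is hypothetical and the message wording is not spaCy's:

```python
import zlib


def decompress_docbin_bytes(data: bytes) -> bytes:
    """Decompress serialized data, raising a clearer error on bad input."""
    try:
        return zlib.decompress(data)
    except zlib.error as e:
        # Re-raise with a message that points at the likely cause instead
        # of zlib's low-level header or stream error.
        raise ValueError(
            "Unable to decompress data: it does not appear to be in DocBin format."
        ) from e
```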
						
					 
					
						2020-11-27 14:39:49 +08:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							3a5cc5f8b4 
							
						 
					 
					
						
						
							
							Set version to v2.3.4  
						
						
						
					 
					
						2020-11-26 08:48:52 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e0f5646a4a 
							
						 
					 
					
						
						
							
							Restore cleanup_beam method ( #6446 )  
						
						
						
					 
					
						2020-11-25 13:21:48 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							cf693f0eae 
							
						 
					 
					
						
						
							
							Fix token_match in tokenizer  
						
						
						
					 
					
						2020-11-25 11:49:34 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							724831b066 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master  
						
						
						
						
						* Update Macedonian for v3
* Update Turkish for v3 
						
					 
					
						2020-11-25 11:49:34 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							573f5c863f 
							
						 
					 
					
						
						
							
							Fix tag map clobbering in spacy train ( #6437 )  
						
						
						
						
						Fix bug from #5768  where the tag map is clobbered if a custom tag map
isn't provided. 
						
					 
					
						2020-11-24 13:13:16 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							ce18fc6588 
							
						 
					 
					
						
						
							
							Set version to v2.3.3  
						
						
						
					 
					
						2020-11-24 10:03:45 +01:00 
						 
				 
			
				
					
						
							
							
Adriane Boyd
cd61d264ef
Set version to v2.3.3.dev0
2020-11-23 13:51:59 +01:00
							
Sofie Van Landeghem
2af31a8c8d
Bugfix textcat reproducibility on GPU (#6411)
* add seed argument to ParametricAttention layer
* bump thinc to 7.4.3
* set thinc version range
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-23 12:29:35 +01:00
							
Adriane Boyd
3f61f5eb54
Use int8_t instead of char in Matcher (#6413)
* Use signed char instead of char in Matcher
Remove unused char* utf8_t typedef
* Use int8_t instead of signed char
2020-11-23 10:26:47 +01:00
							
Adriane Boyd
4284605683
Remove Beam cleanup (#6414)
Beam cleanup is handled through the Beam finalization method.
2020-11-23 10:01:46 +01:00
							
Adriane Boyd
a8c2dad466
Add all vectors to vocab before pruning (#6408)
Add all vectors to the vocab before pruning to correct the selection of
vectors to prioritize.
2020-11-23 10:00:59 +01:00
							
svlandeg
636be3c791
Merge remote-tracking branch 'upstream/develop' into feature/trf-docs
2020-11-19 14:15:35 +01:00
							
svlandeg
73fc1ed963
remove labels from morphologizer constructor
2020-11-11 21:48:50 +01:00
							
svlandeg
d5a920325f
remove labels from constructor
2020-11-11 21:34:12 +01:00
							
Adriane Boyd
320a8b1481
Add ent_id_ to strings serialized with Doc (#6353)
2020-11-10 20:16:07 +08:00
							
Adriane Boyd
a7e7d6c6c9
Ignore misaligned in Morphologizer.get_loss (#6363)
Fix bug where `Morphologizer.get_loss` treated misaligned annotation as
`EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH`
mappings.
2020-11-10 20:15:09 +08:00
							
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
* add pooling to textcat TransformerListener
* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
							
Ines Montani
de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]
2020-11-10 02:52:11 +01:00
							
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]
2020-11-10 02:45:52 +01:00
							
svlandeg
789fb3d124
add docs for upstream argument of TransformerListener
2020-11-09 21:42:58 +01:00
							
Ines Montani
363ac73c72
Update docs [ci skip]
2020-11-09 12:43:26 +08:00
							
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language
* Create danielvasic.md
* Update danielvasic.md
* Update danielvasic.md
* Add tag map to CroatianDefaults
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
							
Robert Šípek
6069efe57d
Add tag map to cs language (#6284)
2020-11-05 10:13:11 +01:00
							
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking
* add SCA
2020-11-05 09:17:35 +01:00
							
Adriane Boyd
31de700b0f
Fix on_match callback and remove empty patterns (#6312)
For the `DependencyMatcher`:
* Fix on_match callback so that it is called once per matched pattern
* Fix results so that patterns with empty match lists are not returned
2020-11-05 09:16:26 +01:00
							
Sofie Van Landeghem
8ef056cf98
fix embed_size in Entity Linker architecture (#6343)
2020-11-04 22:20:13 +01:00
							
Adriane Boyd
084fc575aa
Set version to v3.0.0rc3
2020-11-03 17:29:57 +01:00
							
Adriane Boyd
1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment
Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.
* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`
* Refactor trailing whitespace handling
* Remove unnecessary exception for empty docs
Allow a non-empty whitespace-only doc to be aligned with an empty doc
* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
							
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer
Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
Attributes without unset states:
* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation
Additional changes:
* add optional `has_annotation` check to `score_scans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`
* Fix import
* Update return types
2020-11-03 15:47:18 +01:00
							
Adriane Boyd
5d2cb86c34
Fix on_match callback for DependencyMatcher (#6313)
Fix `DependencyMatcher` so that the callback is called only once per
match.
2020-10-31 12:20:27 +01:00
							
Adriane Boyd
45c9a68828
Identify final Matcher pattern node by quantifier (#6317)
Modify the internal pattern representation in `Matcher` patterns to
identify the final ID state using a unique quantifier rather than a
combination of other attributes.
It was insufficient to identify the final ID node based on an
uninitialized `quantifier` (coincidentally being the same as the `ZERO`)
with `nr_attr` as 0. (In addition, it was potentially bug-prone that
`nr_attr` was set to 0 even though attrs were allocated.)
In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr`
is 0 and the quantifier is ZERO, so the previous methods for
incrementing to the ID node at the end of the pattern weren't able to
distinguish the final ID node from the `{"OP": "!"}` pattern.
2020-10-31 12:18:48 +01:00
							
Sofie Van Landeghem
2918923541
fix resolving of dot notation (#6326)
2020-10-31 12:17:06 +01:00
							
Duygu Altinok
0e55f806dd
Turkish tokenization improvements (#6268)
* added single and paired orth variants
* added token match
* added long text tokenization test
* inverted init
* normalized lemmas to lowercase
* more abbrevs
* tests for ordinals and abbrevs
* separated period abbvrevs to another list
* fiex typo
* added ordinal and abbrev tests
* added number tests for dates
* minor refinement
* added inflected abbrevs regex
* added percentage and inflection
* cosmetics
* added token match
* added url inflection tests
* excluded url tokens from custom pattern
* removed url match import
2020-10-29 09:43:17 +01:00
							
svlandeg
080066ae74
remove TODO note
2020-10-26 10:37:25 +01:00
							
Ines Montani
2c9804038d
Fix success message [ci skip]
2020-10-23 16:11:54 +02:00
							
Adriane Boyd
4299a7f654
Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
							
Adriane Boyd
563a21834e
Save raw scores in evaluate output
2020-10-19 15:49:09 +02:00
							
Adriane Boyd
dd207ca6d0
Add dep_las_per_type and more generic PRF printer
2020-10-19 15:49:02 +02:00
							
Adriane Boyd
4300858ecb
Include per-type/feat scores in evaluate output
2020-10-19 15:48:55 +02:00
							
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
2020-10-18 14:50:41 +02:00
							
Ines Montani
5a6ed01ce0
Merge pull request #6262 from adrianeboyd/bugfix/template-en-vectors
2020-10-16 15:38:08 +02:00
							
Adriane Boyd
c8d04b79e2
Sort and add vectors for langs without transformers
2020-10-16 08:25:16 +02:00
							
Adriane Boyd
2fbd43c603
Use core lg models as vectors models in quickstart
2020-10-16 08:17:53 +02:00
							
Jan Margeta
1ad2213349
Fix TokenPatternSchema pattern field validation
Empty pattern field should be considered invalid
This is fixed by replacing minItems with min_items
as described in Pydantic docs:
https://pydantic-docs.helpmanual.io/usage/schema/
2020-10-16 00:41:21 +02:00
							
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language
* Fix indentation at char_classes.py
* Fix indentation at char_classes.py
* Add Macedonian tests, update lex_attrs and char_classes
* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
							
Ines Montani
ff4267d181
Fix success message [ci skip]
2020-10-15 14:42:08 +02:00
							
Ines Montani
10611bf56a
Increment version [ci skip]
2020-10-15 13:30:11 +02:00
							
Ines Montani
4e17ddf75e
Merge pull request #6256 from adrianeboyd/bugfix/docs-to-json-raw
2020-10-15 10:35:01 +02:00
							
Ines Montani
b1d568a4df
Tidy up tests
2020-10-15 10:20:21 +02:00
							
Ines Montani
d165af26be
Auto-format [ci skip]
2020-10-15 10:08:53 +02:00
							
Adriane Boyd
a93d42861d
Use null raw for has_unknown_spaces in docs_to_json
2020-10-15 09:57:54 +02:00
							
Ines Montani
5665a21517
Tidy up
2020-10-15 09:30:32 +02:00
							
Ines Montani
5d62499266
Fix tests
2020-10-15 09:29:15 +02:00
							
Ines Montani
178760855f
Merge branch 'develop' into master-tmp
2020-10-15 09:06:03 +02:00
							
Ines Montani
bc85b12e6d
Merge pull request #6249 from svlandeg/feature/batch-tests
2020-10-15 08:57:56 +02:00
							
svlandeg
0796401c19
call NumpyOps instead of get_current_ops()
2020-10-14 16:55:00 +02:00
							
svlandeg
44e14ccae8
one more losses fix
2020-10-14 15:11:34 +02:00
							
svlandeg
0aa8851878
always return losses
2020-10-14 15:00:49 +02:00
							
svlandeg
e94a21638e
adding tests for trained models to ensure predict reproducibility
2020-10-13 21:07:13 +02:00
							
svlandeg
ede979d42f
formattting
2020-10-13 18:53:17 +02:00
							
svlandeg
ff83bfae3f
naming
2020-10-13 18:52:37 +02:00
							
svlandeg
6ccacff54e
add tests for individual spacy layers
2020-10-13 18:50:07 +02:00
							
svlandeg
c23041ae60
component tests single or multiple prediction
2020-10-13 16:26:53 +02:00
							
Ines Montani
1f49300862
Update transformer recommendations [ci skip]
2020-10-13 15:41:17 +02:00
							
Sofie Van Landeghem
f8a1c1afd6
avoid dropout at runtime (#6247)
2020-10-13 14:39:59 +02:00
							
Ines Montani
86d648740f
Fix morph representation in Doc.to_json
2020-10-13 11:39:03 +02:00
							
Ines Montani
7f92a5ee6a
Update spacy/lang/ta/examples.py
2020-10-13 11:03:35 +02:00
							
Ines Montani
a0e12c136b
Increment version [ci skip]
2020-10-13 10:00:53 +02:00
							
Ines Montani
f090f39f17
Merge pull request #6245 from svlandeg/bugfix/else
bugfix in _pipe
2020-10-13 09:59:06 +02:00
							
svlandeg
1f465bea18
if-else
2020-10-13 09:27:19 +02:00
							
svlandeg
40276fd3be
update NEL docs after latest refactor
2020-10-12 11:41:27 +02:00
							
Ines Montani
4fa967ea84
Increment version [ci skip]
2020-10-11 13:10:58 +02:00
							
Ines Montani
ab890a35f9
Make console logger table more compact
2020-10-11 12:55:46 +02:00
							
Ines Montani
99606e46fe
Relax meta.json schema [ci skip]
2020-10-11 12:30:57 +02:00
							
svlandeg
3a505e7e14
small edit to ensure the new word was indeed new
2020-10-10 21:05:28 +02:00
							
svlandeg
68d79796c6
add test for vocab after serializing KB
2020-10-10 20:59:48 +02:00
							
Ines Montani
539b0c10da
Tidy up and auto-format
2020-10-10 19:14:48 +02:00
							
Ines Montani
bfa3931c9d
Revert added_strings change (#6236)
2020-10-10 18:55:07 +02:00
							
Ines Montani
796f8b9424
Increment version
2020-10-09 18:00:27 +02:00
							
Ines Montani
525f798841
Fix typo in test
2020-10-09 18:00:21 +02:00
							
Ines Montani
8ac5f22253
Adjust error message
2020-10-09 18:00:16 +02:00
							
svlandeg
08cb085f6c
Merge remote-tracking branch 'upstream/develop' into fix/various
2020-10-09 17:01:27 +02:00
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b7cb9d95e4 
							
						 
					 
					
						
						
							
							Merge pull request #6229 from svlandeg/bugfix/disabled
						
						
						
					 
					
						2020-10-09 16:05:11 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							e972ecba72 
							
						 
					 
					
						
						
							
							add utf8 encoding for opening file  
						
						
						
					 
					
						2020-10-09 16:03:14 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9fb3244672 
							
						 
					 
					
						
						
							
							Merge pull request #6231 from adrianeboyd/feature/include-static-vectors
						
						
						
					 
					
						2020-10-09 15:54:52 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							040c7c0541 
							
						 
					 
					
						
						
							
							fix get_dim calls in build_simple_cnn_text_classifier  
						
						
						
					 
					
						2020-10-09 15:40:58 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							727370c633 
							
						 
					 
					
						
						
							
							Remove Span._recalculate_indices  
						
						
						
						
						Remove `Span._recalculate_indices`, which is a remnant from the
deprecated `Span.merge`. 
						
					 
					
						2020-10-09 14:42:51 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							853edace37 
							
						 
					 
					
						
						
							
							fix MultiHashEmbed example in documentation  
						
						
						
					 
					
						2020-10-09 14:11:06 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							06b9d213fd 
							
						 
					 
					
						
						
							
							formatting  
						
						
						
					 
					
						2020-10-09 12:19:47 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							2cafba5f50 
							
						 
					 
					
						
						
							
							shorten error message for clarity  
						
						
						
					 
					
						2020-10-09 12:17:35 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							4771a10503 
							
						 
					 
					
						
						
							
							Make test more explicit [ci skip]  
						
						
						
					 
					
						2020-10-09 12:15:26 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							cc3646b06c 
							
						 
					 
					
						
						
							
							Add xfailing test for peculiar spans failure [ci skip]  
						
						
						
					 
					
						2020-10-09 12:10:25 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							8316bc7d4a 
							
						 
					 
					
						
						
							
							bugfix DisabledPipes  
						
						
						
					 
					
						2020-10-09 12:06:20 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							18dfb27985 
							
						 
					 
					
						
						
							
							Add custom error when evaluation throws a KeyError  
						
						
						
					 
					
						2020-10-09 12:05:33 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							39aabf50ab 
							
						 
					 
					
						
						
							
							Also rename to include_static_vectors in CharEmbed  
						
						
						
					 
					
						2020-10-09 11:54:48 +02:00 
						 
				 
			
				
					
						
							
							
								Florijan Stamenković 
							
						 
					 
					
						
						
						
						
							
						
						
							18f5c309dc 
							
						 
					 
					
						
						
							
							Fix Issue 6207 (#6208)
						
						
						
						
						* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2020-10-09 10:14:40 +02:00 
						 
				 
			
				
					
						
							
							
								Duygu Altinok 
							
						 
					 
					
						
						
						
						
							
						
						
							80fb1bffc9 
							
						 
					 
					
						
						
							
							Ordinal numbers for Turkish (#6142)
						
						
						
						
						* minor ordinal number addition
* fixed typo
* added corresponding lexical test 
						
					 
					
						2020-10-09 10:13:15 +02:00 
						 
				 
			
				
					
						
							
							
								Duygu Altinok 
							
						 
					 
					
						
						
						
						
							
						
						
							2fad279a44 
							
						 
					 
					
						
						
							
							Turkish language syntax iterators (#6191)
						
						
						
						
						* added tr_vocab to config
* basic test
* added syntax iterator to Turkish lang class
* first version for Turkish syntax iter, without flat
* added simple tests with nmod, amod, det
* more tests to amod and nmod
* separated noun chunks and parser test
* rearrangement after nchunk parser separation
* added recursive NPs
* tests with complicated recursive NPs
* tests with conjed NPs
* additional tests for conj NP
* small modification for shaving off conj from NP
* added tests with flat
* more tests with flat
* added examples with flats conjed
* added inner func for flat trick
* corrected parse
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2020-10-09 10:10:22 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d093d6343b 
							
						 
					 
					
						
						
							
							TrainablePipe (#6213)
						
						
						
						
						* rename Pipe to TrainablePipe
* split functionality between Pipe and TrainablePipe
* remove unnecessary methods from certain components
* cleanup
* hasattr(component, "pipe") should be sufficient again
* remove serialization and vocab/cfg from Pipe
* unify _ensure_examples and validate_examples
* small fixes
* hasattr checks for self.cfg and self.vocab
* make is_resizable and is_trainable properties
* serialize strings.json instead of vocab
* fix KB IO + tests
* fix typos
* more typos
* _added_strings as a set
* few more tests specifically for _added_strings field
* bump to 3.0.0a36 
						
					 
					
						2020-10-08 21:33:49 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							8ff73f04db 
							
						 
					 
					
						
						
							
							Fix morph in Doc.to_json  
						
						
						
					 
					
						2020-10-08 14:44:35 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							064575d79d 
							
						 
					 
					
						
						
							
							Merge pull request #6216 from svlandeg/feature/nel-initialize
						
						
						
					 
					
						2020-10-08 11:14:12 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							3e2e1fd323 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2020-10-08 10:37:32 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							eaf5c265cb 
							
						 
					 
					
						
						
							
							set_kb method for entity_linker  
						
						
						
					 
					
						2020-10-08 10:34:01 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							010956d493 
							
						 
					 
					
						
						
							
							Clear rule-based components on initialize  
						
						
						
					 
					
						2020-10-08 09:51:31 +02:00 
						 
				 
			
				
					
						
							
							
								Baranitharan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d6037c1860 
							
						 
					 
					
						
						
							
							added sentence  
						
						
						
					 
					
						2020-10-08 08:22:58 +05:30 
						 
				 
			
				
					
						
							
							
								Baranitharan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							81afe9b19d 
							
						 
					 
					
						
						
							
							Update examples.py  
						
						
						
					 
					
						2020-10-08 08:17:25 +05:30 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							241cd112f5 
							
						 
					 
					
						
						
							
							add reenabled pipe names back to the meta before serializing (#6219)
						
						
						
					 
					
						2020-10-08 00:44:16 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2998131416 
							
						 
					 
					
						
						
							
							Reproducibility for TextCat and Tok2Vec (#6218)
						
						
						
						
						* ensure fixed seed in HashEmbed layers
* forgot about the joys of python 2 
						
					 
					
						2020-10-08 00:43:46 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							efedccea8d 
							
						 
					 
					
						
						
							
							fix tests  
						
						
						
					 
					
						2020-10-07 15:29:52 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6b8bdb2d39 
							
						 
					 
					
						
						
							
							add init_config to nlp.create_pipe  
						
						
						
					 
					
						2020-10-07 14:58:16 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							33c2d4af16 
							
						 
					 
					
						
						
							
							move kb_loader to initialize for NEL instead of constructor  
						
						
						
					 
					
						2020-10-07 14:56:00 +02:00 
						 
				 
			
				
					
						
							
							
								Wannaphong Phatthiyaphaibun 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9fc8392b38 
							
						 
					 
					
						
						
							
							Add Thai tag map (LST20 Corpus) (#6163)
						
						
						
						
						* Add Thai tag map (LST20 Corpus)
By @korakot
* Update tag_map.py
* Update tag_map.py
* Update tag_map.py 
						
					 
					
						2020-10-07 11:12:01 +02:00 
						 
				 
			
				
					
						
							
							
								Duygu Altinok 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7e821c2776 
							
						 
					 
					
						
						
							
							Turkish language syntax iterators (#6191)
						
						
						
						
						* added tr_vocab to config
* basic test
* added syntax iterator to Turkish lang class
* first version for Turkish syntax iter, without flat
* added simple tests with nmod, amod, det
* more tests to amod and nmod
* separated noun chunks and parser test
* rearrangement after nchunk parser separation
* added recursive NPs
* tests with complicated recursive NPs
* tests with conjed NPs
* additional tests for conj NP
* small modification for shaving off conj from NP
* added tests with flat
* more tests with flat
* added examples with flats conjed
* added inner func for flat trick
* corrected parse
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2020-10-07 11:07:52 +02:00 
						 
				 
			
				
					
						
							
							
								Duygu Altinok 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2ce6fc2611 
							
						 
					 
					
						
						
							
							Turkish tag map and morph rules addition (#6141)
						
						
						
						
						* feat: added turkish tag map
* feat: morph rules cconj and sconj
* feat: more conjuncts
* feat: added popular postpositions
* feat: added adverbs
* feat: added personal pronouns
* feat: added reflexive pronouns
* minor: corrected case capital
* minor: fixed comma typo
* feat: added indef pronouns
* feat: added dict iter
* fixed comma typo
* updated language class with tag map and morph
* use default tag map instead
* removed tag map 
						
					 
					
						2020-10-07 10:27:36 +02:00 
						 
				 
			
				
					
						
							
							
								Duygu Altinok 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b95a11dd95 
							
						 
					 
					
						
						
							
							Ordinal numbers for Turkish (#6142)
						
						
						
						
						* minor ordinal number addition
* fixed typo
* added corresponding lexical test 
						
					 
					
						2020-10-07 10:25:37 +02:00 
						 
				 
			
				
					
						
							
							
								Rahul Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1a00bff06d 
							
						 
					 
					
						
						
							
							Hindi: Adds tests for lexical attributes (norm and like_num) (#5829)
						
						
						
						
						* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and sdds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39 
						
					 
					
						2020-10-07 10:23:32 +02:00 
						 
				 
			
				
					
						
							
							
								Nuccy90 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c809b2c8e7 
							
						 
					 
					
						
						
							
							Update morph_rules.py (#6102)
						
						
						
						
						* Update morph_rules.py
Added "dig" and "dej" ("you" in accusative form)
* Create Nuccy90.md
* Update Nuccy90.md 
						
					 
					
						2020-10-06 15:14:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1a500f9717 
							
						 
					 
					
						
						
							
							Set version to v3.0.0a35  
						
						
						
					 
					
						2020-10-06 14:19:07 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fff3f8ccfa 
							
						 
					 
					
						
						
							
							Fix packaging pin (#6212)
						
						
						
						
						* pin packaging to >=20.0
* ignore spacy-pkuseg in requirements unit test 
						
					 
					
						2020-10-06 14:16:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cfb9770a94 
							
						 
					 
					
						
						
							
							Fix empty input into StaticVectors layer (#6211)
						
						
						
						
						* Add test for empty doc(s)
* Fix empty check in staticvectors
* Remove xfail
* Update spacy/ml/staticvectors.py 
						
					 
					
						2020-10-06 14:15:41 +02:00 
						 
				 
			
				
					
						
							
							
								Florijan Stamenković 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9db670b996 
							
						 
					 
					
						
						
							
							Fix Issue 6207 (#6208)
						
						
						
						
						* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> 
						
					 
					
						2020-10-06 11:17:37 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							568e12215d 
							
						 
					 
					
						
						
							
							Merge pull request #6206 from svlandeg/fix/patterns-init
						
						
						
					 
					
						2020-10-06 10:27:23 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9b4cf7b0b6 
							
						 
					 
					
						
						
							
							update output of debug config command  
						
						
						
					 
					
						2020-10-06 09:47:23 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							ff9ac39c88 
							
						 
					 
					
						
						
							
							read entity_ruler patterns with srsly.read_jsonl.v1  
						
						
						
					 
					
						2020-10-05 22:50:14 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							126268ce50 
							
						 
					 
					
						
						
							
							Auto-format [ci skip]  
						
						
						
					 
					
						2020-10-05 21:58:18 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							1a554bdcb1 
							
						 
					 
					
						
						
							
							Update docs and docstring [ci skip]  
						
						
						
					 
					
						2020-10-05 21:55:27 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							9614e53b02 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2020-10-05 21:55:18 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							181039bd17 
							
						 
					 
					
						
						
							
							Merge pull request #6205 from explosion/feature/embed-features
						
						
						
					 
					
						2020-10-05 21:49:10 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							5ba418b08c 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of https://github.com/explosion/spaCy into develop
						
						
						
					 
					
						2020-10-05 21:44:01 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							568617af58 
							
						 
					 
					
						
						
							
							Merge pull request #6202 from explosion/feature/project-spacy-version
						
						
						
					 
					
						2020-10-05 21:40:52 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							2d0c0134bc 
							
						 
					 
					
						
						
							
							Adjust message [ci skip]  
						
						
						
					 
					
						2020-10-05 21:38:23 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6abfc2911d 
							
						 
					 
					
						
						
							
							Merge pull request #6203 from adrianeboyd/feature/zh-spacy-pkuseg
						
						
						
					 
					
						2020-10-05 21:35:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b7e01d2024 
							
						 
					 
					
						
						
							
							Fix quickstart  
						
						
						
					 
					
						2020-10-05 21:21:30 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ff8b980775 
							
						 
					 
					
						
						
							
							Upd quickstart template  
						
						
						
					 
					
						2020-10-05 21:19:41 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							91d0fbb588 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2020-10-05 21:13:53 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							9ca283a899 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/project-spacy-version  
						
						
						
					 
					
						2020-10-05 21:06:07 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							0135f6ed95 
							
						 
					 
					
						
						
							
							Enable commit check via env var  
						
						
						
					 
					
						2020-10-05 20:51:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b392d48e76 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2020-10-05 20:17:07 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							be99f1e4de 
							
						 
					 
					
						
						
							
							Remove output dirs before training (#6204)
						
						
						
						
						* Remove output dirs before training
* Re-raise error if cleaning fails 
						
					 
					
						2020-10-05 20:11:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e50047f1c5 
							
						 
					 
					
						
						
							
							Check lengths match  
						
						
						
					 
					
						2020-10-05 20:02:45 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							582701519e 
							
						 
					 
					
						
						
							
							Remove __release__ flag  
						
						
						
					 
					
						2020-10-05 20:00:49 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d58fb42707 
							
						 
					 
					
						
						
							
							Add spacy_version option and validation for project.yml  
						
						
						
					 
					
						2020-10-05 20:00:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							db84d175c3 
							
						 
					 
					
						
						
							
							Fix test  
						
						
						
					 
					
						2020-10-05 19:59:30 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cdd2b79b6d 
							
						 
					 
					
						
						
							
							Remove deprecated MultiHashEmbed  
						
						
						
					 
					
						2020-10-05 19:58:18 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6dcc4a0ba6 
							
						 
					 
					
						
						
							
							Simplify MultiHashEmbed signature  
						
						
						
					 
					
						2020-10-05 19:57:45 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							193e0d5a98 
							
						 
					 
					
						
						
							
							add docs for entity_ruler.initialize  
						
						
						
					 
					
						2020-10-05 18:04:08 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							3ac3447eee 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2020-10-05 17:50:37 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9eb813a35d 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/develop' into fix/patterns-init  
						
						
						
					 
					
						2020-10-05 17:49:44 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							f102ef6b54 
							
						 
					 
					
						
						
							
							Read features.msgpack instead of features.pkl  
						
						
						
					 
					
						2020-10-05 17:47:39 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							4e3ace4b8c 
							
						 
					 
					
						
						
							
							is_trainable method  
						
						
						
					 
					
						2020-10-05 17:43:42 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							84fedcebab 
							
						 
					 
					
						
						
							
							Make args keyword-only [ci skip]  
						
						
						
						
						Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> 
						
					 
					
						2020-10-05 17:07:35 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							71e73ed0a6 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/embed-features  
						
						
						
					 
					
						2020-10-05 17:00:05 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3ee3649b52 
							
						 
					 
					
						
						
							
							Fix augment  
						
						
						
					 
					
						2020-10-05 16:59:49 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							22937d25a9 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/embed-features  
						
						
						
					 
					
						2020-10-05 16:42:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8deed614e9 
							
						 
					 
					
						
						
							
							Fix augment  
						
						
						
					 
					
						2020-10-05 16:41:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4ed3e037df 
							
						 
					 
					
						
						
							
							Fix augment  
						
						
						
					 
					
						2020-10-05 16:40:55 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9f1bc3f24c 
							
						 
					 
					
						
						
							
							Fix augment  
						
						
						
					 
					
						2020-10-05 16:40:23 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							dc06912c76 
							
						 
					 
					
						
						
							
							prevent loss keyerror for non-trainable components  
						
						
						
					 
					
						2020-10-05 16:33:28 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							187234648c 
							
						 
					 
					
						
						
							
							Revert back to "default" as default for pkuseg_user_dict  
						
						
						
					 
					
						2020-10-05 16:24:28 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							65abd77779 
							
						 
					 
					
						
						
							
							add finish_update to Pipe  
						
						
						
					 
					
						2020-10-05 16:23:33 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							90040aacec 
							
						 
					 
					
						
						
							
							Fix merge  
						
						
						
					 
					
						2020-10-05 16:12:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							93a98e8c3e 
							
						 
					 
					
						
						
							
							Merge branch 'develop' into feature/embed-features  
						
						
						
					 
					
						2020-10-05 15:51:31 +02:00 
						 
				 
			
				
					
						
							
							
Matthew Honnibal
eb9ba61517
Format
2020-10-05 15:29:49 +02:00

Matthew Honnibal
7d93575f35
spacy/tests/
2020-10-05 15:28:12 +02:00

Matthew Honnibal
f4ca9a39cb
spacy/tests/
2020-10-05 15:27:06 +02:00

Matthew Honnibal
f2f1deca66
spacy/tests/
2020-10-05 15:24:33 +02:00

Matthew Honnibal
8ec79ad3fa
Allow configuration of MultiHashEmbed features
Update arguments to MultiHashEmbed layer so that the attributes can be
controlled. A kind of tricky scheme is used to allow optional
specification of the rows. I think it's an okay balance between
flexibility and convenience.
2020-10-05 15:22:00 +02:00

Ines Montani
7946fd84bb
Merge pull request #6200 from adrianeboyd/bugfix/vocab-disk-lookups-vectors
Always serialize lookups and vectors to disk
2020-10-05 15:15:25 +02:00

Ines Montani
8171e28b20
Remove logging [ci skip]
This would be fired on each example, which is wrong
2020-10-05 15:09:52 +02:00

svlandeg
251b3eb4e5
add initialize method for entity_ruler
2020-10-05 14:59:13 +02:00

Sofie Van Landeghem
f4f49f5877
update blis (#6198)
* allow higher blis version
* fix typo
* bump to 3.0.0a34
* fix pins in other files
2020-10-05 14:58:56 +02:00

Adriane Boyd
5d19dfc9d3
Update Chinese tokenizer for spacy-pkuseg fork
2020-10-05 14:21:53 +02:00

Matthew Honnibal
6a9d14e35a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-10-05 14:17:41 +02:00

Matthew Honnibal
d2b9aafb8c
Fix augmenter
2020-10-05 14:14:49 +02:00

Ines Montani
6260fa3c10
Merge pull request #6201 from svlandeg/fix/error_nr
2020-10-05 14:00:57 +02:00

Ines Montani
6958510bda
Include spaCy version check in project CLI
2020-10-05 13:53:07 +02:00

Ines Montani
20f2a17a09
Merge test_misc and test_util
2020-10-05 13:45:57 +02:00

svlandeg
fd2d48556c
fix E902 and E903 numbering
2020-10-05 13:43:32 +02:00

Ines Montani
1c641e41c3
Remove unused import [ci skip]
2020-10-05 11:50:11 +02:00

Adriane Boyd
03cfb2d2f4
Always serialize lookups and vectors to disk
2020-10-05 09:40:20 +02:00

Adriane Boyd
b0b93854cb
Update ru/uk lemmatizers for new nlp.initialize
2020-10-05 09:27:16 +02:00

Ines Montani
549758f67d
Adjust test for now
2020-10-04 23:16:09 +02:00

Ines Montani
4b15ff7504
Increment version [ci skip]
2020-10-04 22:47:04 +02:00

Ines Montani
f1d1f78636
Make warning debug log [ci skip]
2020-10-04 22:44:21 +02:00

Ines Montani
3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter
* Make warning a debug log
* Update lowercase augmenter, docs and tests
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00

Ines Montani
d38dc466c5
Adjust error [ci skip]
2020-10-04 15:26:01 +02:00

Ines Montani
496228771d
Merge pull request #6194 from explosion/master-tmp
2020-10-04 15:25:41 +02:00

Ines Montani
0307a228c8
Merge pull request #6193 from explosion/fix/adjust-pipe-init
Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe
2020-10-04 15:20:54 +02:00

Ines Montani
59deeb7da6
Merge branch 'develop' into master-tmp
2020-10-04 14:52:20 +02:00

Ines Montani
43d7652635
Merge pull request #6192 from explosion/feature/init-attr-ruler
2020-10-04 14:46:37 +02:00

Ines Montani
8f018e47f8
Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe
2020-10-04 14:43:45 +02:00

Matthew Honnibal
84ae197dd6
Fix logger
2020-10-04 14:16:53 +02:00

Ines Montani
11347f34da
Tidy up, tests and docs
2020-10-04 13:54:05 +02:00

Matthew Honnibal
96b636c2d3
Update attribute ruler
2020-10-04 13:08:21 +02:00

Ines Montani
bcd52e5486
Tidy up errors and warnings
2020-10-04 11:16:31 +02:00

Ines Montani
ff914f4e6f
Lazy-load xx
2020-10-04 11:10:26 +02:00

Ines Montani
d3b3663942
Adjust error message and add test
2020-10-04 10:11:27 +02:00

Ines Montani
2110e8f86d
Auto-format
2020-10-04 10:06:49 +02:00

Ines Montani
cc08c88a89
Merge pull request #6187 from svlandeg/fix/begin_training_pipe
2020-10-04 10:01:02 +02:00

svlandeg
3f657ed3a1
implement warning in __init_subclass__ instead
2020-10-03 22:34:10 +02:00

Matthew Honnibal
3b2a78720c
Upd morphologizer
2020-10-03 19:35:19 +02:00

Matthew Honnibal
835070cedc
Upd test
2020-10-03 19:35:10 +02:00

Matthew Honnibal
70b9de8e58
Set version to v3.0.0a32
2020-10-03 19:26:52 +02:00

Matthew Honnibal
85ede32680
Format
2020-10-03 19:26:23 +02:00

Matthew Honnibal
b305f2ff5a
Fix loggers
2020-10-03 19:26:10 +02:00

Matthew Honnibal
4fccd2ceaf
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-10-03 19:13:55 +02:00

Matthew Honnibal
8ea8b7d940
Support loading labels in morphologizer
2020-10-03 19:13:42 +02:00

Ines Montani
c2401fca41
Add tests for Pipe.label_data
2020-10-03 19:12:46 +02:00

Ines Montani
80603f0fa5
Make SentenceRecognizer.label_data return None
Overwrite the method from the base class (Tagger) but don't export anything in "init labels"
2020-10-03 18:54:09 +02:00

Ines Montani
d6c967401f
Increment version
2020-10-03 17:20:47 +02:00

Ines Montani
3bc3c05fcc
Tidy up and auto-format
2020-10-03 17:20:18 +02:00

Ines Montani
7c4ab7e82c
Fix Lemmatizer.get_lookups_config
2020-10-03 17:16:10 +02:00

Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190)
2020-10-03 17:07:38 +02:00

Ines Montani
989a96308f
Tidy up, auto-format, types
2020-10-03 16:31:58 +02:00

Matthew Honnibal
7b127f307e
Set version to v3.0.0a30
2020-10-03 16:06:42 +02:00

Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control
* Update docs
* Cleanup errors
* Fix ConfigValidationError
* Pass stdout/stderr, not wasabi.Printer
* Fix type
* Upd logging example
* Fix logger example
* Fix type
2020-10-03 14:57:46 +02:00

Ines Montani
ae15c9de79
Raise error from caught KeyError to preserve traceback
2020-10-03 11:43:56 +02:00

Ines Montani
f758804401
Save one line of code
2020-10-03 11:41:28 +02:00

Stanislav Schmidt
3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable
* Add contributor agreement
2020-10-02 21:00:11 +02:00

svlandeg
02247cccaf
Merge remote-tracking branch 'upstream/develop' into feature/small-fixes
2020-10-02 20:48:11 +02:00

svlandeg
fb48de349c
bwd compat for pipe.begin_training
2020-10-02 20:31:14 +02:00

Matthew Honnibal
6965cdf16d
Fix comment
2020-10-02 17:26:21 +02:00

Ines Montani
3cf10a0729
Merge pull request #6183 from adrianeboyd/feature/quickstart-morphologizer
Add morphologizer to quickstart template
2020-10-02 17:08:01 +02:00

Adriane Boyd
62ccd5c4df
Relax model meta performance schema (#6185)
Allow more embedded per_x in `ModelMetaSchema`
2020-10-02 16:37:21 +02:00

Sofie Van Landeghem
09dcb75076
small UX fix for DocBin (#6167)
* add informative warning when messing up store_user_data DocBin flags
* add informative warning when messing up store_user_data DocBin flags
* cleanup test
* rename to patterns_path
2020-10-02 15:43:32 +02:00

Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up
* Fix typo
* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00

Adriane Boyd
22158dc24a
Add morphologizer to quickstart template
2020-10-02 15:06:16 +02:00

Ines Montani
d2aa662ab2
Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip]
2020-10-02 12:10:27 +02:00

Ines Montani
c41a4332e4
Add test for custom data augmentation
2020-10-02 11:37:56 +02:00

svlandeg
acc391c2a8
remove redundant str() call
2020-10-02 11:05:59 +02:00

Ines Montani
3856048437
Merge pull request #6178 from explosion/feature/file-readers
Integrate file readers via srsly, update orth_variants loading
2020-10-02 10:26:09 +02:00

Adriane Boyd
f83dfe62da
Fix test
2020-10-02 10:17:26 +02:00

Adriane Boyd
65dfaa4f4b
Also accept MorphAnalysis in set_morph
2020-10-02 08:33:43 +02:00

Adriane Boyd
77e08c398f
Switch reset value for set_morph to None
2020-10-02 08:25:15 +02:00

Ines Montani
568768643e
Increment version [ci skip]
2020-10-02 01:50:13 +02:00

Ines Montani
01c1538c72
Integrate file readers
2020-10-02 01:36:06 +02:00

Ines Montani
af282ae732
Fix import
2020-10-02 01:12:34 +02:00

Ines Montani
e59ecb12c0
Auto-format
2020-10-02 01:12:30 +02:00

Matthew Honnibal
75a1569908
Merge
2020-10-01 23:07:53 +02:00

Matthew Honnibal
300e5a9928
Avoid relying on NORM in default v3 models (#6176)
* Allow CharacterEmbed to specify feature
* Default to LOWER in character embed
* Update tok2vec
* Use LOWER, not NORM
2020-10-01 23:05:55 +02:00

Ines Montani
5762876dcc
Update default config [ci skip]
2020-10-01 22:27:37 +02:00

Adriane Boyd
86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting
* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`
* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00

Matthew Honnibal
b854bca15c
Default to LOWER in character embed
2020-10-01 22:17:58 +02:00

Matthew Honnibal
684a77870b
Allow CharacterEmbed to specify feature
2020-10-01 22:17:26 +02:00

Ines Montani
da30701cd1
Increment version [ci skip]
2020-10-01 21:58:11 +02:00

Ines Montani
d48ddd6c9a
Remove default initialize lookups
2020-10-01 21:54:33 +02:00

Ines Montani
1700c8541e
Increment version [ci skip]
2020-10-01 17:57:16 +02:00

Ines Montani
f2627157c8
Update docs [ci skip]
2020-10-01 17:38:17 +02:00

Ines Montani
7f68f4bd92
Hide jsonl_loc on init vectors and tidy up [ci skip]
2020-10-01 16:44:17 +02:00

Adriane Boyd
27cbffff1b
Minor edit to CoNLL-U converter (#6172)
This doesn't make a difference given how the `merged_morph` values
override the `morph` values for all the final docs, but could have led
to unexpected bugs in the future if the converter is modified.
2020-10-01 16:23:42 +02:00

Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)

* move featureextractor from Thinc
* Update website/docs/api/architectures.md
Co-authored-by: Ines Montani <ines@ines.io>
* Update website/docs/api/architectures.md
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
							
Adriane Boyd
73538782a0
Switch Doc.__init__(ents=) to IOB tags (#6173)

* Switch Doc.__init__(ents=) to IOB tags
* Fix check for "-"
* Allow "" or None as missing IOB tag
2020-10-01 16:22:18 +02:00
							
Adriane Boyd
df98d3ef9f
Update import from collections.abc (#6174)
2020-10-01 16:21:49 +02:00
							
Yohei Tamura
3243ddac8f
Fix/span.sent (#6083)

* add fail test
* fix test
* fix span.sent
* Remove incorrect implicit check
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-01 14:01:52 +02:00
							
Ines Montani
0a8a124a6e
Update docs [ci skip]
2020-10-01 12:15:53 +02:00
							
Ines Montani
44160cd52f
Tidy up [ci skip]
2020-10-01 10:41:19 +02:00
							
Ines Montani
381258b75b
Merge pull request #6165 from explosion/feature/update-tokenizers-initialize
2020-10-01 09:49:47 +02:00
							
svlandeg
6787e56315
print debugging warning before raising error if model not properly initialized
2020-10-01 09:21:00 +02:00
							
svlandeg
5121972930
add types of Tok2Vec embedding layers
2020-10-01 09:20:09 +02:00
							
Ines Montani
4b6afd3611
Remove English [initialize] default block for now to get tests to pass
2020-09-30 23:49:29 +02:00
							
Ines Montani
6f29f68f69
Update errors and make Tokenizer.initialize args less strict
2020-09-30 23:48:47 +02:00
							
Ines Montani
a103ab5f1a
Update augmenter lookups and docs
2020-09-30 23:03:47 +02:00
							
Matthew Honnibal
5128298964
Add missing augmenter
2020-09-30 20:18:45 +02:00
							
Matthew Honnibal
59294e91aa
Restore the 'jsonl' arg for init vectors

The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
							
Matthew Honnibal
c379a4274a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-30 16:52:42 +02:00
							
Matthew Honnibal
e58dca3028
Add read_labels
2020-09-30 16:52:27 +02:00
							
Ines Montani
23c63eefaf
Tidy up env vars [ci skip]
2020-09-30 15:15:11 +02:00
							
Elijah Rippeth
4cbb954281
reorder so tagmap is replaced only if a custom file is provided. (#6164)

* reorder so tagmap is replaced only if a custom file is provided.
* Remove unneeded variable initialization
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-09-30 13:26:06 +02:00
							
Adriane Boyd
6b7bb32834
Refactor Chinese initialization
2020-09-30 11:46:45 +02:00
							
Ines Montani
34f9c26c62
Add lexeme norm defaults
2020-09-30 10:20:14 +02:00
							
Ines Montani
a5debb356d
Tidy up and adjust logging [ci skip]
2020-09-30 01:22:08 +02:00
							
Ines Montani
56a2f778c4
Add logging [ci skip]
2020-09-30 01:08:55 +02:00
							
Ines Montani
fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values
2020-09-30 00:24:02 +02:00
							
Ines Montani
b799af16de
Don't raise in Pipe.initialize if not implemented
2020-09-30 00:05:27 +02:00
							
Matthew Honnibal
bc61691f6f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-29 23:41:04 +02:00
							
Matthew Honnibal
f52249fe2e
Fix data augmentation
2020-09-29 23:40:54 +02:00
							
Matthew Honnibal
14c4da547f
Try to fix augmentation
2020-09-29 23:08:56 +02:00
							
Ines Montani
ae51843468
Remove augmenter from jinja template [ci skip]
2020-09-29 23:08:50 +02:00
							
Ines Montani
9bb958fd0a
Fix debug data [ci skip]
2020-09-29 23:07:11 +02:00
							
Matthew Honnibal
a2aa1f6882
Disable the OVL augmentation by default
2020-09-29 23:02:40 +02:00
							
Ines Montani
df8dd91b6f
Merge branch 'develop' into fix/default-corpus-values
2020-09-29 22:55:39 +02:00
							
Ines Montani
0a1ee109db
Remove init form path
2020-09-29 22:53:18 +02:00
							
Ines Montani
ad6d40d028
Add logging
2020-09-29 22:53:14 +02:00
							
Ines Montani
c334a7d45f
Remove
2020-09-29 22:38:39 +02:00
							
Ines Montani
1aeef3bfbb
Make corpus paths default to None and improve errors
2020-09-29 22:33:46 +02:00
							
Ines Montani
0250bcf6a3
Show validation error during init
2020-09-29 22:29:09 +02:00
							
Ines Montani
da30bae8a6
Use __pyx_vtable__ instead of __reduce_cython__
2020-09-29 22:04:17 +02:00
							
Ines Montani
43c92ec8c9
Resolve dir for better output [ci skip]
2020-09-29 22:01:04 +02:00
							
Ines Montani
fa47f87924
Tidy up and auto-format
2020-09-29 21:39:28 +02:00
							
Ines Montani
604be54a5c
Support --code in evaluate CLI [ci skip]
2020-09-29 21:20:56 +02:00
							
Ines Montani
6467a560e3
WIP: Test updating Chinese tokenizer
2020-09-29 21:10:22 +02:00
							
Ines Montani
4f3102d09c
Auto-format
2020-09-29 21:09:10 +02:00
							
Ines Montani
798040bc1d
Fix language detection
2020-09-29 21:08:13 +02:00
							
Ines Montani
78021089f9
Merge pull request #6160 from explosion/feature/prepare
2020-09-29 20:55:13 +02:00
							
Ines Montani
c3f8c09d7d
Merge pull request #6154 from adrianeboyd/bugfix/chinese-tokenizer-pickle
2020-09-29 20:54:59 +02:00
							
Ines Montani
d3c63b7965
Merge branch 'develop' into feature/prepare
2020-09-29 20:53:05 +02:00
							
Ines Montani
2be80379ec
Fix small issues, resolve_dot_names and debug model
2020-09-29 20:38:35 +02:00
							
Matthew Honnibal
a4da3120b4
Fix multitasks
2020-09-29 18:33:16 +02:00
							
Matthew Honnibal
0b5c72fce2
Fix incorrect docstrings
2020-09-29 18:30:38 +02:00
							
Ines Montani
7851020653
Update tests
2020-09-29 18:14:15 +02:00
							
Ines Montani
71a0ee274a
Move init labels to init pipeline module
2020-09-29 18:09:33 +02:00
							
Ines Montani
dba26186ef
Handle None default args in Cython methods
2020-09-29 18:08:02 +02:00
							
Ines Montani
9353a82076
Auto-format
2020-09-29 18:07:48 +02:00
							
Ines Montani
534e1ef498
Fix template
2020-09-29 17:02:55 +02:00
							
Ines Montani
f2352eb701
Test with default value
2020-09-29 17:00:40 +02:00
							
Matthew Honnibal
8ce9f44433
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
2020-09-29 16:57:38 +02:00
							
Matthew Honnibal
e4f535a964
Fix Pipe.labels
2020-09-29 16:55:07 +02:00
							
Matthew Honnibal
4ad26f4a2f
Move reader
2020-09-29 16:54:53 +02:00
							
Ines Montani
30c76dbd67
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
2020-09-29 16:53:48 +02:00
							
Matthew Honnibal
43fc7a316d
Add registry function for reading jsonl
2020-09-29 16:49:09 +02:00
							
Matthew Honnibal
1fd002180e
Allow more components to use labels
2020-09-29 16:48:56 +02:00
							
Matthew Honnibal
99bff78617
Use labels in tagger
2020-09-29 16:48:44 +02:00
							
Matthew Honnibal
ca72608059
Fix language
2020-09-29 16:48:33 +02:00
							
Matthew Honnibal
10847c7f4e
Fix arg
2020-09-29 16:48:07 +02:00
							
Ines Montani
fd594cfb9b
Tighten up format
2020-09-29 16:47:55 +02:00
							
Matthew Honnibal
e70a00fa76
Remove unnecessary warning from train
2020-09-29 16:47:54 +02:00
							
Matthew Honnibal
3f0d61232d
Remove outdated arg from train
2020-09-29 16:47:44 +02:00
							
Matthew Honnibal
e957d66b92
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
2020-09-29 16:22:53 +02:00
							
Ines Montani
978ab54a84
Fix logging
2020-09-29 16:22:41 +02:00
							
Matthew Honnibal
45daf5c9fe
Add init labels command
2020-09-29 16:22:37 +02:00
							
Matthew Honnibal
58c8d4b414
Add label_data property to pipeline
2020-09-29 16:22:13 +02:00
							
Ines Montani
aa2a6882d0
Fix logging
2020-09-29 16:08:39 +02:00
							
Ines Montani
63d1598137
Simplify config use in Language.initialize
2020-09-29 16:05:48 +02:00
							
Ines Montani
56f8bc73ef
Add more tests
2020-09-29 15:23:34 +02:00
							
Sofie Van Landeghem
6a04e5adea
encoding UTF8 (#6161)
2020-09-29 14:49:55 +02:00
							
Ines Montani
591038b1a4
Add test
2020-09-29 12:54:52 +02:00
							
Ines Montani
adca08a12f
Pass nlp forward
2020-09-29 12:21:52 +02:00
							
Ines Montani
f171903139
Clean up sgd and pipeline -> nlp
2020-09-29 12:20:26 +02:00
							
Ines Montani
612bbf85ab
Update initialize.py
2020-09-29 12:14:47 +02:00
							
Ines Montani
42f0e4c946
Clean up
2020-09-29 12:14:08 +02:00
							
Matthew Honnibal
9c8b2524fe
Upd initialize args
2020-09-29 12:08:37 +02:00
							
Matthew Honnibal
e1fdf2b7c5
Upd tests
2020-09-29 12:05:38 +02:00
							
Ines Montani
50410c17ac
Update schemas.py
2020-09-29 12:05:38 +02:00
							
Matthew Honnibal
f2d1b7feb5
Clean up sgd
2020-09-29 12:00:08 +02:00
							
Ines Montani
78396d137f
Integrate initialize settings
2020-09-29 11:57:08 +02:00
							
Ines Montani
dec984a9c1
Update Language.initialize and support components/tokenizer settings
2020-09-29 11:52:45 +02:00
							
Matthew Honnibal
b3b6868639
Remove 'sgd' arg from component initialize
2020-09-29 11:42:35 +02:00
							
Matthew Honnibal
5276db6f3f
Remove 'device' argument from Language, clean up 'sgd' arg
2020-09-29 11:42:19 +02:00
							
Ines Montani
4925ad760a
Add init vectors
2020-09-29 10:58:50 +02:00
							
svlandeg
64d90039a1
encoding UTF8
2020-09-29 10:54:42 +02:00
							
Ines Montani
ff9a63bfbd
begin_training -> initialize
2020-09-28 21:35:09 +02:00
							
Ines Montani
046f655d86
Fix error
2020-09-28 21:17:45 +02:00

Ines Montani
a139fe672b
Fix typos and refactor CLI logging
2020-09-28 21:17:10 +02:00

Ines Montani
2e9c9e74af
Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00

Ines Montani
02838a1d47
Fix resolve_dot_names
2020-09-28 15:27:10 +02:00

Ines Montani
822ea4ef61
Refactor CLI
2020-09-28 15:09:59 +02:00

Ines Montani
a89e0ff7cb
Fix typo
2020-09-28 12:55:21 +02:00

Ines Montani
a62337b3f3
Tidy up vocab init
2020-09-28 12:53:06 +02:00

Ines Montani
c22ecc66bb
Don't support init path for now
2020-09-28 12:46:28 +02:00

Ines Montani
f49288ab81
Update default_config_pretraining.cfg
2020-09-28 12:31:54 +02:00

Ines Montani
a5f2cc0509
Tidy up and remove raw text (rehearsal) for now
2020-09-28 12:30:13 +02:00

Ines Montani
1590de11b1
Update config
2020-09-28 12:05:23 +02:00

Matthew Honnibal
9f6ad06452
Upd default config
2020-09-28 12:00:23 +02:00

Ines Montani
e44a7519cd
Update CLI and add [initialize] block
2020-09-28 11:56:14 +02:00

Ines Montani
d5155376fd
Update vocab init
2020-09-28 11:30:18 +02:00

Ines Montani
8b74fd19df
init pipeline -> init nlp
2020-09-28 11:13:38 +02:00

Ines Montani
2fdb7285a0
Update CLI
2020-09-28 11:06:07 +02:00

Ines Montani
553bfea641
Fix commands
2020-09-28 10:53:17 +02:00

Matthew Honnibal
44bad1474c
Add init_pipeline file
2020-09-28 09:47:34 +02:00

Matthew Honnibal
65448b2e34
Remove schema=None until Optional
2020-09-28 03:42:58 +02:00

Matthew Honnibal
b886f53c31
init-pipeline runs (maybe doesnt work)
2020-09-28 03:42:47 +02:00

Matthew Honnibal
ed2aff2db3
Remove unused train code
2020-09-28 03:12:31 +02:00

Matthew Honnibal
3a0a3b8db6
Dont hard-code for 'corpora' name
2020-09-28 03:06:33 +02:00

Matthew Honnibal
a023cf3ecc
Add (untested) resolve_dot_names util
2020-09-28 03:06:12 +02:00

Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus
* Note initial docs for data augmentation
* Add augmenter to quickstart
* Fix flake8
* Format
* Fix test
* Update spacy/tests/training/test_training.py
* Improve data augmentation arguments
* Update templates
* Move randomization out into caller
* Refactor
* Update spacy/training/augment.py
* Update spacy/tests/training/test_training.py
* Fix augment
* Fix test
2020-09-28 03:03:27 +02:00

Matthew Honnibal
13b1605ee6
Add init script
2020-09-28 01:08:49 +02:00

Matthew Honnibal
a3e1791c9c
Upd train
2020-09-28 01:08:30 +02:00

Matthew Honnibal
b5556093e2
Start updating train script
2020-09-27 23:59:44 +02:00

Ines Montani
9016d23cc5
Fix exclude and add test
2020-09-27 23:34:03 +02:00

Ines Montani
658fad428a
Fix base schema integration
2020-09-27 22:50:36 +02:00

Ines Montani
e04bd16f7f
Merge branch 'develop' into feature/new-thinc-config-resolution
2020-09-27 22:34:46 +02:00

Ines Montani
d7ad65a9bb
Fix handling of error description [ci skip]
2020-09-27 22:31:57 +02:00

Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00

Adriane Boyd
013b66de05
Add tokenizer scoring to ja / ko / zh (#6152)
2020-09-27 22:20:45 +02:00

Adriane Boyd
a6548ead17
Add _ as a symbol (#6153)
* Add _ to StringStore in Morphology
* Add _ as a symbol
Add `_` as a symbol instead of adding to the `StringStore`.
2020-09-27 22:20:14 +02:00

Matthew Honnibal
39b178999c
Tmp notes
2020-09-27 20:13:38 +02:00

Adriane Boyd
8393dbedad
Minor fixes
* Put `cfg` back in serialization
* Add `pickle5` to pytest conf
2020-09-27 15:15:53 +02:00

Adriane Boyd
54fe871935
Fix formatting, refactor pickle5 exceptions
2020-09-27 14:37:28 +02:00

Adriane Boyd
11e195d3ed
Update ChineseTokenizer
* Allow `pkuseg_model` to be set to `None` on initialization
* Don't save config within tokenizer
* Force convert pkuseg_model to use pickle protocol 4 by reencoding with
`pickle5` on serialization
* Update pkuseg serialization test
2020-09-27 14:00:18 +02:00

Ines Montani
b4486d747d
Merge branch 'develop' into fix/train-config-interpolation
2020-09-26 15:32:14 +02:00

Ines Montani
8fea06d55e
Merge pull request #6149 from adrianeboyd/feature/attributeruler-match-ids
Simplify string match IDs for AttributeRuler
2020-09-26 15:31:30 +02:00

Ines Montani
b2d07de786
Construct nlp from uninterpolated config before training
2020-09-26 15:16:59 +02:00

Ines Montani
ca3c997062
Improve CLI config validation with latest Thinc
2020-09-26 13:13:57 +02:00

Adriane Boyd
6c25e60089
Simplify string match IDs for AttributeRuler
2020-09-26 11:12:39 +02:00

Matthew Honnibal
702edf52a0
Fix attributeruler
2020-09-26 00:30:48 +02:00

Matthew Honnibal
821f37254c
Fix attributeruler
2020-09-26 00:19:53 +02:00

Matthew Honnibal
98327f66a9
Fix attributeruler key
2020-09-25 23:20:50 +02:00

Matthew Honnibal
092ce4648e
Make DocBin output stable data (set iteration)
2020-09-25 22:20:44 +02:00

Matthew Honnibal
26afd3bd90
Fix iteration order
2020-09-25 21:47:22 +02:00

Matthew Honnibal
3d8388969e
Sort paths for cache consistency
2020-09-25 19:07:26 +02:00

Adriane Boyd
c3b5a3cfff
Clean up MorphAnalysisC struct (#6146)
2020-09-25 15:56:48 +02:00

Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI
* bump thinc to 8.0.0a35
* bump to 3.0.0a26
* doc fixes
* small doc fix
2020-09-25 15:47:10 +02:00

Adriane Boyd
50f20cf722
Revert changes to Scorer.score_spans
2020-09-25 08:21:47 +02:00

Matthew Honnibal
93d7ff309f
Remove print
2020-09-24 21:05:27 +02:00

Matthew Honnibal
16475528f7
Fix skipped documents in entity scorer (#6137)
* Fix skipped documents in entity scorer
* Add back the skipping of unannotated entities
* Update spacy/scorer.py
* Use more specific NER scorer
* Fix import
* Fix get_ner_prf
* Add scorer
* Fix scorer
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-24 20:38:57 +02:00

Matthew Honnibal
2abb4ba9db
Make a pre-check to speed up alignment cache (#6139)
* Dirty trick to fast-track alignment cache
* Improve alignment cache check
* Fix header
* Fix align cache
* Fix align logic
2020-09-24 18:13:39 +02:00

Ines Montani
26e28ed413
Fix combined scores if multiple components report it
2020-09-24 17:11:13 +02:00

Ines Montani
0b52b6904c
Update entity_linker.py
2020-09-24 17:10:35 +02:00

Ines Montani
20b89a9717
Increment version [ci skip]
2020-09-24 16:57:02 +02:00

Adriane Boyd
3c062b3911
Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher
* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values
* Update test
* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00

Adriane Boyd
59340606b7
Add option to disable Matcher errors (#6125)
* Add option to disable Matcher errors
* Add option to disable Matcher errors when a doc doesn't contain a
particular type of annotation
Minor additional change:
* Update `AttributeRuler.load_from_morph_rules` to allow direct `MORPH`
values
* Rename suppress_errors to allow_missing
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* Refactor annotation checks in Matcher and PhraseMatcher
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-24 16:54:39 +02:00

Sofie Van Landeghem
c7eedd3534
updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference
* fiddling with sent_start annotations
* add KB serialization test
* KB write additional file with strings.json
* score_links function to calculate NEL P/R/F
* formatting
* documentation
2020-09-24 16:53:59 +02:00

Ines Montani
d0ef4a4cf5
Prevent division by zero in score weights
2020-09-24 16:42:13 +02:00

Matthew Honnibal
74ee456374
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-24 16:11:47 +02:00

Matthew Honnibal
0bc214c102
Fix pull
2020-09-24 16:11:33 +02:00

Ines Montani
3f751e68f5
Increment version [ci skip]
2020-09-24 14:45:41 +02:00

Ines Montani
58dde293ce
Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2
2020-09-24 14:44:42 +02:00

Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk
2020-09-24 14:44:11 +02:00

Ines Montani
24e7ac3f2b
Fix download CLI [ci skip]
2020-09-24 14:43:56 +02:00

Ines Montani
88e54caa12
accuracy -> performance
2020-09-24 14:32:35 +02:00

Ines Montani
92f8b6959a
Fix typo
2020-09-24 13:48:41 +02:00

Adriane Boyd
5c13e0cf1b
Remove unused error
2020-09-24 13:41:55 +02:00

Ines Montani
be56c0994b
Add [training.before_to_disk] callback
2020-09-24 12:40:25 +02:00

Adriane Boyd
8eaacaae97
Refactor Doc.ents setter to use Doc.set_ents
Additional changes:
* Entity spans with missing labels are ignored
* Fix ent_kb_id setting in `Doc.set_ents`
2020-09-24 12:36:51 +02:00

Ines Montani
c6c67b606e
Merge pull request #6133 from explosion/fix/score_weights
2020-09-24 12:00:57 +02:00

Ines Montani
f69fea8b25
Improve error handling around non-number scores
2020-09-24 11:29:07 +02:00

Ines Montani
4eb39b5c43
Fix logging
2020-09-24 11:04:35 +02:00

Ines Montani
4bbe41f017
Fix combined scores and update test
2020-09-24 10:42:47 +02:00

Sofie Van Landeghem
c645c4e7ce
fix micro PRF for textcat (#6130)
* fix micro PRF for textcat
* small fix
2020-09-24 10:31:17 +02:00

Matthew Honnibal
17a6b0a173
Make project pull order insensitive (#6131)
2020-09-24 10:30:42 +02:00

Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00

Ines Montani
f25f05c503
Adjust sort order [ci skip]
2020-09-23 20:03:04 +02:00

Ines Montani
3f77eb749c
Increment version [ci skip]
2020-09-23 19:50:15 +02:00

svlandeg
b816ace4bb
format
2020-09-23 17:33:13 +02:00

svlandeg
5a9fdbc8ad
state_type as Literal
2020-09-23 17:32:14 +02:00

svlandeg
35dbc63578
Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00

svlandeg
25b34bba94
throw custom error when state_type is invalid
2020-09-23 16:57:14 +02:00

Ines Montani
916050bf2f
Merge pull request #6127 from explosion/feature/literal-nr_feature_tokens
2020-09-23 16:56:08 +02:00

Ines Montani
3c3863654e
Increment version [ci skip]
2020-09-23 16:54:43 +02:00

svlandeg
dd2292793f
'parser' instead of 'deps' for state_type
2020-09-23 16:53:49 +02:00

Ines Montani
50a4425cda
Adjust docs
2020-09-23 16:03:32 +02:00

Ines Montani
76bbed3466
Use Literal type for nr_feature_tokens
2020-09-23 16:00:03 +02:00

Muhammad Fahmi Rasyid
7489d02dea
Update Indonesian Example Phrases (#6124)
* create contributor agreement
* Update Indonesian example (see #1107)
Update the Indonesian examples with more appropriate phrases. The current phrases contain sensitive and violent words.
2020-09-23 14:02:26 +02:00

svlandeg
6c85fab316
state_type and extra_state_tokens instead of nr_feature_tokens
2020-09-23 13:35:09 +02:00

Ines Montani
7745d77a38
Fix whitespace in template [ci skip]
2020-09-23 13:21:42 +02:00

svlandeg
6435458d51
simplify expression
2020-09-23 12:12:38 +02:00

svlandeg
20b0ec5dcf
avoid logging performance of frozen components
2020-09-23 10:37:12 +02:00

Ines Montani
ae5dacf75f
Tidy up and add types
2020-09-23 10:14:34 +02:00

Ines Montani
6ca06cb62c
Update docs and formatting [ci skip]
2020-09-23 10:14:27 +02:00

Ines Montani
888f936a73
Merge pull request #6106 from svlandeg/feature/textcat-quickstart
2020-09-23 10:11:45 +02:00

Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename
2020-09-23 09:47:12 +02:00

Ines Montani
f976bab710
Remove empty file [ci skip]
2020-09-23 09:30:09 +02:00

svlandeg
556f3e4652
add pooling to NEL's TransformerListener
2020-09-23 09:24:28 +02:00

svlandeg
4a56ea72b5
fallbacks for old names
2020-09-23 09:15:07 +02:00

Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict (#6113)
2020-09-22 21:54:52 +02:00

Adriane Boyd
e4acb28658
Fix norm in retokenizer split (#6111)
Parallel to behavior in merge, reset norm on original token in
retokenizer split.
2020-09-22 21:53:33 +02:00

Sofie Van Landeghem
e0e793be4d
fix KB IO (#6118)
2020-09-22 21:53:06 +02:00

Adriane Boyd
9b4979407d
Fix overlapping German noun chunks (#6112)
Add a fix similar to the one in #5470 to prevent the German noun chunks
iterator from producing overlapping spans.
2020-09-22 21:52:42 +02:00

Adriane Boyd
b1a7d6c528
Refactor seen token detection
2020-09-22 14:42:51 +02:00

Sofie Van Landeghem
d53c84b6d6
avoid None callback (#6100)
2020-09-22 13:54:44 +02:00

Adriane Boyd
535842e483
Merge branch 'develop' into feature/doc-ents-v3-2
2020-09-22 13:45:50 +02:00

Ines Montani
5e3b796b12
Validate section refs in debug config
2020-09-22 12:24:39 +02:00

svlandeg
085a1c8e2b
add no_output_layer to TextCatBOW config
2020-09-22 12:06:40 +02:00

svlandeg
e1b8090b9b
few more fixes
2020-09-22 12:01:06 +02:00

svlandeg
b556a10808
rename converts in_to_out
2020-09-22 11:50:19 +02:00

svlandeg
e931f4d757
add textcat score
2020-09-22 10:56:43 +02:00

svlandeg
396b33257f
add entity_linker to jinja template
2020-09-22 10:40:05 +02:00

Ines Montani
db7126ead9
Increment version
2020-09-22 10:31:26 +02:00

svlandeg
135de82a2d
add textcat to quickstart
2020-09-22 10:22:06 +02:00

Ines Montani
6316d5f398
Improve messages in project CLI [ci skip]
2020-09-22 09:45:34 +02:00

Ines Montani
49e80dbcac
Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc
2020-09-22 09:45:04 +02:00

Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip]
2020-09-22 09:31:23 +02:00

Ines Montani
beb766d0a0
Add test
2020-09-22 09:15:57 +02:00

Ines Montani
285fa934d8
Merge branch 'chore/tidy-up-tests-docs-get-doc' of https://github.com/explosion/spaCy into chore/tidy-up-tests-docs-get-doc
2020-09-22 09:10:14 +02:00

Ines Montani
69f7e52c26
Update README.md
2020-09-22 09:10:06 +02:00

svlandeg
45b29c4a5b
cleanup
2020-09-21 23:17:23 +02:00

svlandeg
fa5c416db6
initialize through nlp object and with train_corpus
2020-09-21 23:09:22 +02:00

Matthew Honnibal
3abc4a5adb
Slightly tidy doc.ents.__set__
2020-09-21 22:58:03 +02:00

Ines Montani
67fbcb3da5
Tidy up tests and docs
2020-09-21 20:43:54 +02:00

Ines Montani
a5f6ab4943
Merge pull request #6098 from adrianeboyd/feature/doc-init
2020-09-21 18:35:20 +02:00

Adriane Boyd
f212303729
Add sent_starts to Doc.__init__
Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start`
values but also convert to and accept `sent_start` internally.
2020-09-21 17:59:09 +02:00

svlandeg
447b3e5787
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00

Ines Montani
b3327c1e45
Increment version [ci skip]
2020-09-21 16:04:30 +02:00

Ines Montani
e8bcaa44f1
Don't auto-decompress archives with smart_open [ci skip]
2020-09-21 16:01:46 +02:00

Adriane Boyd
6aa91c7ca0
Make user_data keyword-only
2020-09-21 16:00:06 +02:00

Adriane Boyd
177df15d89
Implement Doc.set_ents
2020-09-21 15:54:05 +02:00

Adriane Boyd
13fbf6556a
Merge remote-tracking branch 'upstream/develop' into feature/doc-ents-v3-2
2020-09-21 14:42:04 +02:00

svlandeg
eb9b447960
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00

Adriane Boyd
ce455f30ca
Fix formatting
2020-09-21 13:53:29 +02:00

Adriane Boyd
bc02e86494
Extend Doc.__init__ with additional annotation
Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to
`Doc.__init__` to initialize the most common doc/token values.
2020-09-21 13:36:24 +02:00

Ines Montani
758ead8a47
Sync overrides with CLI overrides
2020-09-21 12:50:13 +02:00

Ines Montani
5497acf49a
Support config overrides via environment variables
2020-09-21 11:25:10 +02:00

Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00

Ines Montani
b2302c0a1c
Improve error for missing dependency
2020-09-20 17:44:51 +02:00

Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00

Matthew Honnibal
dc22771f87
Fix sparse checkout
2020-09-20 16:30:05 +02:00

Matthew Honnibal
a0fb5e50db
Use simple git clone call if not sparse
2020-09-20 16:22:04 +02:00

Matthew Honnibal
2c24d633d0
Use updated run_command
2020-09-20 16:21:43 +02:00

Matthew Honnibal
889128e5c5
Improve error handling in run_command
2020-09-20 16:20:57 +02:00

Ines Montani
554c9a2497
Update docs [ci skip]
2020-09-20 12:30:53 +02:00

svlandeg
6db1d5dc0d
trying some stuff
2020-09-19 19:11:30 +02:00

Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00

Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00

Adriane Boyd
47080fba98
Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00

svlandeg
73ff52b9ec
hack for tok2vec listener
2020-09-18 16:43:15 +02:00

Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00