adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2281c4708c 
							
						 
					 
					
						
						
							
							Restore empty tokenizer properties (#5026)
						
						
						
						
						* Restore empty tokenizer properties
* Check for types in tokenizer.from_bytes()
* Add test for setting empty tokenizer rules 
						
					 
					
						2020-03-02 11:55:02 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c6b12ab02a 
							
						 
					 
					
						
						
							
							Bugfix/get doc (#5049)
						
						
						
						
						* new (broken) unit test
* fixing get_doc method 
						
					 
					
						2020-03-02 11:49:28 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							65d7bab10f 
							
						 
					 
					
						
						
							
							Initialize all values in a2b/b2a in new align (#5063)
						
						
						
					 
					
						2020-02-27 18:43:00 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b4e0d2bf50 
							
						 
					 
					
						
						
							
							Improve Makefile (#5067)
						
						
						
						
						* Improve pex making
* Update gitignore 
						
					 
					
						2020-02-26 20:59:10 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1c212215cd 
							
						 
					 
					
						
						
							
							Merge pull request #5064 from adrianeboyd/feature/german-tokenization
						
						
						
						
						Improve German tokenization 
						
					 
					
						2020-02-26 13:41:44 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							56978f5cd8 
							
						 
					 
					
						
						
							
							Merge pull request #5060 from svlandeg/feature/update-thinc
						
						
						
						
						update thinc 
						
					 
					
						2020-02-26 13:40:23 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							d1f703d78d 
							
						 
					 
					
						
						
							
							Improve German tokenization  
						
						
						
						
						Improve German tokenization with respect to Tiger. 
						
					 
					
						2020-02-26 13:06:52 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							54da6a2a07 
							
						 
					 
					
						
						
							
							Update pyproject.toml  
						
						
						
					 
					
						2020-02-26 12:51:53 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							ed9358420e 
							
						 
					 
					
						
						
							
							Merge branch 'master' into pr/5060  
						
						
						
					 
					
						2020-02-26 12:51:29 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ff184b7a9c 
							
						 
					 
					
						
						
							
							Add tag_map argument to CLI debug-data and train (#4750) (#5038)
						
						
						
						
						Add an argument for a path to a JSON-formatted tag map, which is used to
update and extend the default language tag map. 
						
					 
					
						2020-02-26 12:10:38 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							18ff97589d 
							
						 
					 
					
						
						
							
							update spacy to 2.2.4.dev0  
						
						
						
					 
					
						2020-02-26 10:50:05 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							62406a9513 
							
						 
					 
					
						
						
							
							update from thinc 7.4.0.dev2 to 7.4.0  
						
						
						
					 
					
						2020-02-26 10:30:35 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c7e3c034d2 
							
						 
					 
					
						
						
							
							Merge pull request #5061 from explosion/fix/pyproject-toml-master
						
						
						
						
						Update pyproject.toml 
						
					 
					
						2020-02-25 20:22:26 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							dc36ec98a4 
							
						 
					 
					
						
						
							
							Update pyproject.toml  
						
						
						
					 
					
						2020-02-25 16:46:14 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							acb4e3c7ba 
							
						 
					 
					
						
						
							
							Merge pull request #5039 from adrianeboyd/typo/website-token-api-shape
						
						
						
						
						Fix formatting in Token API 
						
					 
					
						2020-02-25 14:57:25 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d50152b917 
							
						 
					 
					
						
						
							
							Merge pull request #5019 from questoph/master
						
						
						
						
						Optimizing tokenization for Luxembourgish (dealing with apostrophe infixes) 
						
					 
					
						2020-02-25 14:48:50 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4440a072d2 
							
						 
					 
					
						
						
							
							Merge pull request #5006 from svlandeg/bugfix/multiproc-underscore
						
						
						
						
						load Underscore state when multiprocessing 
						
					 
					
						2020-02-25 14:46:02 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							38fc05986c 
							
						 
					 
					
						
						
							
							Merge pull request #5058 from bryant1410/patch-1
						
						
						
						
						Add missing comma in a dependency specification 
						
					 
					
						2020-02-25 14:44:29 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							d848a68340 
							
						 
					 
					
						
						
							
							thinc 7.4.0.dev2  
						
						
						
					 
					
						2020-02-25 12:07:42 +01:00 
						 
				 
			
				
					
						
							
							
								Santiago Castro 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							54d8665ff7 
							
						 
					 
					
						
						
							
							Add missing comma in a dependency specification  
						
						
						
						
						Conda is complaining that it can't parse that line otherwise. 
						
					 
					
						2020-02-24 16:15:28 -05:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							b49a3afd0c 
							
						 
					 
					
						
						
							
							use clean_underscore fixture  
						
						
						
					 
					
						2020-02-23 15:49:20 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d6c0746347 
							
						 
					 
					
						
						
							
							Merge branch 'master' into spacy.io  
						
						
						
					 
					
						2020-02-23 13:57:01 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							4890db6339 
							
						 
					 
					
						
						
							
							Auto-format and fix image [ci skip]  
						
						
						
					 
					
						2020-02-23 13:56:50 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							89967f3701 
							
						 
					 
					
						
						
							
							Merge branch 'master' into spacy.io  
						
						
						
					 
					
						2020-02-23 12:04:20 +01:00 
						 
				 
			
				
					
						
							
							
								Tom Keefe 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ddf63b97a8 
							
						 
					 
					
						
						
							
							make idx available via to_array (#5030)
						
						
						
					 
					
						2020-02-22 14:13:06 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							44f4142ce4 
							
						 
					 
					
						
						
							
							add two abbreviations and some additional unit tests (#5040)
						
						
						
					 
					
						2020-02-22 14:12:32 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							479bd8d09f 
							
						 
					 
					
						
						
							
							add lemma option to displacy 'dep' visualiser (#5041)
						
						
						
						
						* add lemma option to displacy 'dep' visualiser
* more compact list comprehension
* add option to doc
* fix test and add lemmas to util.get_doc
* fix capital
* remove lemma from get_doc
* cleanup 
						
					 
					
						2020-02-22 14:11:51 +01:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							3853d385fa 
							
						 
					 
					
						
						
							
							Fix formatting in Token API  
						
						
						
					 
					
						2020-02-20 13:41:24 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2164e71ea8 
							
						 
					 
					
						
						
							
							Improved Romanian tokenization for UD RRT (#5036)
						
						
						
						
						Modifications to Romanian tokenization to improve tokenization for
UD_Romanian-RRT. 
						
					 
					
						2020-02-19 16:15:59 +01:00 
						 
				 
			
				
					
						
							
							
								Jan Jessewitsch 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c7e4fe9c5c 
							
						 
					 
					
						
						
							
							Fix/Improve German stop words (#5024)
						
						
						
						
						* Fix German stop words
Two stop words ("einige" and "einigen") were stuck together.
Remove three nouns that may serve as stop words in a specific context (e.g. religious or news) but are not applicable for general use.
* Create Jan-711.md 
						
					 
					
						2020-02-17 18:59:22 +01:00 
						 
				 
			
				
					
						
							
							
								Kabir Khan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f6ed07b85c 
							
						 
					 
					
						
						
							
							Use nlp.pipe in EntityRuler for phrase patterns in add_patterns (#4931)
						
						
						
						
						* Fix ent_ids and labels properties when id attribute used in patterns
* use set for labels
* sort ent_ids for comparison in entity_ruler tests
* fixing entity_ruler ent_ids test
* add to set
* Run make_doc optimistically if using phrase matcher patterns.
* remove unused coveragerc I was testing with
* format
* Refactor EntityRuler.add_patterns to use nlp.pipe for phrase patterns. Improves speed substantially.
* Removing old add_patterns function
* Fixing spacing
* Make sure token_patterns loaded as well, before generator was being emptied in from_disk 
						
					 
					
						2020-02-16 18:17:47 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							72c964bcf4 
							
						 
					 
					
						
						
							
							define pretrained_dims which is used by build_text_classifier (#5004)
						
						
						
					 
					
						2020-02-16 17:21:17 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3b22eb651b 
							
						 
					 
					
						
						
							
							Sync Span __eq__ and __hash__ (#5005)
						
						
						
						
						* Sync Span __eq__ and __hash__
Use the same tuple for `__eq__` and `__hash__`, including all attributes
except `vector` and `vector_norm`.
* Update entity comparison in tests
Update `assert_docs_equal()` test util to compare `Span` properties for
ents rather than `Span` objects. 
						
					 
					
						2020-02-16 17:20:36 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0c47a53b5e 
							
						 
					 
					
						
						
							
							Use int only in key2row for better performance (#4990)
						
						
						
						
						Cast all keys and rows to `int` in `vectors.key2row` for more efficient
access and serialization. 
						
					 
					
						2020-02-16 17:19:41 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5b102963bf 
							
						 
					 
					
						
						
							
							Require HEAD for is_parsed in Doc.from_array() (#5011)
						
						
						
						
						Modify flag settings so that `DEP` is not sufficient to set `is_parsed`
and only run `set_children_from_heads()` if `HEAD` is provided.
Then the combination `[SENT_START, DEP]` will set deps and not clobber
sent starts with a lot of one-word sentences. 
						
					 
					
						2020-02-16 17:17:09 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2572460175 
							
						 
					 
					
						
						
							
							add tok2vec parameters to train script to facilitate init_tok2vec (#5021)
						
						
						
					 
					
						2020-02-16 17:16:41 +01:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a27c77ce62 
							
						 
					 
					
						
						
							
							add message when cli train script throws exception (#5009)
						
						
						
						
						* add message when cli train script throws exception
* fix formatting 
						
					 
					
						2020-02-15 15:50:17 +01:00 
						 
				 
			
				
					
						
							
							
								Christos Aridas 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ff8e71f46d 
							
						 
					 
					
						
						
							
							Update streamlit app (#5017)
						
						
						
						
						* Update streamlit app [ci skip]
* Add all labels by default
* Tidy up and auto-format
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2020-02-15 15:49:09 +01:00 
						 
				 
			
				
					
						
							
							
								nlptechbook 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							979a3fd1f5 
							
						 
					 
					
						
						
							
							Update universe.json (#5022)
						
						
						
						
						e-book is available from https://nostarch.com/NLPPython  
						
					 
					
						2020-02-15 15:44:55 +01:00 
						 
				 
			
				
					
						
							
							
								questoph 
							
						 
					 
					
						
						
						
						
							
						
						
							5352fc8fc3 
							
						 
					 
					
						
						
							
							Update tokenizer_exceptions.py  
						
						
						
					 
					
						2020-02-14 12:02:15 +01:00 
						 
				 
			
				
					
						
							
							
								questoph 
							
						 
					 
					
						
						
						
						
							
						
						
							d1f0b397b5 
							
						 
					 
					
						
						
							
							Update punctuation.py  
						
						
						
					 
					
						2020-02-13 22:18:51 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6e717c62ed 
							
						 
					 
					
						
						
							
							avoid the tests interacting with each other through the global Underscore variable
						
						
						
					 
					
						2020-02-12 13:21:31 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							7939c63886 
							
						 
					 
					
						
						
							
							use English instead of model  
						
						
						
					 
					
						2020-02-12 12:26:27 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							46628d8890 
							
						 
					 
					
						
						
							
							add some asserts  
						
						
						
					 
					
						2020-02-12 12:12:52 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							51d37033c8 
							
						 
					 
					
						
						
							
							remove old comment  
						
						
						
					 
					
						2020-02-12 12:10:05 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							65f5b48b5d 
							
						 
					 
					
						
						
							
							add comment  
						
						
						
					 
					
						2020-02-12 12:06:27 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							05dedaa2cf 
							
						 
					 
					
						
						
							
							add unit test  
						
						
						
					 
					
						2020-02-12 12:00:13 +01:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							ecbb9c4b9f 
							
						 
					 
					
						
						
							
							load Underscore state when multiprocessing  
						
						
						
					 
					
						2020-02-12 11:50:42 +01:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							99a543367d 
							
						 
					 
					
						
						
							
							Set GPU before loading any models in train CLI (#4989)
						
						
						
						
						Set the GPU before loading any existing models in the train CLI so that
you can start with a base model and train on GPU. 
						
					 
					
						2020-02-11 17:45:41 -05:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							842dfddbb9 
							
						 
					 
					
						
						
							
							Standardize Greek tag map setup (#4997)
						
						
						
						
						* Rename `tag_map.py` to `tag_map_fine.py` to indicate that it's not the
default tag map
* Remove duplicate generic UD tag map and load `../tag_map.py` instead 
						
					 
					
						2020-02-11 17:44:56 -05:00