Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7b06bb896e 
							
						 
					 
					
						
						
							
							Fix for serialization  
						
						
						
					 
					
						2017-05-29 13:42:55 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f4aafca222 
							
						 
					 
					
						
						
							
							Merge changes to test_misc  
						
						
						
					 
					
						2017-05-29 12:26:02 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ff26aa6c37 
							
						 
					 
					
						
						
							
							Work on to/from bytes/disk serialization methods  
						
						
						
					 
					
						2017-05-29 11:45:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							df920ba0e7 
							
						 
					 
					
						
						
							
							Add tests for displaCy and util functions and fix util typo  
						
						
						
					 
					
						2017-05-29 10:51:19 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c91b121aeb 
							
						 
					 
					
						
						
							
							Move serialization functions to util  
						
						
						
					 
					
						2017-05-29 10:13:42 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6dad4117ad 
							
						 
					 
					
						
						
							
							Work on serialization for models  
						
						
						
					 
					
						2017-05-29 01:37:57 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c1983621fb 
							
						 
					 
					
						
						
							
							Update util functions for model loading  
						
						
						
					 
					
						2017-05-28 00:22:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c8543c8237 
							
						 
					 
					
						
						
							
							Fix formatting and docstrings and remove deprecated function  
						
						
						
					 
					
						2017-05-28 00:22:40 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							51882c4984 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-05-26 12:37:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							80cf42e33b 
							
						 
					 
					
						
						
							
							Fix compounding and decaying utils  
						
						
						
					 
					
						2017-05-25 17:15:39 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b9cea9cd93 
							
						 
					 
					
						
						
							
							Add compounding and decaying functions  
						
						
						
					 
					
						2017-05-25 16:16:10 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b5fb43fdd8 
							
						 
					 
					
						
						
							
							Allow sys.exit status as exits keyword arg in util.prints()  
						
						
						
					 
					
						2017-05-22 12:29:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5db89053aa 
							
						 
					 
					
						
						
							
							Merge docstrings  
						
						
						
					 
					
						2017-05-21 13:46:23 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0731971bfc 
							
						 
					 
					
						
						
							
							Add itershuffle utility function. Maybe belongs in thinc  
						
						
						
					 
					
						2017-05-21 09:05:05 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3871157d84 
							
						 
					 
					
						
						
							
							Update spacy.util documentation  
						
						
						
					 
					
						2017-05-21 01:12:09 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							238be0f16a 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop  
						
						
						
					 
					
						2017-05-18 08:32:22 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c214c0decb 
							
						 
					 
					
						
						
							
							Improve env_opt reporting  
						
						
						
					 
					
						2017-05-18 08:32:03 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							489d2fb4ba 
							
						 
					 
					
						
						
							
							Add is_in_jupyter() helper for displaCy (see  #1058 )  
						
						
						
					 
					
						2017-05-18 14:13:14 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							abf0188b0a 
							
						 
					 
					
						
						
							
							Move cupy and CudaStream to compat  
						
						
						
					 
					
						2017-05-18 14:12:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fc8d3a112c 
							
						 
					 
					
						
						
							
							Add util.env_opt support: Can set hyper params through environment variables.  
						
						
						
					 
					
						2017-05-18 04:36:53 -05:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1d7c18e58a 
							
						 
					 
					
						
						
							
							Merge branch 'develop' of  https://github.com/explosion/spaCy  into develop  
						
						
						
					 
					
						2017-05-15 21:53:47 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a9edb3aa1d 
							
						 
					 
					
						
						
							
							Improve integration of NN parser, to support unified training API  
						
						
						
					 
					
						2017-05-15 21:53:27 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c31792aaec 
							
						 
					 
					
						
						
							
							Add displaCy visualisers (see  #1058 )  
						
						
						
					 
					
						2017-05-14 17:50:23 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b462076d80 
							
						 
					 
					
						
						
							
							Merge load_lang_class and get_lang_class  
						
						
						
					 
					
						2017-05-14 01:31:10 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							36bebe7164 
							
						 
					 
					
						
						
							
							Update docstrings  
						
						
						
					 
					
						2017-05-14 01:30:29 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4b9d69f428 
							
						 
					 
					
						
						
							
							Merge branch 'v2' into develop  
						
						
						
						
						* Move v2 parser into nn_parser.pyx
						* New TokenVectorEncoder class in pipeline.pyx
						* New spacy/_ml.py module
						Currently the two parsers live side-by-side, until we figure out how to organize them.
						
					 
					
						2017-05-14 01:10:23 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f8c02b4341 
							
						 
					 
					
						
						
							
							Remove cupy imports from parser, so it can work on CPU  
						
						
						
					 
					
						2017-05-14 00:37:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1694c24e52 
							
						 
					 
					
						
						
							
							Add docstrings, error messages and fix consistency  
						
						
						
					 
					
						2017-05-13 21:22:49 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ee7dcf65c9 
							
						 
					 
					
						
						
							
							Fix expand_exc to make sure it returns combined dict  
						
						
						
					 
					
						2017-05-13 21:22:25 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							824d09bb74 
							
						 
					 
					
						
						
							
							Move resolve_load_name to deprecated  
						
						
						
					 
					
						2017-05-13 21:21:47 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c4857bc7db 
							
						 
					 
					
						
						
							
							Remove unused argument  
						
						
						
					 
					
						2017-05-12 15:37:54 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							86d9c29f30 
							
						 
					 
					
						
						
							
							Reorder util functions  
						
						
						
					 
					
						2017-05-08 23:51:15 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9a0d2fdef1 
							
						 
					 
					
						
						
							
							Add load_lang_class() util function  
						
						
						
					 
					
						2017-05-08 23:50:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							2edc0aee12 
							
						 
					 
					
						
						
							
							Update warning message  
						
						
						
					 
					
						2017-05-08 19:53:36 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							b9ba58ba5c 
							
						 
					 
					
						
						
							
							Add function to resolve load name  
						
						
						
						
						Warn if old 'path' keyword argument is used. 
						
					 
					
						2017-05-08 16:33:37 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							607ba458e7 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2017-05-08 15:42:31 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							60db497525 
							
						 
					 
					
						
						
							
							Add update_exc and expand_exc to util  
						
						
						
						
						Doesn't require separate language data util anymore 
						
					 
					
						2017-05-08 15:42:12 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							95edd9e896 
							
						 
					 
					
						
						
							
							Let parse_package_meta take full path  
						
						
						
					 
					
						2017-05-08 15:30:48 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							326746eb15 
							
						 
					 
					
						
						
							
							Add util function to resolve arg to model path  
						
						
						
						
						1. check if in data dir or shortcut link
						2. check if installed as a pip package
						3. check if string is path to model
						4. check if Path or Path-like object
						
					 
					
						2017-05-08 15:29:47 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							94697e9afc 
							
						 
					 
					
						
						
							
							Fix typo  
						
						
						
					 
					
						2017-05-08 02:00:37 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c4492d260a 
							
						 
					 
					
						
						
							
							Fix kwargs  
						
						
						
					 
					
						2017-05-08 01:05:24 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							59c3b9d4dd 
							
						 
					 
					
						
						
							
							Tidy up CLI and fix print functions  
						
						
						
					 
					
						2017-05-07 23:25:29 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e34069db9f 
							
						 
					 
					
						
						
							
							Move is_package and get_model_package_path to util  
						
						
						
					 
					
						2017-05-07 23:24:51 +02:00 
						 
				 
			
				
					
						
							
							
								Ben Eyal 
							
						 
					 
					
						
						
						
						
							
						
						
							d8098a8be2 
							
						 
					 
					
						
						
							
							Use regex instead of re  
						
						
						
					 
					
						2017-04-20 02:22:52 +03:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							97647c46cd 
							
						 
					 
					
						
						
							
							Add docstring and todo note  
						
						
						
					 
					
						2017-04-16 22:14:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							5c5f8c0a72 
							
						 
					 
					
						
						
							
							Check if full string is found in lang classes first  
						
						
						
						
						This allows users to set arbitrary strings. (Otherwise, custom lang class "my_custom_class" would always load Burmese "my" tokenizer if one was available.)
						
					 
					
						2017-04-16 22:14:38 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1f9f867c70 
							
						 
					 
					
						
						
							
							Remove unused util function  
						
						
						
					 
					
						2017-04-16 20:37:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							ed7e19ad68 
							
						 
					 
					
						
						
							
							Remove unused import  
						
						
						
					 
					
						2017-04-16 20:37:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0084466a66 
							
						 
					 
					
						
						
							
							Remove unused utf8open util and replace os.path with ensure_path  
						
						
						
					 
					
						2017-04-16 20:37:45 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d10bd0eaf9 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2017-04-16 13:42:34 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							31fa73293a 
							
						 
					 
					
						
						
							
							Move read_json out to own util function  
						
						
						
					 
					
						2017-04-16 13:03:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e6ee7e130f 
							
						 
					 
					
						
						
							
							Fix parse package meta  
						
						
						
					 
					
						2017-04-15 13:38:53 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							e1efd589c3 
							
						 
					 
					
						
						
							
							Fix json imports and use ujson  
						
						
						
					 
					
						2017-04-15 12:13:34 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							956dc36785 
							
						 
					 
					
						
						
							
							Move functions to deprecated  
						
						
						
					 
					
						2017-04-15 12:12:31 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							c05ec4b89a 
							
						 
					 
					
						
						
							
							Add compat functions and remove old workarounds  
						
						
						
						
						Add ensure_path util function to handle checking instance of path 
						
					 
					
						2017-04-15 12:11:16 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d24589aa72 
							
						 
					 
					
						
						
							
							Clean up imports, unused code, whitespace, docstrings  
						
						
						
					 
					
						2017-04-15 12:05:47 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							75f9b4c6e2 
							
						 
					 
					
						
						
							
							Fix whitespace  
						
						
						
					 
					
						2017-04-07 10:22:18 +02:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							fdec758113 
							
						 
					 
					
						
						
							
							Add is_windows and is_python2 utility functions  
						
						
						
					 
					
						2017-03-25 14:04:02 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3f20efe165 
							
						 
					 
					
						
						
							
							Merge branch 'develop'  
						
						
						
						
						# Conflicts:
#	spacy/util.py 
						
					 
					
						2017-03-22 17:14:15 +01:00 
						 
				 
			
				
					
						
							
							
								Raphaël Bournhonesque 
							
						 
					 
					
						
						
						
						
							
						
						
							f332bf05be 
							
						 
					 
					
						
						
							
							Remove unused import statements  
						
						
						
					 
					
						2017-03-21 21:08:54 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							5aea327a5b 
							
						 
					 
					
						
						
							
							Add util function to get raw user input  
						
						
						
					 
					
						2017-03-20 22:48:56 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							a6c0361803 
							
						 
					 
					
						
						
							
							Handle raw_input vs input in Python 2 and 3  
						
						
						
					 
					
						2017-03-20 22:48:32 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							adbcac6591 
							
						 
					 
					
						
						
							
							Fix spacing  
						
						
						
					 
					
						2017-03-20 22:48:21 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							0eafc0f2c6 
							
						 
					 
					
						
						
							
							Add util functions to print data as table or markdown list  
						
						
						
					 
					
						2017-03-18 13:00:14 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							adb0b7e43b 
							
						 
					 
					
						
						
							
							Fix loading when no package found  
						
						
						
					 
					
						2017-03-16 18:30:23 -05:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3d484c3faf 
							
						 
					 
					
						
						
							
							Don't print in parse_package_meta and accept on_error callback instead  
						
						
						
						
						TODO: log warning for missing meta data in spacy.link, as this affects the Language class returned by spacy.load()
						
					 
					
						2017-03-16 20:34:50 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							5f3f04bd0a 
							
						 
					 
					
						
						
							
							Add util function to load and parse package meta.json  
						
						
						
					 
					
						2017-03-16 17:10:05 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							7f920c2f75 
							
						 
					 
					
						
						
							
							Don't break text in when rendering print_msg  
						
						
						
					 
					
						2017-03-16 17:09:50 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							68c04fa897 
							
						 
					 
					
						
						
							
							Move sys_exit() function to util  
						
						
						
					 
					
						2017-03-16 17:08:58 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							7b2eca36e4 
							
						 
					 
					
						
						
							
							Revert "Fix formatting and remove unused code"  
						
						
						
						
						This reverts commit d7898d586f 
						
					 
					
						2017-03-16 09:58:41 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							f5d1a39a5b 
							
						 
					 
					
						
						
							
							Add util functions for printing and wrapping messages  
						
						
						
					 
					
						2017-03-15 17:35:57 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d7898d586f 
							
						 
					 
					
						
						
							
							Fix formatting and remove unused code  
						
						
						
					 
					
						2017-03-15 17:35:41 +01:00 
						 
				 
			
				
					
						
							
							
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							66c1f194f9 
							
						 
					 
					
						
						
							
							Use consistent unicode declarations  
						
						
						
					 
					
						2017-03-12 13:07:28 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0f9b8a00a5 
							
						 
					 
					
						
						
							
							Unbreak data download  
						
						
						
					 
					
						2017-01-09 23:40:26 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d9a77ddf14 
							
						 
					 
					
						
						
							
							Return None for data path if it doesn't exist  
						
						
						
					 
					
						2017-01-09 14:10:05 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							de5aa92bc2 
							
						 
					 
					
						
						
							
							Handle deprecated tokenizer prefix data  
						
						
						
					 
					
						2017-01-08 20:33:28 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							6a60a61086 
							
						 
					 
					
						
						
							
							Move update_exc to global language data utils  
						
						
						
					 
					
						2016-12-17 12:29:02 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							66c7348cda 
							
						 
					 
					
						
						
							
							Add update_exc util function  
						
						
						
					 
					
						2016-12-08 13:58:12 +01:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							8e977cc71c 
							
						 
					 
					
						
						
							
							Fix formatting  
						
						
						
					 
					
						2016-12-08 13:56:17 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6b8b05ef83 
							
						 
					 
					
						
						
							
							Specify that spacy.util is encoded in utf8  
						
						
						
					 
					
						2016-11-02 19:58:00 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9efe568177 
							
						 
					 
					
						
						
							
							Add missing unicode_literals to spacy.util. I think this was messing up the tokenizer regex for non-ascii characters in Python 2. Re Issue  #596  
						
						
						
					 
					
						2016-11-02 12:31:34 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5e923b9bfa 
							
						 
					 
					
						
						
							
							Return None in match_best_version if the path does not exist.  
						
						
						
					 
					
						2016-10-15 14:47:29 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ea23b64cc8 
							
						 
					 
					
						
						
							
							Refactor training, with new spacy.train module. Defaults still a little awkward.  
						
						
						
					 
					
						2016-10-09 12:24:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							95aaea0d3f 
							
						 
					 
					
						
						
							
							Refactor so that the tokenizer data is read from Python data, rather than from disk  
						
						
						
					 
					
						2016-09-25 14:49:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							82b8cc5efb 
							
						 
					 
					
						
						
							
							Whitespace  
						
						
						
					 
					
						2016-09-24 22:17:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f19af6cb2c 
							
						 
					 
					
						
						
							
							Python 3 compatible basestring  
						
						
						
					 
					
						2016-09-24 22:08:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fd65cf6cbb 
							
						 
					 
					
						
						
							
							Finish refactoring data loading  
						
						
						
					 
					
						2016-09-24 20:26:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							83e364188c 
							
						 
					 
					
						
						
							
							Mostly finished loading refactoring. Design is in place, but doesn't work yet.  
						
						
						
					 
					
						2016-09-24 15:42:01 +02:00 
						 
				 
			
				
					
						
							
							
								Daylen Yang 
							
						 
					 
					
						
						
						
						
							
						
						
							5405e7dd73 
							
						 
					 
					
						
						
							
							Fix get_lang_class parsing (take 2)  
						
						
						
					 
					
						2016-05-16 16:40:31 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b240104f40 
							
						 
					 
					
						
						
							
							Revert "Fix get_lang_class parsing"  
						
						
						
					 
					
						2016-05-17 08:04:26 +10:00 
						 
				 
			
				
					
						
							
							
								Daylen Yang 
							
						 
					 
					
						
						
						
						
							
						
						
							1692c2df3c 
							
						 
					 
					
						
						
							
							Fix get_lang_class parsing  
						
						
						
						
						We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens. 
						
					 
					
						2016-05-16 14:38:20 -07:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							ff690f76ba 
							
						 
					 
					
						
						
							
							fix loading non-german models  
						
						
						
					 
					
						2016-04-12 16:00:56 +02:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							c90d4a6f17 
							
						 
					 
					
						
						
							
							relative imports in __init__.py  
						
						
						
					 
					
						2016-03-26 11:44:53 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							b8f63071eb 
							
						 
					 
					
						
						
							
							add lang registration facility  
						
						
						
					 
					
						2016-03-25 18:54:45 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							a7d7ea3afa 
							
						 
					 
					
						
						
							
							first idea for supporting multiple langs in download script  
						
						
						
					 
					
						2016-03-24 11:19:43 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							eb7ae61b1c 
							
						 
					 
					
						
						
							
							cleanup api  
						
						
						
					 
					
						2016-03-08 12:59:18 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9cc4f8d5b3 
							
						 
					 
					
						
						
							
							avoid shadowing __name__  
						
						
						
					 
					
						2016-02-15 01:33:39 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							235f094534 
							
						 
					 
					
						
						
							
							untangle data_path/via  
						
						
						
					 
					
						2016-01-16 12:23:45 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							6d1a3af343 
							
						 
					 
					
						
						
							
							cleanup unused  
						
						
						
					 
					
						2016-01-16 10:05:04 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							846fa49b2a 
							
						 
					 
					
						
						
							
							distinct load() and from_package() methods  
						
						
						
					 
					
						2016-01-16 10:00:57 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							211913d689 
							
						 
					 
					
						
						
							
							add about.py, adapt setup.py  
						
						
						
					 
					
						2016-01-15 18:57:01 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							788f734513 
							
						 
					 
					
						
						
							
							refactored data_dir->via, add zip_safe, add spacy.load()  
						
						
						
					 
					
						2016-01-15 18:01:02 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							d9471f684f 
							
						 
					 
					
						
						
							
							fix typo  
						
						
						
					 
					
						2016-01-14 12:14:12 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9b75d872b0 
							
						 
					 
					
						
						
							
							fix model download  
						
						
						
					 
					
						2016-01-14 12:02:56 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							bc229790ac 
							
						 
					 
					
						
						
							
							integrate with sputnik  
						
						
						
					 
					
						2016-01-13 19:46:17 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eaf2ad59f1 
							
						 
					 
					
						
						
							
							* Fix use of mock Package object  
						
						
						
					 
					
						2015-12-31 04:13:15 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a2dfdec85d 
							
						 
					 
					
						
						
							
							* Clean up spacy.util  
						
						
						
					 
					
						2015-12-29 18:06:09 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							aec130af56 
							
						 
					 
					
						
						
							
							Use util.Package class for io  
						
						
						
						
						Previous Sputnik integration caused API change: Vocab, Tagger, etc were loaded via a from_package classmethod, that required a sputnik.Package instance. This forced users to first create a sputnik.Sputnik() instance, in order to acquire a Package via sp.pool().
						Instead I've created a small file-system shim, util.Package, which allows classes to have a .load() classmethod, that accepts either util.Package objects, or strings. We can later gut the internals of this and make it a proxy for Sputnik if we need more functionality that should live in the Sputnik library.
						Sputnik is now only used to download and install the data, in spacy.en.download 
						
					 
					
						2015-12-29 18:00:48 +01:00 
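The design described in the commit message above (a small file-system shim plus a .load() classmethod that accepts either a package object or a string) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Package` methods `file_path` and `coerce`, the `ToyVocab` class, and the `vocab/strings.txt` layout are hypothetical and are not taken from util.Package or from spaCy's data format.

```python
import os


class Package:
    """Hypothetical file-system shim that wraps a data directory."""

    def __init__(self, data_dir):
        self.data_dir = str(data_dir)

    def file_path(self, *parts):
        # Build a path to a file inside the wrapped directory.
        return os.path.join(self.data_dir, *parts)

    @classmethod
    def coerce(cls, package_or_path):
        # Accept either an existing Package or a plain path string.
        if isinstance(package_or_path, cls):
            return package_or_path
        return cls(package_or_path)


class ToyVocab:
    """Stand-in for a component that loads its data through the shim."""

    def __init__(self, strings):
        self.strings = strings

    @classmethod
    def load(cls, via):
        # `via` may be a Package or a string path; both are normalized here,
        # so callers never have to construct a Package themselves.
        package = Package.coerce(via)
        path = package.file_path("vocab", "strings.txt")  # invented layout
        with open(path, encoding="utf-8") as f:
            strings = [line.strip() for line in f if line.strip()]
        return cls(strings)
```

With this shape, ToyVocab.load("/path/to/data") and ToyVocab.load(Package("/path/to/data")) behave the same, which is the convenience the commit message describes.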
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4131e45543 
							
						 
					 
					
						
						
							
							* Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way  
						
						
						
					 
					
						2015-12-29 16:55:03 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							d8d348bb55 
							
						 
					 
					
						
						
							
							allow to specify version constraint within model name  
						
						
						
					 
					
						2015-12-18 19:12:08 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							cfa187aaf0 
							
						 
					 
					
						
						
							
							fix tests  
						
						
						
					 
					
						2015-12-18 10:58:02 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							8359bd4d93 
							
						 
					 
					
						
						
							
							strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible  
						
						
						
					 
					
						2015-12-18 09:52:55 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9027cef3bc 
							
						 
					 
					
						
						
							
							access model via sputnik  
						
						
						
					 
					
						2015-12-07 06:01:28 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dc393a5f1d 
							
						 
					 
					
						
						
							
							Merge pull request  #126  from tomtung/master  
						
						
						
						
						Improve slicing support for both Doc and Span 
						
					 
					
						2015-10-10 14:14:57 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							83dccf0fd7 
							
						 
					 
					
						
						
							
							* Use io module instead of deprecated codecs module  
						
						
						
					 
					
						2015-10-10 14:13:01 +11:00 
						 
				 
			
				
					
						
							
							
								Yubing (Tom) Dong 
							
						 
					 
					
						
						
						
						
							
						
						
							3fd3bc79aa 
							
						 
					 
					
						
						
							
							Refactor to remove duplicate slicing logic  
						
						
						
					 
					
						2015-10-07 01:25:35 -07:00 
						 
				 
			
				
					
						
							
							
								alvations 
							
						 
					 
					
						
						
						
						
							
						
						
							8199012d26 
							
						 
					 
					
						
						
							
							changing deprecated codecs.open to io.open =)  
						
						
						
					 
					
						2015-09-30 20:10:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6ab1696b15 
							
						 
					 
					
						
						
							
							* Remove read_encoding_freqs from util.py  
						
						
						
					 
					
						2015-07-23 01:17:32 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							317cbbc015 
							
						 
					 
					
						
						
							
							* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.  
						
						
						
					 
					
						2015-07-19 15:18:17 +02:00 
						 
				 
			
				
					
						
							
							
								Jordan Suchow 
							
						 
					 
					
						
						
						
						
							
						
						
							3a8d9b37a6 
							
						 
					 
					
						
						
							
							Remove trailing whitespace  
						
						
						
					 
					
						2015-04-19 13:01:38 -07:00 
						 
				 
			
				
					
						
							
							
								Jordan Suchow 
							
						 
					 
					
						
						
						
						
							
						
						
							5f0f940a1f 
							
						 
					 
					
						
						
							
							Remove unused imports  
						
						
						
					 
					
						2015-04-19 01:05:22 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3f1944d688 
							
						 
					 
					
						
						
							
							* Make PyPy work  
						
						
						
					 
					
						2015-01-05 17:54:38 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f5d41028b5 
							
						 
					 
					
						
						
							
							* Move around data files for test release  
						
						
						
					 
					
						2015-01-03 01:59:22 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e1c1a4b868 
							
						 
					 
					
						
						
							
							* Tmp  
						
						
						
					 
					
						2014-12-21 05:36:29 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b962fe73d7 
							
						 
					 
					
						
						
							
							* Make suffixes file use full-power regex, so that we can handle periods properly  
						
						
						
					 
					
						2014-12-09 19:04:27 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							302e09018b 
							
						 
					 
					
						
						
							
							* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas  
						
						
						
					 
					
						2014-12-09 14:48:01 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ea8f1e7053 
							
						 
					 
					
						
						
							
							* Tighten interfaces  
						
						
						
					 
					
						2014-10-30 18:14:42 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							67c8c8019f 
							
						 
					 
					
						
						
							
							* Update lexeme serialization, using a binary file format  
						
						
						
					 
					
						2014-10-30 01:01:00 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							43d5964e13 
							
						 
					 
					
						
						
							
							* Add function to read detokenization rules  
						
						
						
					 
					
						2014-10-22 12:54:59 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							12742f4f83 
							
						 
					 
					
						
						
							
							* Add detokenize method and test  
						
						
						
					 
					
						2014-10-18 18:07:29 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6fb42c4919 
							
						 
					 
					
						
						
							
							* Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang  
						
						
						
					 
					
						2014-10-14 16:17:45 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e40caae51f 
							
						 
					 
					
						
						
							
							* Update Lexicon class to expect a list of lexeme dict descriptions  
						
						
						
					 
					
						2014-10-09 14:51:35 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2e44fa7179 
							
						 
					 
					
						
						
							
							* Add util.py  
						
						
						
					 
					
						2014-09-25 18:26:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e9a62b6eba 
							
						 
					 
					
						
						
							
							* Refactoring with Lexeme as a class now compiles. Basic design seems to work  
						
						
						
					 
					
						2014-08-27 17:15:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d10993f41a 
							
						 
					 
					
						
						
							
							* More docs work  
						
						
						
					 
					
						2014-08-21 16:37:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3379d7a571 
							
						 
					 
					
						
						
							
							* Reforming data model for lexemes  
						
						
						
					 
					
						2014-08-19 02:40:37 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							01469b0888 
							
						 
					 
					
						
						
							
							* Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word.  
						
						
						
					 
					
						2014-08-18 19:14:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ff1869ff07 
							
						 
					 
					
						
						
							
							* Fixed major efficiency problem, from not quite grokking pass by reference in cython c++  
						
						
						
					 
					
						2014-07-07 07:36:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							25849fc926 
							
						 
					 
					
						
						
							
							* Generalize tokenization rules to capitals  
						
						
						
					 
					
						2014-07-07 05:07:21 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4e79446dc2 
							
						 
					 
					
						
						
							
							* Reading in tokenization rules correctly. Passing tests.  
						
						
						
					 
					
						2014-07-07 00:02:55 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							556f6a18ca 
							
						 
					 
					
						
						
							
							* Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc.  
						
						
						
					 
					
						2014-07-05 20:51:42 +02:00