Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							e619aba8df 
							
						 
					 
					
						
						
							
							Move WordNet license to correct place  
						
						
						
					 
					
						2016-10-21 01:07:16 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eab2376547 
							
						 
					 
					
						
						
							
							* Allow longer ellipses to be treated as a single token, e.g. Hello......there  
						
						
						
					 
					
						2016-05-09 13:22:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8569dbc2d0 
							
						 
					 
					
						
						
							
							* Add initial stuff for Chinese parsing  
						
						
						
					 
					
						2016-04-24 18:44:24 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fe9299a118 
							
						 
					 
					
						
						
							
							* Fix long-standing issue with coarse-grained tags: proper nouns weren't receiving the PROPN tag, and personal pronouns weren't receiving the PRON tag. This should fix Issue #191, and also Issue #325, which reported that proper nouns were being lemmatized using the common noun policies. This lemmatization will be prevented if the universal tag is PROPN, not NOUN, as no lemmatization rules are loaded for the PROPN tag.
						
						
						
					 
					
						2016-04-14 12:46:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6f82065761 
							
						 
					 
					
						
						
							
							* Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases.
						
						
						
					 
					
						2016-04-14 11:36:03 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							85485f5c2b 
							
						 
					 
					
						
						
							
							Fix inconsistencies in generate_specials.py  
						
						
						
						
						Re Issue #321, fix inconsistencies in the script that generates specials.json. The result still isn't so satisfying --- we need to revise this as we move to parse more morphologically rich languages.
						
					 
					
						2016-04-07 11:21:52 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							910a6c805f 
							
						 
					 
					
						
						
							
							* Add infix rule for double hyphens, re Issue #302
						
						
						
					 
					
						2016-03-29 13:03:44 +11:00 
						 
				 
			
				
					
						
							
							
								Wolfgang Seeker 
							
						 
					 
					
						
						
						
						
							
						
						
							eae35e9b27 
							
						 
					 
					
						
						
							
							add tokenizer files for German, add/change code to train German pos tagger  
						
						
						
						
- add files to specify rules for German tokenization
- change generate_specials.py to generate from an external file (abbrev.de.tab)
- copy gazetteer.json from lang_data/en/
- init_model.py
	- change doc freq threshold to 0
- add train_german_tagger.py
	- expects conll09-formatted input 
						
					 
					
						2016-02-18 13:24:20 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7cbff48ace 
							
						 
					 
					
						
						
							
							* Set the German lemma rules to be an empty JSON object  
						
						
						
					 
					
						2016-02-02 22:30:51 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d0f06c5cc4 
							
						 
					 
					
						
						
							
							* Add missing tags to the German tag map  
						
						
						
					 
					
						2016-02-02 22:30:22 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6c633f2edc 
							
						 
					 
					
						
						
							
							Fix Issue #243: Incorrect gazetteer entry
						
						
						
					 
					
						2016-01-30 06:58:29 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4b4eec8b47 
							
						 
					 
					
						
						
							
							* Fix Issue #201: Tokenization of there'll
						
						
						
					 
					
						2015-12-29 18:09:09 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e8bd92f1e7 
							
						 
					 
					
						
						
							
							* Fix lemma of let's, re Issue #177
						
						
						
					 
					
						2015-11-13 06:42:23 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							726bb648da 
							
						 
					 
					
						
						
							
							* Fix non-breaking space in specials.json  
						
						
						
					 
					
						2015-10-19 12:46:11 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e39095da82 
							
						 
					 
					
						
						
							
							* Fix designation of non-breaking space in specials.json.  
						
						
						
					 
					
						2015-10-19 12:39:03 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							454c1996d0 
							
						 
					 
					
						
						
							
							* Add tokenizer rule to fix numeric range tokenization  
						
						
						
					 
					
						2015-10-17 15:49:51 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7488821677 
							
						 
					 
					
						
						
							
							* Map NIL to empty string in tag map  
						
						
						
					 
					
						2015-10-10 22:09:50 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4bbd1388bd 
							
						 
					 
					
						
						
							
							* Whitespace  
						
						
						
					 
					
						2015-10-10 16:03:48 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							bdcb8d695c 
							
						 
					 
					
						
						
							
							* Add non-breaking space to specials.json  
						
						
						
					 
					
						2015-10-10 15:54:06 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c12d36d5f4 
							
						 
					 
					
						
						
							
							* Fix quote marks in lemma_rules  
						
						
						
					 
					
						2015-10-10 15:03:36 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							57b3cd4661 
							
						 
					 
					
						
						
							
							* Add smart-quotes to lemma rules  
						
						
						
					 
					
						2015-10-10 14:06:46 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7e7f28e1fd 
							
						 
					 
					
						
						
							
							* Add smart-quote possessive marker in generate_specials  
						
						
						
					 
					
						2015-10-10 14:06:09 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a510858f5a 
							
						 
					 
					
						
						
							
							* Pretty-print specials.json, and add the em dash  
						
						
						
					 
					
						2015-10-09 11:07:45 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							49600a44a8 
							
						 
					 
					
						
						
							
							* Fix trailing comma in lemma_rules.json  
						
						
						
					 
					
						2015-10-09 11:06:57 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0e92e8574a 
							
						 
					 
					
						
						
							
							* Fix pos tag in em-dash in specials  
						
						
						
					 
					
						2015-10-09 11:06:37 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d341443282 
							
						 
					 
					
						
						
							
							* Remove em-dash from lemma rules. Handle instead in specials.  
						
						
						
					 
					
						2015-10-09 10:27:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b6047afe4c 
							
						 
					 
					
						
						
							
							* Fix punctuation lemma rules, to resolve Issue #130
						
						
						
					 
					
						2015-10-09 10:25:37 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							393a13d1af 
							
						 
					 
					
						
						
							
							* Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130.
						
						
						
					 
					
						2015-10-09 19:24:33 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1490feda29 
							
						 
					 
					
						
						
							
							* Make generate_specials pretty-print the specials.json file  
						
						
						
					 
					
						2015-10-09 19:23:47 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1842a53e73 
							
						 
					 
					
						
						
							
							* Lemmatize smart quotes as plain quotes  
						
						
						
					 
					
						2015-10-09 19:09:36 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5332c0b697 
							
						 
					 
					
						
						
							
							* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130
						
						
						
					 
					
						2015-10-09 18:54:40 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e3e8994368 
							
						 
					 
					
						
						
							
							* Patch italian tag map  
						
						
						
					 
					
						2015-10-08 14:00:13 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2d68f75b6a 
							
						 
					 
					
						
						
							
							* Fix identity tag map  
						
						
						
					 
					
						2015-10-08 13:59:56 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							095831e5bf 
							
						 
					 
					
						
						
							
							* Start adding auxiliaries to morphs.json  
						
						
						
					 
					
						2015-09-27 16:56:34 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c579b6b96c 
							
						 
					 
					
						
						
							
							* Update English morphs.json  
						
						
						
					 
					
						2015-09-24 22:38:41 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3b3547251c 
							
						 
					 
					
						
						
							
							* Fix Issue #102: DT tag was mapped to DET.
						
						
						
					 
					
						2015-09-24 18:38:47 +10:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							be4848fbcb 
							
						 
					 
					
						
						
							
							* Update morphs.json with universal dependencies/interset morphological features  
						
						
						
					 
					
						2015-09-24 00:59:42 +10:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							911de2ae49 
							
						 
					 
					
						
						
							
							add overseen (?) char  
						
						
						
					 
					
						2015-09-22 12:29:47 +02:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9ecb98f30e 
							
						 
					 
					
						
						
							
							basic german rules  
						
						
						
					 
					
						2015-09-22 11:56:29 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b9e31dc245 
							
						 
					 
					
						
						
							
							* Bug fix to gazetteer.json  
						
						
						
					 
					
						2015-09-10 14:50:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							623329b19a 
							
						 
					 
					
						
						
							
							Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop  
						
						
						
					 
					
						2015-09-08 14:27:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							86c888667f 
							
						 
					 
					
						
						
							
							* Merge in changes from de branch  
						
						
						
					 
					
						2015-09-06 19:49:28 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dbf8dce109 
							
						 
					 
					
						
						
							
							Merge branch 'gaz' of ssh://github.com/honnibal/spaCy into gaz  
						
						
						
					 
					
						2015-09-06 18:44:14 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							577418986a 
							
						 
					 
					
						
						
							
							* Add draft Italian stuff  
						
						
						
					 
					
						2015-09-06 18:44:10 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							80a66c0159 
							
						 
					 
					
						
						
							
							* Add draft finnish stuff  
						
						
						
					 
					
						2015-09-06 18:43:44 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b3703836f9 
							
						 
					 
					
						
						
							
							* Add en lemma rules  
						
						
						
					 
					
						2015-09-06 17:56:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							238b2f533b 
							
						 
					 
					
						
						
							
							* Add lemma rules  
						
						
						
					 
					
						2015-09-06 17:55:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c9f2082e3c 
							
						 
					 
					
						
						
							
							* Fix compilation error in en/tag_map.json  
						
						
						
					 
					
						2015-09-06 17:54:51 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0af139e183 
							
						 
					 
					
						
						
							
							* Tagger training now working. Still need to test load/save of model. Morphology still broken.  
						
						
						
					 
					
						2015-08-27 09:16:11 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							56c4e07a59 
							
						 
					 
					
						
						
							
							Update gazetteer.json  
						
						
						
					 
					
						2015-08-27 08:53:48 +10:00