Matthew Honnibal | 12699a1152 | * Set initial freqs, to avoid missing values in serializer | 2015-07-23 01:16:27 +02:00
Matthew Honnibal | 317cbbc015 | * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. | 2015-07-19 15:18:17 +02:00
Matthew Honnibal | 4dddc8a69b | * Fix type declarations for attr_t. Remove unused id_t. | 2015-07-18 22:39:57 +02:00
Matthew Honnibal | 95e57c2780 | * Remove unnecessary key and id properties from Utf8String. | 2015-07-17 01:40:18 +02:00
Matthew Honnibal | 6eef0bf9ab | * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx | 2015-07-13 20:20:58 +02:00
Matthew Honnibal | 89a91ad726 | * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity | 2015-07-09 13:30:41 +02:00
Matthew Honnibal | bb522496dd | * Rename Tokens to Doc | 2015-07-08 18:53:00 +02:00
Matthew Honnibal | fb8d50b3d5 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-04-30 12:45:15 +02:00
Matthew Honnibal | 378c2a6435 | * Fix POS model: make it use tag instead of pos in history features | 2015-04-29 00:02:53 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Matthew Honnibal | c6707778dd | * Fix Issue #51: Handle non-ascii lemmas correctly | 2015-04-13 22:28:59 +02:00
Matthew Honnibal | 567388e38d | * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 8cc3524dc9 | * Ws | 2015-03-26 16:44:41 +01:00
Matthew Honnibal | caf046b220 | * Hastily add method to apply tags from a list of strings, instead of predicting the tags. | 2015-02-23 15:40:17 -05:00
Matthew Honnibal | 0b7e769211 | * Add POS tags to support SWBD tag set | 2015-02-11 14:08:28 -05:00
Matthew Honnibal | 312b3a45f3 | * Fix issue #19: Allow parsing/pos tagging of empty strings | 2015-02-10 10:15:58 -05:00
Matthew Honnibal | 5c3513583d | * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. | 2015-02-09 03:57:10 -05:00
Matthew Honnibal | be5536d239 | * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. | 2015-02-08 18:36:18 -05:00
Matthew Honnibal | 56c2ef2982 | * Tweak POS features for web text | 2015-02-02 11:59:36 +11:00
Matthew Honnibal | 024cfd485c | * Pass tag_strings as a tuple, to support new Tokens API | 2015-01-31 13:43:37 +11:00
Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00
Matthew Honnibal | 12b034e3ef | * Move POS tag definitions to parts_of_speech.pxd | 2015-01-25 16:31:07 +11:00
Matthew Honnibal | 7431c133d8 | * Add error if try to access head and not is_parsed | 2015-01-25 15:33:54 +11:00
Matthew Honnibal | 4e857ab7a6 | * Fix bug in POS tagger feature | 2015-01-25 02:20:15 +11:00
Matthew Honnibal | a97bed9359 | * Fix POS and dependency label tag names.  Add parse and string navigation functions. | 2015-01-24 17:29:04 +11:00
Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00
Matthew Honnibal | 6c7e44140b | * Work on word vectors, and other stuff | 2015-01-17 16:21:17 +11:00
Matthew Honnibal | 0930892fc1 | * Tmp. Working on refactor. Compiles, must hook up lexical feats. | 2015-01-14 00:03:48 +11:00
Matthew Honnibal | 46da3d74d2 | * Tmp. Refactoring, introducing a Lexeme PyObject. | 2015-01-12 11:23:44 +11:00
Matthew Honnibal | ce2edd6312 | * Tmp commit. Refactoring to create a Python Lexeme class. | 2015-01-12 10:26:22 +11:00
Matthew Honnibal | 3f1944d688 | * Make PyPy work | 2015-01-05 17:54:38 +11:00
Matthew Honnibal | 94034f1112 | * Fix encoding in lemmatization | 2015-01-05 11:54:29 +11:00
Matthew Honnibal | 0e4c2ba036 | * Fix loading of special morph words | 2015-01-03 23:13:00 +11:00
Matthew Honnibal | 5d9a096e2f | * Some minor clean-up after HastyModel | 2014-12-31 19:46:04 +11:00
Matthew Honnibal | aafaf58cbe | * Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile. | 2014-12-31 19:40:59 +11:00
Matthew Honnibal | 1a075f77ff | * Don't over-ride pre-loaded POS tags, if set by special-cases | 2014-12-30 23:26:32 +11:00
Matthew Honnibal | bb0b00f819 | * Repurporse the Tagger class as a generic Model, wrapping thinc's interface | 2014-12-30 21:20:15 +11:00
Matthew Honnibal | bb80937544 | * Upd docstrings | 2014-12-27 18:45:16 +11:00
Matthew Honnibal | b8b65903fc | * Tmp | 2014-12-24 17:42:00 +11:00
Matthew Honnibal | b00bc01d8c | * All tests now passing for reorg | 2014-12-23 13:18:59 +11:00
Matthew Honnibal | 73f200436f | * Tests passing except for morphology/lemmatization stuff | 2014-12-23 11:40:32 +11:00
Matthew Honnibal | 61df50b598 | * Add English-subclass POS tagger | 2014-12-21 20:59:07 +11:00