Matthew Honnibal | c20dd79748 | * Fiddle with const correctness and comments | 2014-12-08 00:03:55 +11:00
Matthew Honnibal | b031c7c430 | * Remove language-general context module | 2014-12-07 23:53:01 +11:00
Matthew Honnibal | ef4398b204 | * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules | 2014-12-07 23:52:41 +11:00
Matthew Honnibal | 327383e38a | * Remove unused code in tagger.pyx | 2014-12-07 22:16:17 +11:00
Matthew Honnibal | 8f2f319c57 | * Add a couple more contractions tests | 2014-12-07 22:08:04 +11:00
Matthew Honnibal | 9f17467c2e | * Fix EMPTY_TOKEN | 2014-12-07 22:07:41 +11:00
Matthew Honnibal | 3819a88e1b | * Add support for tag dictionary, and fix error-code for predict method | 2014-12-07 22:07:16 +11:00
Matthew Honnibal | f00afe12c4 | * Load POS tagger in load() function if path exists | 2014-12-07 22:05:57 +11:00
Matthew Honnibal | 677e111ee7 | * Revise tokenization rules to match PTB. Rules are pretty messy around periods, need better support for these. | 2014-12-07 22:04:47 +11:00
Matthew Honnibal | 5fe5e6e66b | * Move context functions to header, inlining them. | 2014-12-07 21:59:04 +11:00
Matthew Honnibal | 91e8d9ea1c | * Compile context.pyx and tagger.pyx modules | 2014-12-07 15:29:54 +11:00
Matthew Honnibal | 5caabec789 | * Link in tagger, to work on integrating POS tagging | 2014-12-07 15:29:41 +11:00
Matthew Honnibal | 0c7aeb9de7 | * Begin revising tagger, focussing on POS tagging | 2014-12-07 15:29:04 +11:00
Matthew Honnibal | f5c4f2eb52 | * Revise context, focussing on POS tagging for now | 2014-12-07 15:28:22 +11:00
Matthew Honnibal | e27b912ef9 | * Remove need for confusing _data pointer to be stored on Tokens | 2014-12-05 16:31:30 +11:00
Matthew Honnibal | 1c9253701d | * Introduce a TokenC struct, to handle token indices, pos tags and sense tags | 2014-12-05 15:56:14 +11:00
Matthew Honnibal | 187372c7f3 | * Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached | 2014-12-05 03:29:50 +11:00
Matthew Honnibal | 75b8dfb348 | * Remove upper_pc from lexeme.pyx | 2014-12-04 22:14:34 +11:00
Matthew Honnibal | a14f9eaf63 | * Add index.pyx to setup | 2014-12-04 22:14:11 +11:00
Matthew Honnibal | 49f3780ff5 | * Fiddle with lexeme attrs | 2014-12-04 21:22:38 +11:00
Matthew Honnibal | 564082e48e | * Hack Token class to take lex.dense inplace of the old lex.norm. This needs to be fixed... | 2014-12-04 20:51:29 +11:00
Matthew Honnibal | 69bb022204 | * Add as_array and count_by method | 2014-12-04 20:46:55 +11:00
Matthew Honnibal | e1b1f45cc9 | * Add STEM attribute to lexeme | 2014-12-04 20:46:20 +11:00
Matthew Honnibal | d7952634ca | * Make the string-store serve const pointers to Utf8Str | 2014-12-03 16:01:47 +11:00
Matthew Honnibal | 7e04c22f8f | * const added to Lexicon interface. Seems to work. | 2014-12-03 15:58:17 +11:00
Matthew Honnibal | d70d31aa45 | * Introduce first attempt at const-ness | 2014-12-03 15:44:25 +11:00
Matthew Honnibal | d0d812c548 | * Hack setup.py to exclude tagger stuff | 2014-12-03 11:06:57 +11:00
Matthew Honnibal | 4560ada85b | * Add typedef for attr_t. Change flag_t to flags_t | 2014-12-03 11:06:31 +11:00
Matthew Honnibal | e600f7b327 | * Move String struct stuff into the utf8string module, from spacy.lang | 2014-12-03 11:06:00 +11:00
Matthew Honnibal | e170faf5b0 | * Hack Tokens to work without tagger.pyx | 2014-12-03 11:05:15 +11:00
Matthew Honnibal | b463a7eb86 | * Make flag-setting a language-specific thing | 2014-12-03 11:04:32 +11:00
Matthew Honnibal | 71b009e323 | * Fix bug in refactored StringStore.__getitem__ | 2014-12-03 11:02:24 +11:00
Matthew Honnibal | 14097311ae | * Make StringStore.__getitem__ accept unicode-typed keys. | 2014-12-03 01:33:20 +11:00
Matthew Honnibal | 522bb0346e | * Work on get_array method of Tokens | 2014-12-02 23:48:05 +11:00
Matthew Honnibal | 8c2938fe01 | * Rename Lexicon._dict to Lexicon._map | 2014-12-02 23:46:59 +11:00
Matthew Honnibal | 2ee8a1e61f | * Make intro chattier, explain philosophy better | 2014-12-02 15:20:18 +11:00
Matthew Honnibal | ea19850a69 | * Add tokenizer section | 2014-12-02 04:39:12 +11:00
Matthew Honnibal | 3430d5f629 | * Revise intro copy. Add NLTK comparison | 2014-12-01 22:55:13 +11:00
Matthew Honnibal | 33dfb4933c | * Remove taggers from Language class. Work on doc strings | 2014-11-26 19:53:55 +11:00
Matthew Honnibal | 80baa2e3db | * Work on beam parser | 2014-11-20 19:49:33 +11:00
Matthew Honnibal | 5c3016bac8 | * Tmp commit of ner code | 2014-11-14 18:27:47 +11:00
Matthew Honnibal | 33c421bcf8 | * More feature tweaks | 2014-11-12 23:59:16 +11:00
Matthew Honnibal | 41dedfb14e | * Add label features for NER parsing | 2014-11-12 23:55:10 +11:00
Matthew Honnibal | cf55b48ba6 | * Switch to predict label on shift. Big increase in accuracy. | 2014-11-12 23:50:12 +11:00
Matthew Honnibal | 8f84e8a78b | * Neaten oracle | 2014-11-12 23:38:07 +11:00
Matthew Honnibal | 66cb4f96e1 | * Upd gitignore | 2014-11-12 23:25:27 +11:00
Matthew Honnibal | 60c1e78596 | * Commit outstanding tests | 2014-11-12 23:24:32 +11:00
Matthew Honnibal | 7e0a9077dd | * Add context files | 2014-11-12 23:22:36 +11:00
Matthew Honnibal | 9b13392ac7 | * Add conll experiments | 2014-11-12 23:22:05 +11:00
Matthew Honnibal | b934bf1c69 | * Compile IOB | 2014-11-12 23:21:40 +11:00