Matthew Honnibal | 2b9629ed62 | * Begin adding stateclass to ArcEager | 2015-06-09 01:41:09 +02:00
Matthew Honnibal | ba10fd8af5 | * Add StateClass, to replace/refactor the mess in _state | 2015-06-09 01:39:54 +02:00
Matthew Honnibal | c7e3dfc1dc | * Don't automatically push words when stack is empty, as it messes up beam parsing. Add hash method to beam state. | 2015-06-08 14:49:04 +02:00
Matthew Honnibal | 00a0dfcb59 | * Avoid shipping the spacy.munge package | 2015-06-08 00:54:13 +02:00
Matthew Honnibal | 7d265a9c62 | * Revert to wget in spacy.en.download | 2015-06-08 00:48:56 +02:00
Matthew Honnibal | a8fc5f1285 | * Fix munge/read_ner | 2015-06-08 00:35:04 +02:00
Matthew Honnibal | 1515862861 | * Fix download.py | 2015-06-08 00:08:05 +02:00
Matthew Honnibal | 7e9e8f654a | * Use urllib in spacy.en.download | 2015-06-07 23:51:38 +02:00
Matthew Honnibal | 80cff41a9c | * Upd download.py | 2015-06-07 19:13:28 +02:00
Matthew Honnibal | 6e2564239d | * Bug fixes to beam parser. Search still broken on non-gold sentences | 2015-06-07 19:12:59 +02:00
Matthew Honnibal | 1ec4e6fc95 | * Don't score whitespace tokens | 2015-06-07 19:10:32 +02:00
Matthew Honnibal | 731e5f1e46 | * Add get() function in spacy/syntax/Config | 2015-06-07 19:09:15 +02:00
Matthew Honnibal | 8f142c1838 | * Refactor transition system oracles, to split out move and label cost. Preparing to add Unshift move. Will exclude non-monotonic. | 2015-06-07 03:21:29 +02:00
Matthew Honnibal | 89b8775887 | * Fix output from _min_edit_path when inputs match. | 2015-06-06 05:58:53 +02:00
Matthew Honnibal | 98cfd84123 | * Remove hyphenation from main tokenizer loop: do it in infix.txt instead. This lets emoticons work | 2015-06-06 05:57:03 +02:00
Matthew Honnibal | 1fee7ade61 | * Tweak to ner | 2015-06-05 23:48:43 +02:00
Matthew Honnibal | 33e70b167f | * Remove dead code from ner.pyx | 2015-06-05 17:12:47 +02:00
Matthew Honnibal | 88ac5c6e98 | * Send beam_width < 0 to greedy parser | 2015-06-05 17:12:06 +02:00
Matthew Honnibal | 0114e7600d | * Fix NER oracle | 2015-06-05 17:11:26 +02:00
Matthew Honnibal | c04e6ebca6 | * Allow user to load different sized vectors. | 2015-06-05 16:26:39 +02:00
Matthew Honnibal | 6bf35cecc3 | * Refactor transition system to use classes with staticmethods. | 2015-06-05 02:27:17 +02:00
Matthew Honnibal | 36a34d544b | * Refactoring arc_eager, grouping oracle functions into transitions | 2015-06-04 22:43:03 +02:00
Matthew Honnibal | 4433396005 | * Improve efficiency of dynamic oracle, making beam training faster | 2015-06-04 21:15:14 +02:00
Matthew Honnibal | 079dad28a7 | * Update for faster beam training | 2015-06-04 19:32:32 +02:00
Matthew Honnibal | f8843906ad | Merge branch 'constituency'. Add beam parsing and training from JSON files, with Levenshtein alignment. | 2015-06-03 06:07:24 +02:00
Matthew Honnibal | ae653b850a | * Remove unused import from gold.pyx | 2015-06-03 06:07:15 +02:00
Matthew Honnibal | a2627b6102 | * Fix bug in refactored init_transition | 2015-06-03 06:01:26 +02:00
Matthew Honnibal | dd0867645d | * Remove stray const from State header | 2015-06-03 00:10:04 +02:00
Matthew Honnibal | 6c47b10a6e | * Make optimization to children_in_buffer: stop searching when we would cross a bracket. | 2015-06-02 21:05:24 +02:00
Matthew Honnibal | a513ec500f | * Have oracle functions take a struct instead of a Python object | 2015-06-02 20:01:06 +02:00
Matthew Honnibal | d1b55310a1 | * Refactor _advance_beam function | 2015-06-02 18:38:41 +02:00
Matthew Honnibal | 0786d9b3c7 | * Refactor TransitionSystem, adding set_valid method | 2015-06-02 18:38:07 +02:00
Matthew Honnibal | bd82a49994 | * Add set_scores method to Model | 2015-06-02 18:37:10 +02:00
Matthew Honnibal | a3964957f6 | * Add profiling for _state.pyx | 2015-06-02 18:36:27 +02:00
Matthew Honnibal | e822df0867 | * Fix bugs in new greedy/beam parser | 2015-06-02 02:01:33 +02:00
Matthew Honnibal | 66dfa95847 | * Revise greedy_parse/beam_parse ownership goof | 2015-06-02 01:34:19 +02:00
Matthew Honnibal | 75658b2ed3 | * Remove use of new beam.loss property, to maintain compatibility with older versions of thinc for now. | 2015-06-02 00:57:09 +02:00
Matthew Honnibal | 7c29362d60 | * Rename parser class in parser.pxd, now that beam parsing is supported | 2015-06-02 00:53:49 +02:00
Matthew Honnibal | 58d5ac0944 | * Add beam search capabilities to Parser. Rename GreedyParser to Parser. | 2015-06-02 00:28:02 +02:00
Matthew Honnibal | 62424e6c76 | * Remove unused regularize argument from _ml.Model | 2015-06-02 00:27:07 +02:00
Matthew Honnibal | adeb57cb1e | * Fix long line | 2015-06-01 23:07:00 +02:00
Matthew Honnibal | e09a08bd00 | * Add copy_state function | 2015-06-01 23:06:30 +02:00
Matthew Honnibal | c7876aa8b6 | * Add get_valid method | 2015-06-01 23:06:00 +02:00
Matthew Honnibal | d82f9d958d | * Remove regularization cruft from _ml, move score from .pxd file to .pyx | 2015-05-31 18:48:05 +02:00
Matthew Honnibal | 5e99ff94c8 | * Edits to arc eager oracle. Couldn't figure out how the non-monotonic lines made sense. They seem covered by children_in_stack | 2015-05-31 15:14:37 +02:00
Matthew Honnibal | 6c5632b71c | * Roll back proposed change to Break transition while investigate effect | 2015-05-31 06:49:52 +02:00
Matthew Honnibal | 6bba793df3 | * Disable the Zipf-reweighting thing while investigate effect | 2015-05-31 06:48:43 +02:00
Matthew Honnibal | e77940565d | * Add length cap to distance feature | 2015-05-31 05:25:30 +02:00
Matthew Honnibal | fd596351ba | * Fix valency features | 2015-05-31 05:24:33 +02:00
Matthew Honnibal | 87d6551d19 | * Allow gold parse to cut non-projective arcs | 2015-05-31 01:11:56 +02:00
Matthew Honnibal | c4f0914b4e | * Fix POS tag evaluation in scorer.py: do evaluate punctuation tags | 2015-05-30 18:24:32 +02:00
Matthew Honnibal | 9e39a206da | * Fix efficiency of JSON reading, by using ujson instead of stream | 2015-05-30 17:54:52 +02:00
Matthew Honnibal | 76300bbb1b | * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. | 2015-05-30 01:25:46 +02:00
Matthew Honnibal | b76bbbd12c | * Read json files recursively from a directory, instead of requiring a single .json file | 2015-05-29 03:52:55 +02:00
Matthew Honnibal | 8f31d3b864 | * Relax constraint on Break transition for non-monotonic parsing. | 2015-05-28 23:39:52 +02:00
Matthew Honnibal | 6b2e5c4b8a | * Avoid NER scoring for sentences with some missing NER values. | 2015-05-28 22:39:08 +02:00
Matthew Honnibal | d25d31442d | * Hackishly support broken NER annotations. Should fix this. | 2015-05-27 19:14:31 +02:00
Matthew Honnibal | 7a2725bca4 | * Read input json in a streaming way | 2015-05-27 19:13:11 +02:00
Matthew Honnibal | 6a1c91675e | * Add file to read ENAMEX ner data | 2015-05-27 17:36:23 +02:00
Matthew Honnibal | 732fa7709a | * Edits to align_raw script, for use in prepare_treebank | 2015-05-27 04:23:31 +02:00
Matthew Honnibal | 4010b9b6d9 | * Pass parameter for regularization in parser.pyx | 2015-05-27 03:18:50 +02:00
Matthew Honnibal | 4c6058baa7 | * Fix evaluation of NER in scorer.py | 2015-05-27 03:18:16 +02:00
Matthew Honnibal | 6016ee83a6 | * Fix reading of NER in gold.pyx | 2015-05-27 03:17:50 +02:00
Matthew Honnibal | 04bda8648d | * Pass parameter for regularization to model | 2015-05-27 03:16:58 +02:00
Matthew Honnibal | f69fe6a635 | * Fix heads problem in read_conll | 2015-05-27 01:14:54 +02:00
Matthew Honnibal | 0eec1d12af | * Add comment about zipf reweighting | 2015-05-27 01:14:07 +02:00
Matthew Honnibal | 4d37b66c55 | * Make Zipf regularization a bit more efficient | 2015-05-27 01:12:50 +02:00
Matthew Honnibal | 7fc24821bc | * Experiment with Zipfian corruptions when calculating prediction | 2015-05-26 22:17:15 +02:00
Matthew Honnibal | eba7b34f66 | * Add flag to disable loading of word vectors | 2015-05-25 01:02:42 +02:00
Matthew Honnibal | 3593babd35 | * Add functions for Levenshtein distance alignment | 2015-05-24 21:50:48 +02:00
Matthew Honnibal | 744f06abf5 | * Add script to read OntoNotes source documents | 2015-05-24 21:49:58 +02:00
Matthew Honnibal | fc75210941 | * Move spacy.syntax.conll to spacy.gold | 2015-05-24 21:35:02 +02:00
Matthew Honnibal | 765b61cac4 | * Update spacy.scorer, to use P/R/F to support tokenization errors | 2015-05-24 20:07:18 +02:00
Matthew Honnibal | efe7a7d7d6 | * Clean unused functions from spacy.syntax.conll | 2015-05-24 20:06:46 +02:00
Matthew Honnibal | 78487f3e66 | * Update parser oracle for missing heads | 2015-05-24 20:05:58 +02:00
Matthew Honnibal | 1044a13413 | * Begin refactoring scorer to use recall over gold dependencies | 2015-05-24 17:40:15 +02:00
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							acd1245ad4
							
						
					 | 
					
						
						
							
							* Remove cruft from conll.pyx --- unused stuff about evaluation, which now lives in spacy.scorer
						
						
						
						
						
					 | 
					
						2015-05-24 17:35:49 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							20f1d868a3
							
						
					 | 
					
						
						
							
							* Tmp commit. Working on whole document parsing
						
						
						
						
						
					 | 
					
						2015-05-24 02:49:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f2ee9c4feb
							
						
					 | 
					
						
						
							
							* Comment out constituency parsing stuff, so that code compiles
						
						
						
						
						
					 | 
					
						2015-05-20 16:55:05 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8ee7c541f1
							
						
					 | 
					
						
						
							
							* Update Constituent definition
						
						
						
						
						
					 | 
					
						2015-05-20 16:03:26 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9dfc9c039c
							
						
					 | 
					
						
						
							
							* Work on constituency parsing.
						
						
						
						
						
					 | 
					
						2015-05-20 16:02:51 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5a5710e711
							
						
					 | 
					
						
						
							
							* Fix Span.subtree property
						
						
						
						
						
					 | 
					
						2015-05-13 21:53:15 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							badf030b6c
							
						
					 | 
					
						
						
							
							* Add parse navigation to Span objects
						
						
						
						
						
					 | 
					
						2015-05-13 21:45:19 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ca320afe86
							
						
					 | 
					
						
						
							
							* Add docstring for ents attribute
						
						
						
						
						
					 | 
					
						2015-05-13 21:20:47 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ba07b925a7
							
						
					 | 
					
						
						
							
							* Fix compile error in conll.pyx
						
						
						
						
						
					 | 
					
						2015-05-12 22:33:47 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f1e0272b18
							
						
					 | 
					
						
						
							
							* Disable c-parsing transitions
						
						
						
						
						
					 | 
					
						2015-05-12 22:33:25 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							03a6626545
							
						
					 | 
					
						
						
							
							* Tmp commit
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9568ebed08
							
						
					 | 
					
						
						
							
							* Fix off-by-one in head reading
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							69840d8cc3
							
						
					 | 
					
						
						
							
							* Tweak verbose output printing in scorer.py
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0605af6838
							
						
					 | 
					
						
						
							
							* Fix head misalignment in read_conll, when periods are ignored
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d2ac8d8007
							
						
					 | 
					
						
						
							
							* Add ctnt field to State, in preparation for constituency parsing
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:56 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ab67693393
							
						
					 | 
					
						
						
							
							* Add read_json_file to conll.pyx
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:55 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							aff9359a8d
							
						
					 | 
					
						
						
							
							* Update ner.pyx to expect brackets from gold_tuples
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:55 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0ad72a77ce
							
						
					 | 
					
						
						
							
							* Write JSON files, with both dependency and PSG parses
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:55 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d48218f4b2
							
						
					 | 
					
						
						
							
							* Add left_edge and right_edge properties
						
						
						
						
						
					 | 
					
						2015-05-12 20:27:55 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							53cf77e1c8
							
						
					 | 
					
						
						
							
							* Bug fix: when non-monotonically correcting a dependency, make sure to delete the old one from the child list
						
						
						
						
						
					 | 
					
						2015-05-12 20:26:41 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a4e2af54f9
							
						
					 | 
					
						
						
							
							* Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible
						
						
						
						
						
					 | 
					
						2015-05-12 20:26:41 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d634038eb6
							
						
					 | 
					
						
						
							
							* Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token
						
						
						
						
						
					 | 
					
						2015-05-12 20:26:41 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							03ebf70a66
							
						
					 | 
					
						
						
							
							* Inc version to 0.84
						
						
						
						
						
					 | 
					
						2015-05-12 02:38:51 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e73eaf2d05
							
						
					 | 
					
						
						
							
							* Replace some assertions with proper errors
						
						
						
						
						
					 | 
					
						2015-05-08 16:52:17 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fb8d50b3d5
							
						
					 | 
					
						
						
							
							Merge branch 'master' of ssh://github.com/honnibal/spaCy
						
						
						
						
						
					 | 
					
						2015-04-30 12:45:15 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ed8e8c3bd0
							
						
					 | 
					
						
						
							
							* Whitespace
						
						
						
						
						
					 | 
					
						2015-04-29 14:22:47 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							378c2a6435
							
						
					 | 
					
						
						
							
							* Fix POS model: make it use tag instead of pos in history features
						
						
						
						
						
					 | 
					
						2015-04-29 00:02:53 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							763ef01575
							
						
					 | 
					
						
						
							
							* Fix two bugs in feature calculation
						
						
						
						
						
					 | 
					
						2015-04-28 23:25:09 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b3fd48c97b
							
						
					 | 
					
						
						
							
							* Fix missing root labels bug identified in Issue #57
						
						
						
						
						
					 | 
					
						2015-04-28 20:45:51 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Jordan Suchow
							
						 
					 | 
					
						
						
						
						
							
						
						
							3a8d9b37a6
							
						
					 | 
					
						
						
							
							Remove trailing whitespace
						
						
						
						
						
					 | 
					
						2015-04-19 13:01:38 -07:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Jordan Suchow
							
						 
					 | 
					
						
						
						
						
							
						
						
							5f0f940a1f
							
						
					 | 
					
						
						
							
							Remove unused imports
						
						
						
						
						
					 | 
					
						2015-04-19 01:05:22 -07:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cc4e395927
							
						
					 | 
					
						
						
							
							* Add some ad hoc regexes, for multi-word location prepositions
						
						
						
						
						
					 | 
					
						2015-04-17 04:44:24 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f7ffd94e6a
							
						
					 | 
					
						
						
							
							* Add Token.conjuncts property
						
						
						
						
						
					 | 
					
						2015-04-17 01:40:53 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							684d0e5e85
							
						
					 | 
					
						
						
							
							* Download updated data
						
						
						
						
						
					 | 
					
						2015-04-16 04:29:15 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2ef170a991
							
						
					 | 
					
						
						
							
							* Fix Issue #54: Error merging multi-word token when there's a mid-token match.
						
						
						
						
						
					 | 
					
						2015-04-16 04:28:06 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							42617548af
							
						
					 | 
					
						
						
							
							* Disable merge_mwes by default
						
						
						
						
						
					 | 
					
						2015-04-16 04:20:31 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							99dbf8a38c
							
						
					 | 
					
						
						
							
							* Fix error type in lookup_transition
						
						
						
						
						
					 | 
					
						2015-04-16 01:36:22 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							77d0700caf
							
						
					 | 
					
						
						
							
							* Add on X way regexes
						
						
						
						
						
					 | 
					
						2015-04-16 01:35:46 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9f16848b60
							
						
					 | 
					
						
						
							
							* Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend'
						
						
						
						
						
					 | 
					
						2015-04-15 06:01:18 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c6707778dd
							
						
					 | 
					
						
						
							
							* Fix Issue #51: Handle non-ascii lemmas correctly
						
						
						
						
						
					 | 
					
						2015-04-13 22:28:59 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							bf0aff5124
							
						
					 | 
					
						
						
							
							* Fix bug in Tokens.ents where entity wasn't being emitted if another started immediately after
						
						
						
						
						
					 | 
					
						2015-04-13 21:34:33 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2b84a90bbb
							
						
					 | 
					
						
						
							
							* Fix Issue #50: Python 3 compatibility of v0.80
						
						
						
						
						
					 | 
					
						2015-04-13 05:59:43 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fbd48c571d
							
						
					 | 
					
						
						
							
							* Rearrange code in tokens.pyx
						
						
						
						
						
					 | 
					
						2015-04-13 05:41:25 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							507048dc45
							
						
					 | 
					
						
						
							
							* Rename StandardError to Exception, for Python 3 compatibility
						
						
						
						
						
					 | 
					
						2015-04-12 07:28:34 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							761a19113a
							
						
					 | 
					
						
						
							
							* Fix /tmp moving thing in download.py
						
						
						
						
						
					 | 
					
						2015-04-12 07:04:10 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							248a2b4b0f
							
						
					 | 
					
						
						
							
							* Remove Spans class
						
						
						
						
						
					 | 
					
						2015-04-12 04:07:29 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1d05e6da00
							
						
					 | 
					
						
						
							
							* Add ne_iob and ne_type features to NER
						
						
						
						
						
					 | 
					
						2015-04-10 19:07:08 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4df8a3d90f
							
						
					 | 
					
						
						
							
							* Add ne_iob and ne_type attributes to context vector
						
						
						
						
						
					 | 
					
						2015-04-10 05:02:15 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8c354c432b
							
						
					 | 
					
						
						
							
							* Add ValueError condition to ner_tag reading
						
						
						
						
						
					 | 
					
						2015-04-10 04:59:59 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							435cccf098
							
						
					 | 
					
						
						
							
							* Add read_conll03_file function to conll.pyx
						
						
						
						
						
					 | 
					
						2015-04-10 04:59:11 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							99c9ecfc18
							
						
					 | 
					
						
						
							
							* Fix bug in prefix, suffix and word shape features in parser and NER
						
						
						
						
						
					 | 
					
						2015-04-10 03:53:33 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cff2b13fef
							
						
					 | 
					
						
						
							
							* Fix Issue #44: Broken Token.string attribute in single-word sentences
						
						
						
						
						
					 | 
					
						2015-04-07 06:08:25 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6640386b25
							
						
					 | 
					
						
						
							
							* Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way.
						
						
						
						
						
					 | 
					
						2015-04-07 06:00:57 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b64b2bd910
							
						
					 | 
					
						
						
							
							* Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way.
						
						
						
						
						
					 | 
					
						2015-04-07 06:00:30 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f9e510a893
							
						
					 | 
					
						
						
							
							* Whitespace
						
						
						
						
						
					 | 
					
						2015-04-07 04:53:59 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							66c7ccf6cc
							
						
					 | 
					
						
						
							
							* Fix Spans.orth_
						
						
						
						
						
					 | 
					
						2015-04-07 04:53:40 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b8d34531c4
							
						
					 | 
					
						
						
							
							* Add support for units to English.__init__, by loading and applying regular expressions
						
						
						
						
						
					 | 
					
						2015-04-07 04:02:32 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0ea5af88b6
							
						
					 | 
					
						
						
							
							* Add multi-word expression RegexMatcher
						
						
						
						
						
					 | 
					
						2015-04-07 03:45:40 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2fee67cfa3
							
						
					 | 
					
						
						
							
							* Add regular expressions for English multi-word expressions
						
						
						
						
						
					 | 
					
						2015-04-07 03:45:18 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5a075ea3fc
							
						
					 | 
					
						
						
							
							* Ensure NER moves are available for single-word tokens
						
						
						
						
						
					 | 
					
						2015-04-05 22:30:58 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a60a366b2c
							
						
					 | 
					
						
						
							
							* Support 'punct' dep label in conll.pyx
						
						
						
						
						
					 | 
					
						2015-04-05 22:30:19 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							021c972137
							
						
					 | 
					
						
						
							
							* Print parse if verbose in scorer
						
						
						
						
						
					 | 
					
						2015-04-05 22:29:30 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fbf19049cf
							
						
					 | 
					
						
						
							
							* Add ent_type_ property
						
						
						
						
						
					 | 
					
						2015-03-31 02:01:29 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e70b87efeb
							
						
					 | 
					
						
						
							
							* Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data
						
						
						
						
						
					 | 
					
						2015-03-30 01:37:41 +02:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							557856e84c
							
						
					 | 
					
						
						
							
							* Allow regular expressions to specify labels for merged spans
						
						
						
						
						
					 | 
					
						2015-03-27 17:40:52 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a3af6b7c3d
							
						
					 | 
					
						
						
							
							* Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty.
						
						
						
						
						
					 | 
					
						2015-03-27 17:39:16 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							db5a43318c
							
						
					 | 
					
						
						
							
							* Improve print_state debug printer
						
						
						
						
						
					 | 
					
						2015-03-27 17:29:58 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1705eccbbe
							
						
					 | 
					
						
						
							
							* Remove whitespace
						
						
						
						
						
					 | 
					
						2015-03-27 15:22:39 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3feb52374c
							
						
					 | 
					
						
						
							
							* Break apart a condition, for ease of debug printing
						
						
						
						
						
					 | 
					
						2015-03-27 15:21:38 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b32f581acb
							
						
					 | 
					
						
						
							
							* Fix bug in ArcEager.get_labels
						
						
						
						
						
					 | 
					
						2015-03-27 15:21:06 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5f2a4ff36d
							
						
					 | 
					
						
						
							
							* Fix spans.lemma_
						
						
						
						
						
					 | 
					
						2015-03-26 16:45:38 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f4cc222ec3
							
						
					 | 
					
						
						
							
							* Fix NER scoring
						
						
						
						
						
					 | 
					
						2015-03-26 16:45:38 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1320bd19db
							
						
					 | 
					
						
						
							
							* Move Span class to own file
						
						
						
						
						
					 | 
					
						2015-03-26 16:45:38 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6f47a667cf
							
						
					 | 
					
						
						
							
							* Move Span class to own file
						
						
						
						
						
					 | 
					
						2015-03-26 16:45:38 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f02c39dfaf
							
						
					 | 
					
						
						
							
							* Compare to is not None, for more robustness
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:48 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8f68b864c4
							
						
					 | 
					
						
						
							
							* Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:48 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e854ba0a13
							
						
					 | 
					
						
						
							
							* Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6a6085f8b9
							
						
					 | 
					
						
						
							
							* Clean up GreedyParser.train function a bit
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b3157927e6
							
						
					 | 
					
						
						
							
							* Clean up unused feature templates
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							411bf377d4
							
						
					 | 
					
						
						
							
							* Remove dependency on ner_util module
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							01c892f583
							
						
					 | 
					
						
						
							
							* Add comment to fill_context
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2741179aff
							
						
					 | 
					
						
						
							
							* Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2b2dec95d3
							
						
					 | 
					
						
						
							
							* Add comment to set_parse
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e770fade1e
							
						
					 | 
					
						
						
							
							* Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up...
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							71648205d9
							
						
					 | 
					
						
						
							
							* Add support for debug feature set. Just use unigrams for this.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3b70b304b2
							
						
					 | 
					
						
						
							
							* Add words to gold_tuples from gold conll file
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2e12dec76e
							
						
					 | 
					
						
						
							
							* Adjust scorer to account for tokenization mistakes
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:47 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							05d6065e2e
							
						
					 | 
					
						
						
							
							* Add assertion
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							377e9b29b1
							
						
					 | 
					
						
						
							
							* Whitespace
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							670959f40c
							
						
					 | 
					
						
						
							
							* Fix iteration order on Tokens.rights
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							231ce2dae5
							
						
					 | 
					
						
						
							
							* Assign ROOT label by default. May be papering over another bug.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9f4ad8fdfb
							
						
					 | 
					
						
						
							
							* Assign root words the ROOT label via the Break transition. Something is still wrong here...
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f729164c01
							
						
					 | 
					
						
						
							
							* Fix bug in label assignment: ensure null-label transitions receive the label 0
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7237c805c7
							
						
					 | 
					
						
						
							
							* Load tag for specials.json token
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:46 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							567388e38d
							
						
					 | 
					
						
						
							
							* Use values encoded by StringStore in POS tagging, rather than indices into a list of tags
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3105c7f8ba
							
						
					 | 
					
						
						
							
							* Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							801bf14f4f
							
						
					 | 
					
						
						
							
							* Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							31fad99518
							
						
					 | 
					
						
						
							
							* Use StringStore to encode label names, instead of label_ids
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							64db61bff1
							
						
					 | 
					
						
						
							
							* Add Span class to Python API
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b9b695fb1b
							
						
					 | 
					
						
						
							
							* Remove debug word list
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f21ab2d7fb
							
						
					 | 
					
						
						
							
							* Fix bug in ugly ent_strings hack on English class
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:45 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1c843934be
							
						
					 | 
					
						
						
							
							* Fix oracle bug in NER. Now getting 77% F on ontonotes
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							903f196b3f
							
						
					 | 
					
						
						
							
							* Fix verbose printing for scorer
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e181c051d5
							
						
					 | 
					
						
						
							
							* Improve features for NER
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7ecb52c0ed
							
						
					 | 
					
						
						
							
							* Add scorer script
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8057a95f20
							
						
					 | 
					
						
						
							
							* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ae235e07b9
							
						
					 | 
					
						
						
							
							* Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b3eda03c9c
							
						
					 | 
					
						
						
							
							* Tmp
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							220ce8bfed
							
						
					 | 
					
						
						
							
							* Prepare English class for NER
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f5830dc1c1
							
						
					 | 
					
						
						
							
							* Remove _transitions.pyx
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:44 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6865c2fb4d
							
						
					 | 
					
						
						
							
							* Fix assignment of dep strings in tokens.pyx
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6b6bce9e7a
							
						
					 | 
					
						
						
							
							* Fix label loading for transition system
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5278c7504b
							
						
					 | 
					
						
						
							
							* Hacks to conll.pyx. Should clean these up.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f321b2b2eb
							
						
					 | 
					
						
						
							
							* Remove TODO comment
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fdabd93bfb
							
						
					 | 
					
						
						
							
							* Ensure high loss for invalid moves, and fix label reading for arc-eager
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							10ed738df2
							
						
					 | 
					
						
						
							
							* Tmp commit
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4f83c9b3d5
							
						
					 | 
					
						
						
							
							* Make costs label-sensitive
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							179b7eb0a7
							
						
					 | 
					
						
						
							
							* Specify parser transition system in language
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8c883cef58
							
						
					 | 
					
						
						
							
							* Refactored transition system code now compiling. Still need to hook up label oracle, and test
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:43 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f0159ab4b6
							
						
					 | 
					
						
						
							
							* Add file to hold GoldParse class
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8eadb984cb
							
						
					 | 
					
						
						
							
							* Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b063001596
							
						
					 | 
					
						
						
							
							* Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							01bc4d6815
							
						
					 | 
					
						
						
							
							* Add set_parse method, to assign parse to tokens in a less hacky way.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							dc986dbc0b
							
						
					 | 
					
						
						
							
							* Work on refactored parser, where TransitionSystem can be easily subclassed
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1cc6329b18
							
						
					 | 
					
						
						
							
							* Add base class to do transitions
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							135756ac3d
							
						
					 | 
					
						
						
							
							* Tmp commit of NER refactoring
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:42 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							23c1f6fc04
							
						
					 | 
					
						
						
							
							* Merge changes from stash
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0ff078876a
							
						
					 | 
					
						
						
							
							* Commit some work on ner.pyx done on the plane
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d81b7be6a2
							
						
					 | 
					
						
						
							
							* Merge train.py
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2e3dc3dfe2
							
						
					 | 
					
						
						
							
							* Merge changes in tokens.pyx
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8cc3524dc9
							
						
					 | 
					
						
						
							
							* Ws
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3d0570685c
							
						
					 | 
					
						
						
							
							* Add NER transition system
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							043b758cf4
							
						
					 | 
					
						
						
							
							* Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference.
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b139aa92ba
							
						
					 | 
					
						
						
							
							* Start setting out how NER will be implemented in the data model
						
						
						
						
						
					 | 
					
						2015-03-26 16:44:41 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0962ffc095
							
						
					 | 
					
						
						
							
							* Fix issue #37: missing check_flag attribute from Token class
						
						
						
						
						
					 | 
					
						2015-03-26 15:06:26 +01:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2e8d0e5d45
							
						
					 | 
					
						
						
							
							* Upd download script
						
						
						
						
						
					 | 
					
						2015-03-03 05:47:16 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							dbe26f5793
							
						
					 | 
					
						
						
							
							* Add children and subtree methods to Token, which are generators to assist parse-tree navigation.
						
						
						
						
						
					 | 
					
						2015-03-03 04:18:41 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ea90d136e8
							
						
					 | 
					
						
						
							
							* Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy.
						
						
						
						
						
					 | 
					
						2015-02-27 03:56:10 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							caf046b220
							
						
					 | 
					
						
						
							
							* Hastily add method to apply tags from a list of strings, instead of predicting the tags.
						
						
						
						
						
					 | 
					
						2015-02-23 15:40:17 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cae077b583
							
						
					 | 
					
						
						
							
							* Work on fixing orphaned Token objects bug
						
						
						
						
						
					 | 
					
						2015-02-16 15:20:31 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7572e31f5e
							
						
					 | 
					
						
						
							
							* Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive.
						
						
						
						
						
					 | 
					
						2015-02-11 18:05:06 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							64645a1c2f
							
						
					 | 
					
						
						
							
							* Improve docstring on English
						
						
						
						
						
					 | 
					
						2015-02-11 15:13:20 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							594e50bd45
							
						
					 | 
					
						
						
							
							* Add option to download speech-parsing data set.
						
						
						
						
						
					 | 
					
						2015-02-11 14:20:29 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0b7e769211
							
						
					 | 
					
						
						
							
							* Add POS tags to support SWBD tag set
						
						
						
						
						
					 | 
					
						2015-02-11 14:08:28 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							312b3a45f3
							
						
					 | 
					
						
						
							
							* Fix issue #19: Allow parsing/pos tagging of empty strings
						
						
						
						
						
					 | 
					
						2015-02-10 10:15:58 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2a0615104b
							
						
					 | 
					
						
						
							
							* Upd download script
						
						
						
						
						
					 | 
					
						2015-02-09 10:22:59 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5c3513583d
							
						
					 | 
					
						
						
							
							* Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens.
						
						
						
						
						
					 | 
					
						2015-02-09 03:57:10 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							be5536d239
							
						
					 | 
					
						
						
							
							* Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON.
						
						
						
						
						
					 | 
					
						2015-02-08 18:36:18 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0492cee8b4
							
						
					 | 
					
						
						
							
							* Fix Issue #24: Lemmas are empty when the L field is missing for special-cased tokens
						
						
						
						
						
					 | 
					
						2015-02-08 18:30:30 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d229fbd228
							
						
					 | 
					
						
						
							
							* Give better error on out-of-bounds array access
						
						
						
						
						
					 | 
					
						2015-02-07 12:59:12 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ab8bb047d0
							
						
					 | 
					
						
						
							
							* Fix negative index for __getitem__
						
						
						
						
						
					 | 
					
						2015-02-07 12:58:46 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							44c7eafe44
							
						
					 | 
					
						
						
							
							* Fix download.py
						
						
						
						
						
					 | 
					
						2015-02-07 12:00:36 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6ca7f2eedc
							
						
					 | 
					
						
						
							
							* Upd download script
						
						
						
						
						
					 | 
					
						2015-02-07 11:32:33 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f0e0588833
							
						
					 | 
					
						
						
							
							* Fill L2 norm attribute on LexemeC struct
						
						
						
						
						
					 | 
					
						2015-02-07 08:44:42 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							75f9b7d6bf
							
						
					 | 
					
						
						
							
							* Add L2 norm field to LexemeC struct
						
						
						
						
						
					 | 
					
						2015-02-07 08:43:17 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							51b618d646
							
						
					 | 
					
						
						
							
							* Add a has_repvec property to Lexeme, and a check function to check flags
						
						
						
						
						
					 | 
					
						2015-02-07 08:42:44 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							321b402739
							
						
					 | 
					
						
						
							
							* Store the l2 norm of the word's vector
						
						
						
						
						
					 | 
					
						2015-02-07 08:42:16 -05:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c7d8644149
							
						
					 | 
					
						
						
							
							* Fix regression on 'prob' attr of Token.
						
						
						
						
						
					 | 
					
						2015-02-03 03:32:18 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c55a33d045
							
						
					 | 
					
						
						
							
							* Catch oracle errors
						
						
						
						
						
					 | 
					
						2015-02-02 23:02:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							de772088e6
							
						
					 | 
					
						
						
							
							* Use parse tree for sbd in Tokens.sents
						
						
						
						
						
					 | 
					
						2015-02-02 12:17:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							56c2ef2982
							
						
					 | 
					
						
						
							
							* Tweak POS features for web text
						
						
						
						
						
					 | 
					
						2015-02-02 11:59:36 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d68678a93e
							
						
					 | 
					
						
						
							
							* Add Exception class, OracleError
						
						
						
						
						
					 | 
					
						2015-02-02 11:57:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a20fdbd8ee
							
						
					 | 
					
						
						
							
							* Upd download script
						
						
						
						
						
					 | 
					
						2015-02-01 13:22:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							76d9394cb4
							
						
					 | 
					
						
						
							
							* Fix vocab.pyx for Python3
						
						
						
						
						
					 | 
					
						2015-02-01 13:14:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							63abdf154c
							
						
					 | 
					
						
						
							
							* Hastily hack download file
						
						
						
						
						
					 | 
					
						2015-01-31 22:48:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7de00c5a79
							
						
					 | 
					
						
						
							
							* Try not holding a reference to Pool, since that seems to confuse the GC
						
						
						
						
						
					 | 
					
						2015-01-31 22:10:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ce3ae8b5d9
							
						
					 | 
					
						
						
							
							* Fix platform-specific lexicon bug.
						
						
						
						
						
					 | 
					
						2015-01-31 16:38:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a1ed574b7b
							
						
					 | 
					
						
						
							
							* Fix default model path for English
						
						
						
						
						
					 | 
					
						2015-01-31 16:38:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							018e0bfa24
							
						
					 | 
					
						
						
							
							* Bug fixes to parse navigation
						
						
						
						
						
					 | 
					
						2015-01-31 16:37:13 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e013555b25
							
						
					 | 
					
						
						
							
							* Add option to download script
						
						
						
						
						
					 | 
					
						2015-01-31 13:51:56 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							08ca5c8970
							
						
					 | 
					
						
						
							
							* Add sent_end flag to TokenC struct
						
						
						
						
						
					 | 
					
						2015-01-31 13:44:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							024cfd485c
							
						
					 | 
					
						
						
							
							* Pass tag_strings as a tuple, to support new Tokens API
						
						
						
						
						
					 | 
					
						2015-01-31 13:43:37 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							77d62d0179
							
						
					 | 
					
						
						
							
							* Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation.
						
						
						
						
						
					 | 
					
						2015-01-31 13:42:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							88170e6295
							
						
					 | 
					
						
						
							
							* Supply dep_strings as a tuple, for the changed API on Tokens
						
						
						
						
						
					 | 
					
						2015-01-31 13:42:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0981d68022
							
						
					 | 
					
						
						
							
							* Set a sent_end flag during parsing, for later use
						
						
						
						
						
					 | 
					
						2015-01-31 13:41:46 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							251dbf24d7
							
						
					 | 
					
						
						
							
							* Fix uninitialised variable error
						
						
						
						
						
					 | 
					
						2015-01-30 20:46:34 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							83a4df5a1a
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-30 20:40:42 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6f9ebc2f34
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-30 20:33:19 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8b85d0bb8a
							
						
					 | 
					
						
						
							
							* Only download small data if no data dir exists
						
						
						
						
						
					 | 
					
						2015-01-30 20:27:14 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1a7a1c2771
							
						
					 | 
					
						
						
							
							* Fix Issue #16: tokens recurse when printing
						
						
						
						
						
					 | 
					
						2015-01-30 19:47:50 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cb95ef6934
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-30 19:28:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e578bd37bd
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-30 18:59:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							df52014d12
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-30 18:36:24 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0f95712189
							
						
					 | 
					
						
						
							
							* Improve accuracy reporting during training
						
						
						
						
						
					 | 
					
						2015-01-30 18:05:06 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b68f563c2f
							
						
					 | 
					
						
						
							
							* Fix Issue #14: Improve parsing API
						
						
						
						
						
					 | 
					
						2015-01-30 18:04:41 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							998b607f65
							
						
					 | 
					
						
						
							
							* Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source
						
						
						
						
						
					 | 
					
						2015-01-30 18:04:01 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							67d6e53a69
							
						
					 | 
					
						
						
							
							* Ensure parser and tagger function correctly when training from missing values, indicated by -1
						
						
						
						
						
					 | 
					
						2015-01-30 14:08:56 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4ff180db74
							
						
					 | 
					
						
						
							
							* Fix off-by-one error in commit 0a7fceb
						
						
						
						
						
					 | 
					
						2015-01-30 12:49:33 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0a7fcebdf7
							
						
					 | 
					
						
						
							
							* Fix Issue #12: Incorrect token.idx calculations for some punctuation, in the presence of token cache
						
						
						
						
						
					 | 
					
						2015-01-30 12:33:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ebf7d2fab1
							
						
					 | 
					
						
						
							
							* Use non-joint sbd, for more simplicity and fewer classes
						
						
						
						
						
					 | 
					
						2015-01-29 06:22:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d05c5bf141
							
						
					 | 
					
						
						
							
							* Remove comment
						
						
						
						
						
					 | 
					
						2015-01-29 05:19:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							320b045daa
							
						
					 | 
					
						
						
							
							* Oracle now consistent over gold standard derivation
						
						
						
						
						
					 | 
					
						2015-01-29 03:41:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f590382134
							
						
					 | 
					
						
						
							
							* Work on sbd
						
						
						
						
						
					 | 
					
						2015-01-29 03:18:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1884a7a0be
							
						
					 | 
					
						
						
							
							* Attach comment with paper
						
						
						
						
						
					 | 
					
						2015-01-28 03:18:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a2d6b195db
							
						
					 | 
					
						
						
							
							* Add messy Break transitions, carefully following the scheme of Dd Zhang et al (2013)
						
						
						
						
						
					 | 
					
						2015-01-28 03:09:45 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f9ee5d9934
							
						
					 | 
					
						
						
							
							* Build a python list of word strings, for debugging
						
						
						
						
						
					 | 
					
						2015-01-28 01:06:13 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d819101571
							
						
					 | 
					
						
						
							
							* Improve error message on oracle failure
						
						
						
						
						
					 | 
					
						2015-01-28 00:58:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e6c3d3471f
							
						
					 | 
					
						
						
							
							* Tweak documentation for Tokens, and hide constructor as __cinit__
						
						
						
						
						
					 | 
					
						2015-01-27 18:57:52 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c38c62d4a3
							
						
					 | 
					
						
						
							
							* Add docstring to English class
						
						
						
						
						
					 | 
					
						2015-01-27 02:45:21 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d4c99f7dec
							
						
					 | 
					
						
						
							
							* Add attrs.pxd
						
						
						
						
						
					 | 
					
						2015-01-26 22:22:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d4a493855e
							
						
					 | 
					
						
						
							
							* Fix error msg
						
						
						
						
						
					 | 
					
						2015-01-25 23:01:30 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7f87716cf7
							
						
					 | 
					
						
						
							
							* Fix download script
						
						
						
						
						
					 | 
					
						2015-01-25 23:01:10 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							92fb9257dd
							
						
					 | 
					
						
						
							
							* Add parts-of-speech file
						
						
						
						
						
					 | 
					
						2015-01-25 22:00:39 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c1c3dba4cb
							
						
					 | 
					
						
						
							
							* Check whether vector files are present before trying to load them.
						
						
						
						
						
					 | 
					
						2015-01-25 18:16:48 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5049d4c2e6
							
						
					 | 
					
						
						
							
							* Add parts_of_speech.pyx
						
						
						
						
						
					 | 
					
						2015-01-25 16:32:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							12b034e3ef
							
						
					 | 
					
						
						
							
							* Move POS tag definitions to parts_of_speech.pxd
						
						
						
						
						
					 | 
					
						2015-01-25 16:31:07 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7431c133d8
							
						
					 | 
					
						
						
							
							* Add error when trying to access head if not is_parsed
						
						
						
						
						
					 | 
					
						2015-01-25 15:33:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							951d06c824
							
						
					 | 
					
						
						
							
							* Silently don't parse if data is not present
						
						
						
						
						
					 | 
					
						2015-01-25 14:47:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4e857ab7a6
							
						
					 | 
					
						
						
							
							* Fix bug in POS tagger feature
						
						
						
						
						
					 | 
					
						2015-01-25 02:20:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							dd56e298e2
							
						
					 | 
					
						
						
							
							* Ensure tagging is applied if parse=True
						
						
						
						
						
					 | 
					
						2015-01-25 02:19:44 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							94750819cd
							
						
					 | 
					
						
						
							
							* Set parse=True by default --- i.e. parse unless told not to.
						
						
						
						
						
					 | 
					
						2015-01-25 01:28:28 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							71b95202eb
							
						
					 | 
					
						
						
							
							* Add docstring to StringStore
						
						
						
						
						
					 | 
					
						2015-01-24 20:49:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6d1c08dafd
							
						
					 | 
					
						
						
							
							* Add docstring to Lexeme
						
						
						
						
						
					 | 
					
						2015-01-24 20:48:34 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a97bed9359
							
						
					 | 
					
						
						
							
							* Fix POS and dependency label tag names.  Add parse and string navigation functions.
						
						
						
						
						
					 | 
					
						2015-01-24 17:29:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							76cd024095
							
						
					 | 
					
						
						
							
							* Add whitespace property to Token
						
						
						
						
						
					 | 
					
						2015-01-24 07:41:21 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5fd72bc220
							
						
					 | 
					
						
						
							
							* Have 'string' refer to the whitespace-padded string
						
						
						
						
						
					 | 
					
						2015-01-24 07:32:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fda94271af
							
						
					 | 
					
						
						
							
							* Rename NORM1 and NORM2 attrs to lower and norm
						
						
						
						
						
					 | 
					
						2015-01-24 06:17:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5ed8b2b98f
							
						
					 | 
					
						
						
							
							* Rename sic to orth
						
						
						
						
						
					 | 
					
						2015-01-23 02:08:25 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a27b23cc8f
							
						
					 | 
					
						
						
							
							* Have SBD return start/end indices
						
						
						
						
						
					 | 
					
						2015-01-22 22:24:44 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d460c28838
							
						
					 | 
					
						
						
							
							* Rename vec to repvec
						
						
						
						
						
					 | 
					
						2015-01-22 02:06:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8b9d913d97
							
						
					 | 
					
						
						
							
							* Rename vec to repvec
						
						
						
						
						
					 | 
					
						2015-01-22 02:05:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9cd0b6b3e9
							
						
					 | 
					
						
						
							
							* Various tweaks to Tokens class
						
						
						
						
						
					 | 
					
						2015-01-22 02:05:37 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5928d158ce
							
						
					 | 
					
						
						
							
							* Pass the string to Tokens
						
						
						
						
						
					 | 
					
						2015-01-22 02:04:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							45264e356b
							
						
					 | 
					
						
						
							
							* Rename vec to repvec
						
						
						
						
						
					 | 
					
						2015-01-22 02:04:24 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5e63c606ad
							
						
					 | 
					
						
						
							
							* Rename vec to repvec
						
						
						
						
						
					 | 
					
						2015-01-22 02:03:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							56e6cf0672
							
						
					 | 
					
						
						
							
							* Add _string attr to Tokens object
						
						
						
						
						
					 | 
					
						2015-01-21 18:57:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d6ac60e91c
							
						
					 | 
					
						
						
							
							* Bug fixes to sentences method, and improved vector transport for tokens
						
						
						
						
						
					 | 
					
						2015-01-21 18:56:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f2a229136c
							
						
					 | 
					
						
						
							
							* Fix data_dir=None argument to English class
						
						
						
						
						
					 | 
					
						2015-01-21 18:27:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ef49b8c179
							
						
					 | 
					
						
						
							
							* Add stop-word flag
						
						
						
						
						
					 | 
					
						2015-01-21 18:22:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6646bfc5df
							
						
					 | 
					
						
						
							
							* Add LOWER attr
						
						
						
						
						
					 | 
					
						2015-01-21 18:19:08 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f149259bf5
							
						
					 | 
					
						
						
							
							* Fix negative indices in tokens
						
						
						
						
						
					 | 
					
						2015-01-20 01:16:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b65b0c07bf
							
						
					 | 
					
						
						
							
							* Messily hook up vector in tokens
						
						
						
						
						
					 | 
					
						2015-01-19 19:59:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8ff5b8bd84
							
						
					 | 
					
						
						
							
							* Add attribute for POS scheme
						
						
						
						
						
					 | 
					
						2015-01-17 17:33:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6c7e44140b
							
						
					 | 
					
						
						
							
							* Work on word vectors, and other stuff
						
						
						
						
						
					 | 
					
						2015-01-17 16:21:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							802867e96a
							
						
					 | 
					
						
						
							
							* Revise interface to Token. Strings now have attribute names like norm1_
						
						
						
						
						
					 | 
					
						2015-01-15 03:51:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7d3c40de7d
							
						
					 | 
					
						
						
							
							* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme
						
						
						
						
						
					 | 
					
						2015-01-15 00:33:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0930892fc1
							
						
					 | 
					
						
						
							
							* Tmp. Working on refactor. Compiles, must hook up lexical feats.
						
						
						
						
						
					 | 
					
						2015-01-14 00:03:48 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							46da3d74d2
							
						
					 | 
					
						
						
							
							* Tmp. Refactoring, introducing a Lexeme PyObject.
						
						
						
						
						
					 | 
					
						2015-01-12 11:23:44 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ce2edd6312
							
						
					 | 
					
						
						
							
							* Tmp commit. Refactoring to create a Python Lexeme class.
						
						
						
						
						
					 | 
					
						2015-01-12 10:26:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							aacaf1a0f0
							
						
					 | 
					
						
						
							
							* Fix parser
						
						
						
						
						
					 | 
					
						2015-01-08 01:19:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9a21127bf7
							
						
					 | 
					
						
						
							
							* Fix parser, which was importing the wrong model
						
						
						
						
						
					 | 
					
						2015-01-08 00:10:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6a3e39cdd1
							
						
					 | 
					
						
						
							
							* Add typedefs.pyx
						
						
						
						
						
					 | 
					
						2015-01-06 04:51:40 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a58920cc5e
							
						
					 | 
					
						
						
							
							* Import orth.word_shape as a C module
						
						
						
						
						
					 | 
					
						2015-01-06 03:18:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6b68f7ef75
							
						
					 | 
					
						
						
							
							* Finally get string types right for orth function
						
						
						
						
						
					 | 
					
						2015-01-06 03:17:39 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							90c143bd85
							
						
					 | 
					
						
						
							
							* Fix orth import
						
						
						
						
						
					 | 
					
						2015-01-05 18:49:19 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7689dccd0f
							
						
					 | 
					
						
						
							
							* Remove unused import
						
						
						
						
						
					 | 
					
						2015-01-05 18:48:48 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3f1944d688
							
						
					 | 
					
						
						
							
							* Make PyPy work
						
						
						
						
						
					 | 
					
						2015-01-05 17:54:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a510d9f677
							
						
					 | 
					
						
						
							
							* Another assertion removed
						
						
						
						
						
					 | 
					
						2015-01-05 13:01:40 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2856946a66
							
						
					 | 
					
						
						
							
							* Remove assertion that doesn't work on Python 3
						
						
						
						
						
					 | 
					
						2015-01-05 12:51:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							94034f1112
							
						
					 | 
					
						
						
							
							* Fix encoding in lemmatization
						
						
						
						
						
					 | 
					
						2015-01-05 11:54:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b132b3caa6
							
						
					 | 
					
						
						
							
							* Fix unicode error in lemmatizer
						
						
						
						
						
					 | 
					
						2015-01-05 11:53:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							477e7fbffe
							
						
					 | 
					
						
						
							
							* Fix data reading for lemmatizer
						
						
						
						
						
					 | 
					
						2015-01-05 06:01:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							58f75abaca
							
						
					 | 
					
						
						
							
							* Fix unicode error in orth
						
						
						
						
						
					 | 
					
						2015-01-05 05:53:08 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4e085d5166
							
						
					 | 
					
						
						
							
							* Fix lemmatizer for Python3
						
						
						
						
						
					 | 
					
						2015-01-05 05:51:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ae7c811fd1
							
						
					 | 
					
						
						
							
							* Use Exception instead of StandardError
						
						
						
						
						
					 | 
					
						2015-01-04 01:22:12 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0e4c2ba036
							
						
					 | 
					
						
						
							
							* Fix loading of special morph words
						
						
						
						
						
					 | 
					
						2015-01-03 23:13:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f5d41028b5
							
						
					 | 
					
						
						
							
							* Move around data files for test release
						
						
						
						
						
					 | 
					
						2015-01-03 01:59:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a24321b63a
							
						
					 | 
					
						
						
							
							* Add downloader
						
						
						
						
						
					 | 
					
						2015-01-02 21:44:41 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5d9a096e2f
							
						
					 | 
					
						
						
							
							* Some minor clean-up after HastyModel
						
						
						
						
						
					 | 
					
						2014-12-31 19:46:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							aafaf58cbe
							
						
					 | 
					
						
						
							
							* Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile.
						
						
						
						
						
					 | 
					
						2014-12-31 19:40:59 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							bcd038e7b6
							
						
					 | 
					
						
						
							
							* Implement HastyModel
						
						
						
						
						
					 | 
					
						2014-12-31 01:16:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1a075f77ff
							
						
					 | 
					
						
						
							
							* Don't over-ride pre-loaded POS tags, if set by special-cases
						
						
						
						
						
					 | 
					
						2014-12-30 23:26:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							785c7ba76a
							
						
					 | 
					
						
						
							
							* Embed signature on attrs
						
						
						
						
						
					 | 
					
						2014-12-30 23:25:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							30e5805656
							
						
					 | 
					
						
						
							
							* Lazy-load tagger and parser
						
						
						
						
						
					 | 
					
						2014-12-30 23:25:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9976aa976e
							
						
					 | 
					
						
						
							
							* Messily fix morphology and POS tags on special tokens.
						
						
						
						
						
					 | 
					
						2014-12-30 23:24:37 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c1ef3febee
							
						
					 | 
					
						
						
							
							* Embedsignature in tokens.pyx
						
						
						
						
						
					 | 
					
						2014-12-30 21:22:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							aac5028b6e
							
						
					 | 
					
						
						
							
							* Move tagger to _ml
						
						
						
						
						
					 | 
					
						2014-12-30 21:21:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1ffb0229ed
							
						
					 | 
					
						
						
							
							* Import tokens in parser.pxd
						
						
						
						
						
					 | 
					
						2014-12-30 21:21:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							bb0b00f819
							
						
					 | 
					
						
						
							
							* Repurpose the Tagger class as a generic Model, wrapping thinc's interface
						
						
						
						
						
					 | 
					
						2014-12-30 21:20:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fe2a5e0370
							
						
					 | 
					
						
						
							
							* Work on docstrings
						
						
						
						
						
					 | 
					
						2014-12-27 21:46:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							bb80937544
							
						
					 | 
					
						
						
							
							* Upd docstrings
						
						
						
						
						
					 | 
					
						2014-12-27 18:45:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b8b65903fc
							
						
					 | 
					
						
						
							
							* Tmp
						
						
						
						
						
					 | 
					
						2014-12-24 17:42:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ab61673edd
							
						
					 | 
					
						
						
							
							* Fix api of array method
						
						
						
						
						
					 | 
					
						2014-12-23 15:18:48 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7708d0e24a
							
						
					 | 
					
						
						
							
							* Move lemmatizer to en dir
						
						
						
						
						
					 | 
					
						2014-12-23 15:16:57 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							98eb4c0426
							
						
					 | 
					
						
						
							
							* Fix path to parser model
						
						
						
						
						
					 | 
					
						2014-12-23 15:09:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b00bc01d8c
							
						
					 | 
					
						
						
							
							* All tests now passing for reorg
						
						
						
						
						
					 | 
					
						2014-12-23 13:18:59 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							73f200436f
							
						
					 | 
					
						
						
							
							* Tests passing except for morphology/lemmatization stuff
						
						
						
						
						
					 | 
					
						2014-12-23 11:40:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cf8d26c3d2
							
						
					 | 
					
						
						
							
							* POS tagger training working after reorg
						
						
						
						
						
					 | 
					
						2014-12-22 08:54:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4c4aa2c5c9
							
						
					 | 
					
						
						
							
							* Work on train
						
						
						
						
						
					 | 
					
						2014-12-22 07:25:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							61df50b598
							
						
					 | 
					
						
						
							
							* Add English-subclass POS tagger
						
						
						
						
						
					 | 
					
						2014-12-21 20:59:07 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9f3f07cab6
							
						
					 | 
					
						
						
							
							* Add attrs file for English
						
						
						
						
						
					 | 
					
						2014-12-21 11:29:11 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							2a89d70429
							
						
					 | 
					
						
						
							
							* Add vocab.pyx to setup, and ensure we can import spacy.en.lang
						
						
						
						
						
					 | 
					
						2014-12-21 06:03:53 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b34a1325d3
							
						
					 | 
					
						
						
							
							* Everything compiling after reorg. About to start testing.
						
						
						
						
						
					 | 
					
						2014-12-21 05:42:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e1c1a4b868
							
						
					 | 
					
						
						
							
							* Tmp
						
						
						
						
						
					 | 
					
						2014-12-21 05:36:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d11c1edf8c
							
						
					 | 
					
						
						
							
							* Import slice_unicode from strings.pyx
						
						
						
						
						
					 | 
					
						2014-12-20 07:56:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							be1bdcbd85
							
						
					 | 
					
						
						
							
							* Move lang.pyx to tokenizer.pyx
						
						
						
						
						
					 | 
					
						2014-12-20 07:55:40 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							89a1cc1a48
							
						
					 | 
					
						
						
							
							* Move murmurhash to .pxd in strings file
						
						
						
						
						
					 | 
					
						2014-12-20 07:41:08 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d5a942c4a4
							
						
					 | 
					
						
						
							
							* Rename lang.pyx to tokenizer.pyx
						
						
						
						
						
					 | 
					
						2014-12-20 07:30:39 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a60ae261ae
							
						
					 | 
					
						
						
							
							* Move tokenizer to its own file, and refactor
						
						
						
						
						
					 | 
					
						2014-12-20 07:29:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							867a4a000c
							
						
					 | 
					
						
						
							
							* Export set_morph_from_dict function
						
						
						
						
						
					 | 
					
						2014-12-20 07:28:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4e30195c6d
							
						
					 | 
					
						
						
							
							* Refactor morphology.pyx
						
						
						
						
						
					 | 
					
						2014-12-20 07:27:28 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4c6ce7ee84
							
						
					 | 
					
						
						
							
							* Update tokens.pyx as part of reorg
						
						
						
						
						
					 | 
					
						2014-12-20 07:03:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							116f7f3bc1
							
						
					 | 
					
						
						
							
							* Rename Lexicon to Vocab, and move it to its own file
						
						
						
						
						
					 | 
					
						2014-12-20 06:54:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							780cbd68b1
							
						
					 | 
					
						
						
							
							* Move all struct definitions to structs.pxd, to avoid circular dependencies
						
						
						
						
						
					 | 
					
						2014-12-20 06:51:33 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f6556d8e5d
							
						
					 | 
					
						
						
							
							* Refactor, move Lexeme struct to structs.pxd
						
						
						
						
						
					 | 
					
						2014-12-20 06:51:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7d48bba6c4
							
						
					 | 
					
						
						
							
							* Move StringStore class to its own file
						
						
						
						
						
					 | 
					
						2014-12-20 06:42:01 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b066102d2d
							
						
					 | 
					
						
						
							
							* Remove POS cache for now
						
						
						
						
						
					 | 
					
						2014-12-20 03:49:58 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ff252dd535
							
						
					 | 
					
						
						
							
							* Clean up 'guess_cache' idea, which didn't work well enough
						
						
						
						
						
					 | 
					
						2014-12-20 03:49:11 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9d3ca13909
							
						
					 | 
					
						
						
							
							* Start work on parse-tree iteration classes
						
						
						
						
						
					 | 
					
						2014-12-20 03:48:10 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							bed680c632
							
						
					 | 
					
						
						
							
							* Remove commented-out features
						
						
						
						
						
					 | 
					
						2014-12-20 03:47:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3d178c03ae
							
						
					 | 
					
						
						
							
							* Prune the features a bit
						
						
						
						
						
					 | 
					
						2014-12-20 02:46:14 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a0408e1758
							
						
					 | 
					
						
						
							
							* Working DecisionMemory class
						
						
						
						
						
					 | 
					
						2014-12-20 01:43:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7920ea72b4
							
						
					 | 
					
						
						
							
							* Working parser with the decision memory idea. Disabling that for now, for simplicity
						
						
						
						
						
					 | 
					
						2014-12-20 01:43:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a2f2a48da9
							
						
					 | 
					
						
						
							
							* Add some extra features
						
						
						
						
						
					 | 
					
						2014-12-20 01:42:24 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8fd9762d91
							
						
					 | 
					
						
						
							
							* Start laying out parse tree iteration methods
						
						
						
						
						
					 | 
					
						2014-12-20 01:42:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							53b8bc1f3c
							
						
					 | 
					
						
						
							
							* Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency
						
						
						
						
						
					 | 
					
						2014-12-19 09:30:50 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							033d6c9ac2
							
						
					 | 
					
						
						
							
							* Adapt POS tagger decision-memory for use in parser
						
						
						
						
						
					 | 
					
						2014-12-19 07:23:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							809ddf7887
							
						
					 | 
					
						
						
							
							* Add index.pxd
						
						
						
						
						
					 | 
					
						2014-12-19 07:23:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1879abd16a
							
						
					 | 
					
						
						
							
							* Set const-correctness for tagger
						
						
						
						
						
					 | 
					
						2014-12-18 20:41:52 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f72243b156
							
						
					 | 
					
						
						
							
							* Set const-correctness for Feature* array
						
						
						
						
						
					 | 
					
						2014-12-18 20:41:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6ab7e40590
							
						
					 | 
					
						
						
							
							* Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set
						
						
						
						
						
					 | 
					
						2014-12-18 11:33:25 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7e0c692daf
							
						
					 | 
					
						
						
							
							* Automatically push when the stack is empty
						
						
						
						
						
					 | 
					
						2014-12-18 09:16:10 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							61142a8eff
							
						
					 | 
					
						
						
							
							* Tweak features
						
						
						
						
						
					 | 
					
						2014-12-18 09:15:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8446ebfbbb
							
						
					 | 
					
						
						
							
							* Work on parser. Up to 92 UAS on YM labels
						
						
						
						
						
					 | 
					
						2014-12-18 09:05:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							55de747bfc
							
						
					 | 
					
						
						
							
							* Remove .cpp files
						
						
						
						
						
					 | 
					
						2014-12-18 02:43:13 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4448a840f7
							
						
					 | 
					
						
						
							
							* Work on greedy parsing. Scoring about 91.2
						
						
						
						
						
					 | 
					
						2014-12-18 02:42:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							87e9487d76
							
						
					 | 
					
						
						
							
							* Work on parser
						
						
						
						
						
					 | 
					
						2014-12-17 21:10:12 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9d7d97978d
							
						
					 | 
					
						
						
							
							* Work on greedy parser
						
						
						
						
						
					 | 
					
						2014-12-17 21:09:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d524dd306a
							
						
					 | 
					
						
						
							
							* Work on greedy parser
						
						
						
						
						
					 | 
					
						2014-12-17 03:19:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							95ccea03b2
							
						
					 | 
					
						
						
							
							* Work on greedy parser
						
						
						
						
						
					 | 
					
						2014-12-16 22:46:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a432862fde
							
						
					 | 
					
						
						
							
							* Add exception type to _arg_max_among in tagger
						
						
						
						
						
					 | 
					
						2014-12-16 09:44:19 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9e00798820
							
						
					 | 
					
						
						
							
							* Work on integrating a greedy dependency parser
						
						
						
						
						
					 | 
					
						2014-12-16 08:06:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							792802b2b9
							
						
					 | 
					
						
						
							
							* POS tag memoisation working, with good speed-up
						
						
						
						
						
					 | 
					
						2014-12-12 14:33:51 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ca54d58638
							
						
					 | 
					
						
						
							
							* Merge setup.py
						
						
						
						
						
					 | 
					
						2014-12-10 15:21:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9959a64f7b
							
						
					 | 
					
						
						
							
							* Working morphology and lemmatisation. POS tagging quite fast.
						
						
						
						
						
					 | 
					
						2014-12-10 08:09:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							df3be14987
							
						
					 | 
					
						
						
							
							* Add pos_type features to POS tagger
						
						
						
						
						
					 | 
					
						2014-12-10 08:08:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							42973c4b37
							
						
					 | 
					
						
						
							
							* Improve efficiency of tagger, and improve morphological processing
						
						
						
						
						
					 | 
					
						2014-12-10 01:02:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6b34a2f34b
							
						
					 | 
					
						
						
							
							* Move morphological analysis into its own module, morphology.pyx
						
						
						
						
						
					 | 
					
						2014-12-09 21:16:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b962fe73d7
							
						
					 | 
					
						
						
							
							* Make suffixes file use full-power regex, so that we can handle periods properly
						
						
						
						
						
					 | 
					
						2014-12-09 19:04:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							accdbe989b
							
						
					 | 
					
						
						
							
							* Remove Tokens.extend method
						
						
						
						
						
					 | 
					
						2014-12-09 17:09:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							495e1c7366
							
						
					 | 
					
						
						
							
							* Use fused type in Tokens.push_back, simplifying the use of the cache
						
						
						
						
						
					 | 
					
						2014-12-09 16:50:01 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							302e09018b
							
						
					 | 
					
						
						
							
							* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas
						
						
						
						
						
					 | 
					
						2014-12-09 14:48:01 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							99bbbb6feb
							
						
					 | 
					
						
						
							
							* Work on morphological processing
						
						
						
						
						
					 | 
					
						2014-12-08 21:12:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7b68f911cf
							
						
					 | 
					
						
						
							
							* Add WordNet lemmatizer
						
						
						
						
						
					 | 
					
						2014-12-08 01:39:13 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c20dd79748
							
						
					 | 
					
						
						
							
							* Fiddle with const correctness and comments
						
						
						
						
						
					 | 
					
						2014-12-08 00:03:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b031c7c430
							
						
					 | 
					
						
						
							
							* Remove language-general context module
						
						
						
						
						
					 | 
					
						2014-12-07 23:53:01 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ef4398b204
							
						
					 | 
					
						
						
							
							* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules
						
						
						
						
						
					 | 
					
						2014-12-07 23:52:41 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							327383e38a
							
						
					 | 
					
						
						
							
							* Remove unused code in tagger.pyx
						
						
						
						
						
					 | 
					
						2014-12-07 22:16:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9f17467c2e
							
						
					 | 
					
						
						
							
							* Fix EMPTY_TOKEN
						
						
						
						
						
					 | 
					
						2014-12-07 22:07:41 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3819a88e1b
							
						
					 | 
					
						
						
							
							* Add support for tag dictionary, and fix error-code for predict method
						
						
						
						
						
					 | 
					
						2014-12-07 22:07:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f00afe12c4
							
						
					 | 
					
						
						
							
							* Load POS tagger in load() function if path exists
						
						
						
						
						
					 | 
					
						2014-12-07 22:05:57 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5fe5e6e66b
							
						
					 | 
					
						
						
							
							* Move context functions to header, inlining them.
						
						
						
						
						
					 | 
					
						2014-12-07 21:59:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5caabec789
							
						
					 | 
					
						
						
							
							* Link in tagger, to work on integrating POS tagging
						
						
						
						
						
					 | 
					
						2014-12-07 15:29:41 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0c7aeb9de7
							
						
					 | 
					
						
						
							
							* Begin revising tagger, focussing on POS tagging
						
						
						
						
						
					 | 
					
						2014-12-07 15:29:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f5c4f2eb52
							
						
					 | 
					
						
						
							
							* Revise context, focussing on POS tagging for now
						
						
						
						
						
					 | 
					
						2014-12-07 15:28:22 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e27b912ef9
							
						
					 | 
					
						
						
							
							* Remove need for confusing _data pointer to be stored on Tokens
						
						
						
						
						
					 | 
					
						2014-12-05 16:31:30 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							1c9253701d
							
						
					 | 
					
						
						
							
							* Introduce a TokenC struct, to handle token indices, pos tags and sense tags
						
						
						
						
						
					 | 
					
						2014-12-05 15:56:14 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							187372c7f3
							
						
					 | 
					
						
						
							
							* Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached
						
						
						
						
						
					 | 
					
						2014-12-05 03:29:50 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							75b8dfb348
							
						
					 | 
					
						
						
							
							* Remove upper_pc from lexeme.pyx
						
						
						
						
						
					 | 
					
						2014-12-04 22:14:34 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							49f3780ff5
							
						
					 | 
					
						
						
							
							* Fiddle with lexeme attrs
						
						
						
						
						
					 | 
					
						2014-12-04 21:22:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							564082e48e
							
						
					 | 
					
						
						
							
							* Hack Token class to take lex.dense in place of the old lex.norm. This needs to be fixed...
						
						
						
						
						
					 | 
					
						2014-12-04 20:51:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							69bb022204
							
						
					 | 
					
						
						
							
							* Add as_array and count_by method
						
						
						
						
						
					 | 
					
						2014-12-04 20:46:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e1b1f45cc9
							
						
					 | 
					
						
						
							
							* Add STEM attribute to lexeme
						
						
						
						
						
					 | 
					
						2014-12-04 20:46:20 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d7952634ca
							
						
					 | 
					
						
						
							
							* Make the string-store serve const pointers to Utf8Str
						
						
						
						
						
					 | 
					
						2014-12-03 16:01:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7e04c22f8f
							
						
					 | 
					
						
						
							
							* const added to Lexicon interface. Seems to work.
						
						
						
						
						
					 | 
					
						2014-12-03 15:58:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							d70d31aa45
							
						
					 | 
					
						
						
							
							* Introduce first attempt at const-ness
						
						
						
						
						
					 | 
					
						2014-12-03 15:44:25 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4560ada85b
							
						
					 | 
					
						
						
							
							* Add typedef for attr_t. Change flag_t to flags_t
						
						
						
						
						
					 | 
					
						2014-12-03 11:06:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e600f7b327
							
						
					 | 
					
						
						
							
							* Move String struct stuff into the utf8string module, from spacy.lang
						
						
						
						
						
					 | 
					
						2014-12-03 11:06:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e170faf5b0
							
						
					 | 
					
						
						
							
							* Hack Tokens to work without tagger.pyx
						
						
						
						
						
					 | 
					
						2014-12-03 11:05:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b463a7eb86
							
						
					 | 
					
						
						
							
							* Make flag-setting a language-specific thing
						
						
						
						
						
					 | 
					
						2014-12-03 11:04:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							71b009e323
							
						
					 | 
					
						
						
							
							* Fix bug in refactored StringStore.__getitem__
						
						
						
						
						
					 | 
					
						2014-12-03 11:02:24 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							14097311ae
							
						
					 | 
					
						
						
							
							* Make StringStore.__getitem__ accept unicode-typed keys.
						
						
						
						
						
					 | 
					
						2014-12-03 01:33:20 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							522bb0346e
							
						
					 | 
					
						
						
							
							* Work on get_array method of Tokens
						
						
						
						
						
					 | 
					
						2014-12-02 23:48:05 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8c2938fe01
							
						
					 | 
					
						
						
							
							* Rename Lexicon._dict to Lexicon._map
						
						
						
						
						
					 | 
					
						2014-12-02 23:46:59 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							33dfb4933c
							
						
					 | 
					
						
						
							
							* Remove taggers from Language class. Work on doc strings
						
						
						
						
						
					 | 
					
						2014-11-26 19:53:55 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							80baa2e3db
							
						
					 | 
					
						
						
							
							* Work on beam parser
						
						
						
						
						
					 | 
					
						2014-11-20 19:49:33 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5c3016bac8
							
						
					 | 
					
						
						
							
							* Tmp commit of ner code
						
						
						
						
						
					 | 
					
						2014-11-14 18:27:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							33c421bcf8
							
						
					 | 
					
						
						
							
							* More feature tweaks
						
						
						
						
						
					 | 
					
						2014-11-12 23:59:16 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							41dedfb14e
							
						
					 | 
					
						
						
							
							* Add label features for NER parsing
						
						
						
						
						
					 | 
					
						2014-11-12 23:55:10 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							cf55b48ba6
							
						
					 | 
					
						
						
							
							* Switch to predict label on shift. Big increase in accuracy.
						
						
						
						
						
					 | 
					
						2014-11-12 23:50:12 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8f84e8a78b
							
						
					 | 
					
						
						
							
							* Neaten oracle
						
						
						
						
						
					 | 
					
						2014-11-12 23:38:07 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							7e0a9077dd
							
						
					 | 
					
						
						
							
							* Add context files
						
						
						
						
						
					 | 
					
						2014-11-12 23:22:36 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3b0b902384
							
						
					 | 
					
						
						
							
							* IOB-style parsing working. Accuracy down from BILOU, from 87-88 to 85-86
						
						
						
						
						
					 | 
					
						2014-11-12 23:21:09 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e6bb8aa3a9
							
						
					 | 
					
						
						
							
							* Move moves to bilou_moves. Refactor context, returning to the simpler giant-enum style
						
						
						
						
						
					 | 
					
						2014-11-12 00:54:50 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c788633429
							
						
					 | 
					
						
						
							
							* Add tokens_from_list method to Language
						
						
						
						
						
					 | 
					
						2014-11-11 23:43:14 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							95282d4993
							
						
					 | 
					
						
						
							
							* Use the dynamic oracle 'follow' strategy
						
						
						
						
						
					 | 
					
						2014-11-11 21:11:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5aaf7a024d
							
						
					 | 
					
						
						
							
							* Move ner features to ner subdir
						
						
						
						
						
					 | 
					
						2014-11-11 21:09:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ff8989b63c
							
						
					 | 
					
						
						
							
							* Use greedy NER parser
						
						
						
						
						
					 | 
					
						2014-11-11 21:08:35 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0d943ab358
							
						
					 | 
					
						
						
							
							* Fixed greedy NER parsing. With static oracle, replicates accuracy from tagger.
						
						
						
						
						
					 | 
					
						2014-11-11 17:17:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							399239760b
							
						
					 | 
					
						
						
							
							* Fix moves for new State struct
						
						
						
						
						
					 | 
					
						2014-11-10 22:16:05 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							82247169f2
							
						
					 | 
					
						
						
							
							* Implement validation and oracle on pystate, for testing
						
						
						
						
						
					 | 
					
						2014-11-10 22:15:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3709ed9d6d
							
						
					 | 
					
						
						
							
							* Add curr field to State, to handle entity being built
						
						
						
						
						
					 | 
					
						2014-11-10 22:14:36 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							af9ed18cf1
							
						
					 | 
					
						
						
							
							* Bug fixes to NER
						
						
						
						
						
					 | 
					
						2014-11-10 17:39:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							9f2587f5ec
							
						
					 | 
					
						
						
							
							* Work on shift-reduce NER
						
						
						
						
						
					 | 
					
						2014-11-10 16:28:56 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f307eb2e36
							
						
					 | 
					
						
						
							
							* Refactor context extraction, and start breaking out gold standards into their own functions
						
						
						
						
						
					 | 
					
						2014-11-09 15:43:07 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							602f993af9
							
						
					 | 
					
						
						
							
							* Moving tagger to accept multiple correct answers
						
						
						
						
						
					 | 
					
						2014-11-09 15:18:33 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f37d896a42
							
						
					 | 
					
						
						
							
							* Upd NER feats. With adadelta learner, getting 76.9 on NER
						
						
						
						
						
					 | 
					
						2014-11-07 04:43:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							68d1cdad62
							
						
					 | 
					
						
						
							
							* When encoding POS/NER tags, accept '-' as a missing value
						
						
						
						
						
					 | 
					
						2014-11-07 04:42:31 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							949a6245f9
							
						
					 | 
					
						
						
							
							* Increase default number of iterations from 5 to 10
						
						
						
						
						
					 | 
					
						2014-11-07 04:42:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3cab1d9a29
							
						
					 | 
					
						
						
							
							* Refine word_shape feature, by trimming the max sequence length
						
						
						
						
						
					 | 
					
						2014-11-07 04:41:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b4454cf036
							
						
					 | 
					
						
						
							
							* Add extra context tokens
						
						
						
						
						
					 | 
					
						2014-11-07 04:40:36 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							50309e6e49
							
						
					 | 
					
						
						
							
							* Fix context vector, importing all features
						
						
						
						
						
					 | 
					
						2014-11-05 22:11:39 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							07a23768de
							
						
					 | 
					
						
						
							
							* Play with NER feats a bit. Up to 82.00 training on MUC7.
						
						
						
						
						
					 | 
					
						2014-11-05 21:47:17 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							4ecbe8c893
							
						
					 | 
					
						
						
							
							* Complete refactor of Tagger features, to use a generic list of context names.
						
						
						
						
						
					 | 
					
						2014-11-05 20:45:29 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							0a8c84625d
							
						
					 | 
					
						
						
							
							* Moving feature context stuff to a generalized place
						
						
						
						
						
					 | 
					
						2014-11-05 19:55:10 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3733444101
							
						
					 | 
					
						
						
							
							* Generalize tagger code, in preparation for NER and supersense tagging.
						
						
						
						
						
					 | 
					
						2014-11-05 03:42:14 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							abbe3e44b0
							
						
					 | 
					
						
						
							
							* Move spacy.pos tagger to spacy.tagger, and generalize it so that it can take on other tagging tasks, given a different set of feature templates.
						
						
						
						
						
					 | 
					
						2014-11-05 00:37:59 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							954c970415
							
						
					 | 
					
						
						
							
							* Add __iter__ method to tokens
						
						
						
						
						
					 | 
					
						2014-11-04 01:07:08 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f07457a91f
							
						
					 | 
					
						
						
							
							* Remove POS alignment stuff. Now use training data based on raw text, instead of clumsy detokenization stuff
						
						
						
						
						
					 | 
					
						2014-11-04 01:06:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ae52f9f38c
							
						
					 | 
					
						
						
							
							* Remove vocab10k from tokens
						
						
						
						
						
					 | 
					
						2014-11-03 00:23:20 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							32fb50dc35
							
						
					 | 
					
						
						
							
							* Remove non_sparse method --- features wanting this can do it easily enough.
						
						
						
						
						
					 | 
					
						2014-11-03 00:15:47 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b5ae1471db
							
						
					 | 
					
						
						
							
							* Fiddle with POS tag features
						
						
						
						
						
					 | 
					
						2014-11-03 00:15:03 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							70ea862703
							
						
					 | 
					
						
						
							
							* Remove vocab10k field, and add flags for gazetteers
						
						
						
						
						
					 | 
					
						2014-11-03 00:13:51 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							711ed0f636
							
						
					 | 
					
						
						
							
							* Whitespace
						
						
						
						
						
					 | 
					
						2014-11-02 14:22:32 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							fcd9490d56
							
						
					 | 
					
						
						
							
							* Add pos_tag method to Language
						
						
						
						
						
					 | 
					
						2014-11-02 14:21:43 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							829bb2bdbe
							
						
					 | 
					
						
						
							
							* Add mappings to Twitter POS tag corpus
						
						
						
						
						
					 | 
					
						2014-11-02 13:21:19 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							437cd2217d
							
						
					 | 
					
						
						
							
							* Fix strings i/o, removing use of ujson library in favour of plain text file. Allows better control of codecs.
						
						
						
						
						
					 | 
					
						2014-11-02 13:20:37 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							3352e89e21
							
						
					 | 
					
						
						
							
							* Use LIKE_URL and LIKE_NUMBER flag features. Seems to improve accuracy on onto web
						
						
						
						
						
					 | 
					
						2014-11-02 13:19:54 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							8335706321
							
						
					 | 
					
						
						
							
							* Add LIKE_URL and LIKE_NUMBER flag features
						
						
						
						
						
					 | 
					
						2014-11-02 13:19:23 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							5484fbea69
							
						
					 | 
					
						
						
							
							* Implement is_number
						
						
						
						
						
					 | 
					
						2014-11-01 19:13:24 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f685218e21
							
						
					 | 
					
						
						
							
							* Add is_urlish function
						
						
						
						
						
					 | 
					
						2014-11-01 17:39:34 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							09a3e54176
							
						
					 | 
					
						
						
							
							* Delete print statements from stringstore
						
						
						
						
						
					 | 
					
						2014-10-31 17:45:26 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							b186a66bae
							
						
					 | 
					
						
						
							
							* Rename Token.lex_pos to Token.postype, and Token.lex_supersense to Token.sensetype
						
						
						
						
						
					 | 
					
						2014-10-31 17:44:39 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							a8ca078b24
							
						
					 | 
					
						
						
							
							* Restore lexemes field to lexicon
						
						
						
						
						
					 | 
					
						2014-10-31 17:43:25 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							6c807aa45f
							
						
					 | 
					
						
						
							
							* Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries
						
						
						
						
						
					 | 
					
						2014-10-31 17:43:00 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							aaf6953fe0
							
						
					 | 
					
						
						
							
							* Add count_tags function to pos.pyx, which should probably live in another file. Feature set achieves 97.9 on wsj19-21, 95.85 on onto web.
						
						
						
						
						
					 | 
					
						2014-10-31 17:42:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							f67cb9a5a3
							
						
					 | 
					
						
						
							
							* Add count_tags function to pos.pyx, which should probably live in another file. Feature set achieves 97.9 on wsj19-21, 95.85 on onto web.
						
						
						
						
						
					 | 
					
						2014-10-31 17:42:04 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ea8f1e7053
							
						
					 | 
					
						
						
							
							* Tighten interfaces
						
						
						
						
						
					 | 
					
						2014-10-30 18:14:42 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ea85bf3a0a
							
						
					 | 
					
						
						
							
							* Tighten the interface to Language
						
						
						
						
						
					 | 
					
						2014-10-30 18:01:27 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							c6fcd03692
							
						
					 | 
					
						
						
							
							* Small efficiency tweak to lexeme init
						
						
						
						
						
					 | 
					
						2014-10-30 17:56:11 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							87c2418a89
							
						
					 | 
					
						
						
							
							* Fiddle with data types on Lexeme, to compress them to a much smaller size.
						
						
						
						
						
					 | 
					
						2014-10-30 15:42:15 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							ac88893232
							
						
					 | 
					
						
						
							
							* Fix Token after lexeme changes
						
						
						
						
						
					 | 
					
						2014-10-30 15:30:52 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							e6b87766fe
							
						
					 | 
					
						
						
							
							* Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme
						
						
						
						
						
					 | 
					
						2014-10-30 15:21:38 +11:00 | 
					
					
						
						
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Matthew Honnibal
							
						 
					 | 
					
						
						
						
						
							
						
						
							889b7b48b4
							
						
					 | 
					
						
						
							
							* Fix POS tagger, so that it loads correctly. Lexemes are being read in.
						
						
						
						
						
					 | 
					
						2014-10-30 13:38:55 +11:00 | 
					
					
						
						
							
							
							
						
					 |