| 
							
							
								 Matthew Honnibal | 0ceb1f71c2 | * Update parse features | 2015-07-08 19:11:36 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bb522496dd | * Rename Tokens to Doc | 2015-07-08 18:53:00 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ff885e8511 | * Add ParserFactory convenience function | 2015-07-08 12:35:46 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 52fd80c6c6 | * Add experimental supersense features for parsing, based on lookup into wordnet. | 2015-07-01 20:12:44 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e20106fdff | * Begin reorganizing neuralnet work | 2015-06-30 14:26:32 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3bb5876c5a | * Inline methods in StateClass | 2015-06-29 01:10:14 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 313a7f87b3 | * Inline methods in StateClass | 2015-06-29 01:06:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a02fd3af5d | * Check valency in L and R feature methods, to make feature calculation faster | 2015-06-29 00:27:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5d870720bc | * Check valency in L and R feature methods, to make feature calculation faster | 2015-06-29 00:17:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f4986d5d3c | * Use new Example class | 2015-06-28 22:36:03 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 735f1af91f | * Fix neural net stuff | 2015-06-28 11:44:58 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e7003f1cf3 | * Remove hard-coding of vector lengths | 2015-06-28 11:37:17 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 897dd0dd0b | * Merge changes, and adjust Example to use memoryview | 2015-06-28 11:36:11 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9282a8e72c | * Prepare for new models to be plugged in by using Example class | 2015-06-28 11:02:35 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 75aeccc064 | * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search | 2015-06-28 11:02:34 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bbef71f213 | * Fix min function in fill_context | 2015-06-28 10:46:39 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 142b6f9510 | * Revert last changes | 2015-06-28 10:44:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b06962f18b | * Pad buffers in state | 2015-06-28 10:36:14 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 53be72387c | * Hack at fill_context to investigate performance loss | 2015-06-28 10:34:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 71a4e876a9 | * Fix parse features | 2015-06-28 09:27:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5af500909c | * Remove unused directive from parser.pyx | 2015-06-28 06:20:21 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d5b4090705 | * Add profile directive | 2015-06-28 06:19:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2b5421e60c | * Add profile directive | 2015-06-28 06:07:04 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8b5de4a411 | * Add word / tag / label sets, for use in neural net | 2015-06-28 05:46:53 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ed40a8380e | * Remove hard-coding of vector lengths | 2015-06-27 04:18:47 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ebe630cc8d | * Enable more features for NN | 2015-06-27 04:17:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f8bb43475e | * Bridge to Theano working. Very disorganised. Using thinc adb60aba966ed2 | 2015-06-27 02:39:18 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2fe98b8a9a | * Prepare for new models to be plugged in by using Example class | 2015-06-26 13:51:39 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6896455884 | * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search | 2015-06-26 06:25:36 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 02b171ee67 | * Bug fixes to edge calculation | 2015-06-24 04:28:02 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7f9384f53c | * Remove deprecated _state module | 2015-06-23 17:28:24 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6dbe182491 | * Fix merge conflicts | 2015-06-23 17:28:00 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 579735a095 | * Remove import of _state module | 2015-06-23 17:25:08 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 88f55d136b | * Remove deprecated _state module | 2015-06-23 17:19:51 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9ab9dd2bf7 | * Clean up unused orig_arc_eager and tree_arc_eager modules, which were only added for EMNLP experiments | 2015-06-23 17:17:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7ebfe4b983 | * Fixes to edge features | 2015-06-23 16:32:54 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7b125f5a86 | * Fixes to edge features | 2015-06-23 16:31:01 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 35c290bee4 | * Fix edge features | 2015-06-23 15:50:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 221e2e485f | * Assign 'ROOT' as label, not 'root' | 2015-06-23 15:09:54 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a7bf7b0626 | * Rename sent_start to sent_end, to reflect its new usage in the Break transition | 2015-06-23 05:39:43 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ee3e56f27b | * Fix bounds checking on entities | 2015-06-23 04:35:08 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 43ef5ddea5 | * Ensure root label is spelled ROOT, for backwards compatibility | 2015-06-23 04:14:03 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 065c2e1d2d | * Add some bounds checking around state arrays | 2015-06-23 04:13:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f01b3d043e | * Add padding to arrays in stateclass. May be papering over a deeper bug. | 2015-06-23 03:03:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 69507bc729 | * Re-enable Break transition in arc_eager.pyx | 2015-06-23 00:03:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ab110be125 | * Remove debugging in parser.pyx | 2015-06-16 23:37:25 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9b13d11ab3 | * Fix handling of entities in StateClass | 2015-06-16 23:35:21 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c40a2c661c | * Add tree_arc_eager | 2015-06-15 08:23:24 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5da5cf7084 | * Add some more features for S1/S0 | 2015-06-15 04:07:13 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8156a01bca | * Fix root label for orig_arc_eager | 2015-06-15 02:54:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 21930ede15 | * Switch toggle on USE_ROOT_ARC_SEGMENT | 2015-06-15 02:54:32 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 38a6afa484 | * Make possibly dubious correction to the unshift oracle | 2015-06-15 02:50:00 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f66228f253 | * Add some more features, esp for labels | 2015-06-14 21:18:02 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3da8e0f317 | * Add orig_arc_eager | 2015-06-14 20:31:44 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ea8a103007 | * Fix import of TransitionSystem in parser.pyx | 2015-06-14 19:01:26 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e0984ca139 | * Fix valency features in StateClass | 2015-06-14 17:50:26 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 763cbd23d5 | * Upd stateclass.print_state | 2015-06-14 17:44:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bdd07bf000 | * Fix Break oracle, but disable the Break transition for now, while we finalize the gold-standard experiments | 2015-06-14 17:44:03 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 399f15fbdf | * Add flag to toggle handling of multi-root inputs without the Break transition. Clear up now unused best_valid stuff. | 2015-06-14 00:28:37 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 75289b4761 | * Don't refuse to parse single token sentences, in case some transition system needs them, e.g. single word entity. Instead fix error in _init_state. | 2015-06-13 22:55:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 77d7e79c7e | * Fix r/l and distance features. | 2015-06-12 13:06:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 15e177d7a1 | * Fixes to unshift/fast-forward strategy. Getting 91.55 greedy on NW dev, gold preproc | 2015-06-12 01:50:23 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | afd77a529b | * Prepare for break transition, with fast-forwarding. 86.5 on 1k nw gold preproc | 2015-06-10 14:08:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 495f528709 | * Add support for sentence breaks in stateclass | 2015-06-10 12:34:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b7b18c279d | * Fix Reduce oracle. Getting 86.35 | 2015-06-10 11:33:39 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bb09b5d91a | * Fix shifted bit vector in stateclass --- should reflect whether the word has been *unshifted*. | 2015-06-10 11:33:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aa9625f688 | * Do non-monotonic Unshift. Every word can be shifted at most 1 time. When the Reduce move is used, if S0 has no head, we put the word back on the buffer. Gets 86.4 on nw 1k with gold pre-proc. Break transition not yet implemented for this. | 2015-06-10 10:15:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7bf6b7de3e | * Add unshift action to StateClass, and track which moves have been shifted | 2015-06-10 10:13:03 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f7c8069e65 | * Fix bug in distance feature | 2015-06-10 10:12:17 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | abd07c067a | * Inline B and S methods on stateclass | 2015-06-10 07:22:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e2f9a80713 | * Remove old _state imports | 2015-06-10 07:09:17 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e9aaecc619 | * Remove from_struct method from StateClass | 2015-06-10 06:58:27 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 18cc326dc0 | * Bug fixes to ner.pyx | 2015-06-10 06:57:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e5570c9700 | * Set nogil for oracle functions | 2015-06-10 06:56:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4575e7a60f | * Fix beam search with new StateClass | 2015-06-10 06:33:39 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 04b1cd9b8c | * Greedy parsing working with new StateClass. Beam parsing broken | 2015-06-10 04:20:23 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6a94b64eca | * Remove State* from parser.pyx entirely, switching over to StateClass. Beam parsing still untested. | 2015-06-10 02:03:38 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f14a1526aa | * Remove version of fill_context that takes State* | 2015-06-10 01:39:07 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d68c686ec1 | * Move StateClass into interface of transition functions | 2015-06-10 01:35:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4b98b3e9c8 | * Cost functions now take StateClass argument, instead of State*. | 2015-06-10 00:40:43 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e0cf61f591 | * Move StateClass into the interface for is_valid | 2015-06-09 23:23:28 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0895d454fb | * Prepare to switch to using state class, instead of state struct | 2015-06-09 21:20:14 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2b9629ed62 | * Begin adding stateclass to ArcEager | 2015-06-09 01:41:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ba10fd8af5 | * Add StateClass, to replace/refactor the mess in _state | 2015-06-09 01:39:54 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c7e3dfc1dc | * Don't automatically push words when stack is empty, as it messes up beam parsing. Add hash method to beam state. | 2015-06-08 14:49:04 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6e2564239d | * Bug fixes to beam parser. Search still broken on non-gold sentences | 2015-06-07 19:12:59 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 731e5f1e46 | * Add get() function in spacy/syntax/Config | 2015-06-07 19:09:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8f142c1838 | * Refactor transition system oracles, to split out move and label cost. Preparing to add Unshift move. Will exclude non-monotonic. | 2015-06-07 03:21:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1fee7ade61 | * Tweak to ner | 2015-06-05 23:48:43 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 33e70b167f | * Remove dead code from ner.pyx | 2015-06-05 17:12:47 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 88ac5c6e98 | * Send beam_width < 0 to greedy parser | 2015-06-05 17:12:06 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0114e7600d | * Fix NER oracle | 2015-06-05 17:11:26 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6bf35cecc3 | * Refactor transition system to use classes with staticmethods. | 2015-06-05 02:27:17 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 36a34d544b | * Refactoring arc_eager, grouping oracle functions into transitions | 2015-06-04 22:43:03 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4433396005 | * Improve efficiency of dynamic oracle, making beam training faster | 2015-06-04 21:15:14 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 079dad28a7 | * Update for faster beam training | 2015-06-04 19:32:32 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a2627b6102 | * Fix bug in refactored init_transition | 2015-06-03 06:01:26 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dd0867645d | * Remove stray const from State header | 2015-06-03 00:10:04 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6c47b10a6e | * Make optimization to children_in_buffer: stop searching when we would cross a bracket. | 2015-06-02 21:05:24 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a513ec500f | * Have oracle functions take a struct instead of a Python object | 2015-06-02 20:01:06 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d1b55310a1 | * Refactor _advance_beam function | 2015-06-02 18:38:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0786d9b3c7 | * Refactor TransitionSystem, adding set_valid method | 2015-06-02 18:38:07 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a3964957f6 | * Add profiling for _state.pyx | 2015-06-02 18:36:27 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e822df0867 | * Fix bugs in new greedy/beam parser | 2015-06-02 02:01:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 66dfa95847 | * Revise greedy_parse/beam_parse ownership goof | 2015-06-02 01:34:19 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 75658b2ed3 | * Remove use of new beam.loss property, to maintain compatibility with older versions of thinc for now. | 2015-06-02 00:57:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7c29362d60 | * Rename parser class in parser.pxd, now that beam parsing is supported | 2015-06-02 00:53:49 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 58d5ac0944 | * Add beam search capabilities to Parser. Rename GreedyParser to Parser. | 2015-06-02 00:28:02 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e09a08bd00 | * Add copy_state function | 2015-06-01 23:06:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c7876aa8b6 | * Add get_valid method | 2015-06-01 23:06:00 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5e99ff94c8 | * Edits to arc eager oracle. Couldn't figure out how the non-monotonic lines made sense. They seem covered by children_in_stack | 2015-05-31 15:14:37 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6c5632b71c | * Roll back proposed change to Break transition while investigate effect | 2015-05-31 06:49:52 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e77940565d | * Add length cap to distance feature | 2015-05-31 05:25:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fd596351ba | * Fix valency features | 2015-05-31 05:24:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 76300bbb1b | * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. | 2015-05-30 01:25:46 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8f31d3b864 | * Relax constraint on Break transition for non-monotonic parsing. | 2015-05-28 23:39:52 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4010b9b6d9 | * Pass parameter for regularization in parser.pyx | 2015-05-27 03:18:50 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fc75210941 | * Move spacy.syntax.conll to spacy.gold | 2015-05-24 21:35:02 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | efe7a7d7d6 | * Clean unused functions from spacy.syntax.conll | 2015-05-24 20:06:46 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 78487f3e66 | * Update parser oracle for missing heads | 2015-05-24 20:05:58 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | acd1245ad4 | * Remove cruft from conll.pyx --- unused stuff about evaluation, which now lives in spacy.scorer | 2015-05-24 17:35:49 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 20f1d868a3 | * Tmp commit. Working on whole document parsing | 2015-05-24 02:49:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f2ee9c4feb | * Comment out constituency parsing stuff, so that code compiles | 2015-05-20 16:55:05 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9dfc9c039c | * Work on constituency parsing. | 2015-05-20 16:02:51 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ba07b925a7 | * Fix compile error in conll.pyx | 2015-05-12 22:33:47 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f1e0272b18 | * Disable c-parsing transitions | 2015-05-12 22:33:25 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 03a6626545 | * Tmp commit | 2015-05-12 20:27:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9568ebed08 | * Fix off-by-one in head reading | 2015-05-12 20:27:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d2ac8d8007 | * Add ctnt field to State, in preparation for constituency parsing | 2015-05-12 20:27:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ab67693393 | * Add read_json_file to conll.pyx | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aff9359a8d | * Update ner.pyx to expect brackets from gold_tuples | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 53cf77e1c8 | * Bug fix: when non-monotonically correcting a dependency, make sure to delete the old one from the child list | 2015-05-12 20:26:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a4e2af54f9 | * Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible | 2015-05-12 20:26:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fb8d50b3d5 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-04-30 12:45:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ed8e8c3bd0 | * Whitespace | 2015-04-29 14:22:47 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 763ef01575 | * Fix two bugs in feature calculation | 2015-04-28 23:25:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3fd48c97b | * Fix missing root labels bug identified in Issue #57 | 2015-04-28 20:45:51 +02:00 |  | 
			
				
					| 
							
							
								 Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 99dbf8a38c | * Fix error type in lookup_transition | 2015-04-16 01:36:22 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9f16848b60 | * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' | 2015-04-15 06:01:18 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 507048dc45 | * Rename StandardError to Exception, for Python 3 compatibility | 2015-04-12 07:28:34 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1d05e6da00 | * Add ne_iob and ne_type features to NER | 2015-04-10 19:07:08 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4df8a3d90f | * Add ne_iob and ne_type attributes to context vector | 2015-04-10 05:02:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8c354c432b | * Add ValueError condition to ner_tag reading | 2015-04-10 04:59:59 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 435cccf098 | * Add read_conll03_file function to conll.pyx | 2015-04-10 04:59:11 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 99c9ecfc18 | * Fix bug in prefix, suffix and word shape features in parser and NER | 2015-04-10 03:53:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5a075ea3fc | * Ensure NER moves are available for single-word tokens | 2015-04-05 22:30:58 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a60a366b2c | * Support 'punct' dep label in conll.pyx | 2015-04-05 22:30:19 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a3af6b7c3d | * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. | 2015-03-27 17:39:16 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | db5a43318c | * Improve print_state debug printer | 2015-03-27 17:29:58 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1705eccbbe | * Remove whitespace | 2015-03-27 15:22:39 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3feb52374c | * Break apart a condition, for ease of debug printing | 2015-03-27 15:21:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b32f581acb | * Fix bug in ArcEager.get_labels | 2015-03-27 15:21:06 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1320bd19db | * Move Span class to own file | 2015-03-26 16:45:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e854ba0a13 | * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6a6085f8b9 | * Clean up GreedyParser.train function a bit | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3157927e6 | * Clean up unused feature templates | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 411bf377d4 | * Remove dependency on ner_util module | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 01c892f583 | * Add comment to fill_context | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2741179aff | * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 71648205d9 | * Add support for debug feature set. Just use unigrams for this. | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3b70b304b2 | * Add words to gold_tuples from gold conll file | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 05d6065e2e | * Add assertion | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 377e9b29b1 | * Whitespace | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9f4ad8fdfb | * Assign root words the ROOT label via the Break transition. Something is still wrong here... | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f729164c01 | * Fix bug in label assignment: ensure null-label transitions receive the label 0 | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 31fad99518 | * Use StringStore to encode label names, instead of label_ids | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b9b695fb1b | * Remove debug word list | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1c843934be | * Fix oracle bug in NER. Now getting 77% F on ontonotes | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e181c051d5 | * Improve features for NER | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ae235e07b9 | * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3eda03c9c | * Tmp | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6b6bce9e7a | * Fix label loading for transition system | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5278c7504b | * Hacks to conll.pyx. Should clean these up. | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f321b2b2eb | * Remove TODO comment | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fdabd93bfb | * Ensure high loss for invalid moves, and fix label reading for arc-eager | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 10ed738df2 | * Tmp commit | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4f83c9b3d5 | * Make costs label-sensitive | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8c883cef58 | * Refactored transition system code now compiling. Still need to hook up label oracle, and test | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f0159ab4b6 | * Add file to hold GoldParse class | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8eadb984cb | * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b063001596 | * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dc986dbc0b | * Work on refactored parser, where TransitionSystem can be easily subclassed | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 135756ac3d | * Tmp commit of NER refactoring | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0ff078876a | * Commit some work on ner.yx done on the plane | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d81b7be6a2 | * Merge train.py | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3d0570685c | * Add NER transition system | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ea90d136e8 | * Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy. | 2015-02-27 03:56:10 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 312b3a45f3 | * Fix issue #19: Allow parsing/pos tagging of empty strings | 2015-02-10 10:15:58 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5c3513583d | * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. | 2015-02-09 03:57:10 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c55a33d045 | * Catch oracle errors | 2015-02-02 23:02:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d68678a93e | * Add Exception class, OracleError | 2015-02-02 11:57:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 88170e6295 | * Supply dep_strings as a tuple, for the changed API on Tokens | 2015-01-31 13:42:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0981d68022 | * Set a sent_end flag during parsing, for later use | 2015-01-31 13:41:46 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0f95712189 | * Improve accuracy reporting during training | 2015-01-30 18:05:06 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ebf7d2fab1 | * Use non-joint sbd, for more simplicity and fewer classes | 2015-01-29 06:22:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d05c5bf141 | * Remove comment | 2015-01-29 05:19:27 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 320b045daa | * Oracle now consistent over gold standard derivation | 2015-01-29 03:41:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f590382134 | * Work on sbd | 2015-01-29 03:18:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1884a7a0be | * Attach comment with paper | 2015-01-28 03:18:43 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a2d6b195db | * Add messy Break transitions, carefully following the scheme of Dd Zhang et al (2013) | 2015-01-28 03:09:45 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f9ee5d9934 | * Build a python list of word strings, for debugging | 2015-01-28 01:06:13 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d819101571 | * Improve error message on oracle failure | 2015-01-28 00:58:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7431c133d8 | * Add error if try to access head and not is_parsed | 2015-01-25 15:33:54 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a97bed9359 | * Fix POS and dependency label tag names.  Add parse and string navigation functions. | 2015-01-24 17:29:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6c7e44140b | * Work on word vectors, and other stuff | 2015-01-17 16:21:17 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aacaf1a0f0 | * Fix parser | 2015-01-08 01:19:23 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9a21127bf7 | * Fix parser, which was importing the wrong model | 2015-01-08 00:10:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3f1944d688 | * Make PyPy work | 2015-01-05 17:54:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ae7c811fd1 | * Use Exception instead of StandardError | 2015-01-04 01:22:12 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5d9a096e2f | * Some minor clean-up after HastyModel | 2014-12-31 19:46:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aafaf58cbe | * Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile. | 2014-12-31 19:40:59 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1ffb0229ed | * Import tokens in parser.pxd | 2014-12-30 21:21:17 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bb80937544 | * Upd docstrings | 2014-12-27 18:45:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b8b65903fc | * Tmp | 2014-12-24 17:42:00 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4c4aa2c5c9 | * Work on train | 2014-12-22 07:25:43 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b34a1325d3 | * Everything compiling after reorg. About to start testing. | 2014-12-21 05:42:23 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e1c1a4b868 | * Tmp | 2014-12-21 05:36:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ff252dd535 | * Clean up 'guess_cache' idea, which didnt work well enough | 2014-12-20 03:49:11 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bed680c632 | * Remove commented-out features | 2014-12-20 03:47:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3d178c03ae | * Prune the features a bit | 2014-12-20 02:46:14 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7920ea72b4 | * Working parser with the decision memory idea. Disabling that for now, for simplicity | 2014-12-20 01:43:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a2f2a48da9 | * Add some extra features | 2014-12-20 01:42:24 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 53b8bc1f3c | * Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency | 2014-12-19 09:30:50 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f72243b156 | * Set const-correctness for Feature* array | 2014-12-18 20:41:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6ab7e40590 | * Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set | 2014-12-18 11:33:25 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7e0c692daf | * Automatically push when the stack is empty | 2014-12-18 09:16:10 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 61142a8eff | * Tweak features | 2014-12-18 09:15:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8446ebfbbb | * Work on parser. Up to 92 UAS on YM labels | 2014-12-18 09:05:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 55de747bfc | * Remove .cpp files | 2014-12-18 02:43:13 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4448a840f7 | * Work on greedy parsing. Scoring about 91.2 | 2014-12-18 02:42:55 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9d7d97978d | * Work on greedy parser | 2014-12-17 21:09:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d524dd306a | * Work on greedy parser | 2014-12-17 03:19:43 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 95ccea03b2 | * Work on greedy parser | 2014-12-16 22:46:55 +11:00 |  |