| 
							
							
								 Matthew Honnibal | d2ac8d8007 | * Add ctnt field to State, in preparation for constituency parsing | 2015-05-12 20:27:56 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ab67693393 | * Add read_json_file to conll.pyx | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aff9359a8d | * Update ner.pyx to expect brackets from gold_tuples | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0ad72a77ce | * Write JSON files, with both dependency and PSG parses | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d48218f4b2 | * Add left_edge and right_edge properties | 2015-05-12 20:27:55 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 53cf77e1c8 | * Bug fix: when non-monotonically correcting a dependency, make sure to delete the old one from the child list | 2015-05-12 20:26:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a4e2af54f9 | * Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible | 2015-05-12 20:26:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d634038eb6 | * Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token | 2015-05-12 20:26:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 03ebf70a66 | * Inc version to 0.84 | 2015-05-12 02:38:51 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e73eaf2d05 | * Replace some assertions with proper errors | 2015-05-08 16:52:17 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fb8d50b3d5 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-04-30 12:45:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ed8e8c3bd0 | * Whitespace | 2015-04-29 14:22:47 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 378c2a6435 | * Fix POS model: make it use tag instead of pos in history features | 2015-04-29 00:02:53 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 763ef01575 | * Fix two bugs in feature calculation | 2015-04-28 23:25:09 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3fd48c97b | * Fix missing root labels bug identified in Issue #57 | 2015-04-28 20:45:51 +02:00 |  | 
			
				
					| 
							
							
								 Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00 |  | 
			
				
					| 
							
							
								 Jordan Suchow | 5f0f940a1f | Remove unused imports | 2015-04-19 01:05:22 -07:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | cc4e395927 | * Add some ad hoc regexes, for multi-word location prepositions | 2015-04-17 04:44:24 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f7ffd94e6a | * Add Token.conjuncts property | 2015-04-17 01:40:53 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 684d0e5e85 | * Download updated data | 2015-04-16 04:29:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2ef170a991 | * Fix Issue #54: Error merging multi-word token when there's a mid-token match. | 2015-04-16 04:28:06 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 42617548af | * Disable merge_mwes by default | 2015-04-16 04:20:31 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 99dbf8a38c | * Fix error type in lookup_transition | 2015-04-16 01:36:22 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 77d0700caf | * Add on X way regexes | 2015-04-16 01:35:46 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9f16848b60 | * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' | 2015-04-15 06:01:18 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c6707778dd | * Fix Issue #51: Handle non-ascii lemmas correctly | 2015-04-13 22:28:59 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bf0aff5124 | * Fix bug in Tokens.ents where entity wasn't being emitted if another started immediately after | 2015-04-13 21:34:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2b84a90bbb | * Fix Issue #50: Python 3 compatibility of v0.80 | 2015-04-13 05:59:43 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fbd48c571d | * Rearrange code in tokens.pyx | 2015-04-13 05:41:25 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 507048dc45 | * Rename StandardError to Exception, for Python 3 compatibility | 2015-04-12 07:28:34 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 761a19113a | * Fix /tmp moving thing in download.py | 2015-04-12 07:04:10 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 248a2b4b0f | * Remove Spans class | 2015-04-12 04:07:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1d05e6da00 | * Add ne_iob and ne_type features to NER | 2015-04-10 19:07:08 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4df8a3d90f | * Add ne_iob and ne_type attributes to context vector | 2015-04-10 05:02:15 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8c354c432b | * Add ValueError condition to ner_tag reading | 2015-04-10 04:59:59 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 435cccf098 | * Add read_conll03_file function to conll.pyx | 2015-04-10 04:59:11 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 99c9ecfc18 | * Fix bug in prefix, suffix and word shape features in parser and NER | 2015-04-10 03:53:33 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | cff2b13fef | * Fix Issue #44: Broken Token.string attribute when single word sentence | 2015-04-07 06:08:25 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6640386b25 | * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. | 2015-04-07 06:00:57 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b64b2bd910 | * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. | 2015-04-07 06:00:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f9e510a893 | * Whitespace | 2015-04-07 04:53:59 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 66c7ccf6cc | * Fix Spans.orth_ | 2015-04-07 04:53:40 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b8d34531c4 | * Add support for units to English.__init__, by loading and applying regular expressions | 2015-04-07 04:02:32 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0ea5af88b6 | * Add multi-word expression RegexMatcher | 2015-04-07 03:45:40 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2fee67cfa3 | * Add regular expressions for English multi-word expressions | 2015-04-07 03:45:18 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5a075ea3fc | * Ensure NER moves are available for single-word tokens | 2015-04-05 22:30:58 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a60a366b2c | * Support 'punct' dep label in conll.pyx | 2015-04-05 22:30:19 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 021c972137 | * Print parse if verbose in scorer | 2015-04-05 22:29:30 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fbf19049cf | * Add ent_type_ property | 2015-03-31 02:01:29 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e70b87efeb | * Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data | 2015-03-30 01:37:41 +02:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 557856e84c | * Allow regular expressions to specify labels for merged spans | 2015-03-27 17:40:52 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a3af6b7c3d | * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. | 2015-03-27 17:39:16 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | db5a43318c | * Improve print_state debug printer | 2015-03-27 17:29:58 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1705eccbbe | * Remove whitespace | 2015-03-27 15:22:39 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3feb52374c | * Break apart a condition, for ease of debug printing | 2015-03-27 15:21:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b32f581acb | * Fix bug in ArcEager.get_labels | 2015-03-27 15:21:06 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5f2a4ff36d | * Fix spans.lemma_ | 2015-03-26 16:45:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f4cc222ec3 | * Fix NER scoring | 2015-03-26 16:45:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1320bd19db | * Move Span class to own file | 2015-03-26 16:45:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6f47a667cf | * Move Span class to own file | 2015-03-26 16:45:38 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f02c39dfaf | * Compare to is not None, for more robustness | 2015-03-26 16:44:48 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8f68b864c4 | * Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens | 2015-03-26 16:44:48 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e854ba0a13 | * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6a6085f8b9 | * Clean up GreedyParser.train function a bit | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3157927e6 | * Clean up unused feature templates | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 411bf377d4 | * Remove dependency on ner_util module | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 01c892f583 | * Add comment to fill_context | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2741179aff | * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2b2dec95d3 | * Add comment to set_parse | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e770fade1e | * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 71648205d9 | * Add support for debug feature set. Just use unigrams for this. | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3b70b304b2 | * Add words to gold_tuples from gold conll file | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2e12dec76e | * Adjust scorer to account for tokenization mistakes | 2015-03-26 16:44:47 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 05d6065e2e | * Add assertion | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 377e9b29b1 | * Whitespace | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 670959f40c | * Fix iteration order on Tokens.rights | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 231ce2dae5 | * Assign ROOT label by default. May be papering over another bug. | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9f4ad8fdfb | * Assign root words the ROOT label via the Break transition. Something is still wrong here... | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f729164c01 | * Fix bug in label assignment: ensure null-label transitions receive the label 0 | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7237c805c7 | * Load tag for specials.json token | 2015-03-26 16:44:46 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 567388e38d | * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3105c7f8ba | * Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 801bf14f4f | * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 31fad99518 | * Use StringStore to encode label names, instead of label_ids | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 64db61bff1 | * Add Span class to Python API | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b9b695fb1b | * Remove debug word list | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f21ab2d7fb | * Fix bug in ugly ent_strings hack on English class | 2015-03-26 16:44:45 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1c843934be | * Fix oracle bug in NER. Now getting 77% F on ontonotes | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 903f196b3f | * Fix verbose printing for scorer | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e181c051d5 | * Improve features for NER | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7ecb52c0ed | * Add scorer script | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ae235e07b9 | * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b3eda03c9c | * Tmp | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 220ce8bfed | * Prepare English class for NER | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f5830dc1c1 | * Remove _transitions.pyx | 2015-03-26 16:44:44 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6865c2fb4d | * Fix assignment of dep strings in tokens.pyx | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6b6bce9e7a | * Fix label loading for transition system | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5278c7504b | * Hacks to conll.pyx. Should clean these up. | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f321b2b2eb | * Remove TODO comment | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fdabd93bfb | * Ensure high loss for invalid moves, and fix label reading for arc-eager | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 10ed738df2 | * Tmp commit | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4f83c9b3d5 | * Make costs label-sensitive | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 179b7eb0a7 | * Specify parser transition system in language | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8c883cef58 | * Refactored transition system code now compiling. Still need to hook up label oracle, and test | 2015-03-26 16:44:43 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f0159ab4b6 | * Add file to hold GoldParse class | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8eadb984cb | * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b063001596 | * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 01bc4d6815 | * Add set_parse method, to assign parse to tokens in a less hacky way. | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dc986dbc0b | * Work on refactored parser, where TransitionSystem can be easily subclassed | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1cc6329b18 | * Add base class to do transitions | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 135756ac3d | * Tmp commit of NER refactoring | 2015-03-26 16:44:42 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 23c1f6fc04 | * Merge changes from stash | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0ff078876a | * Commit some work on ner.pyx done on the plane | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d81b7be6a2 | * Merge train.py | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2e3dc3dfe2 | * Merge changes in tokens.pyx | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8cc3524dc9 | * Ws | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3d0570685c | * Add NER transition system | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 043b758cf4 | * Resurrect old NER code. This version won't be the one that runs; we want to re-use the parser code. But for now this is a useful reference. | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b139aa92ba | * Start setting out how NER will be implemented in the data model | 2015-03-26 16:44:41 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0962ffc095 | * Fix issue #37: missing check_flag attribute from Token class | 2015-03-26 15:06:26 +01:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2e8d0e5d45 | * Upd download script | 2015-03-03 05:47:16 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dbe26f5793 | * Add children and subtree methods to Token, which are generators to assist parse-tree navigation. | 2015-03-03 04:18:41 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ea90d136e8 | * Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy. | 2015-02-27 03:56:10 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | caf046b220 | * Hastily add method to apply tags from a list of strings, instead of predicting the tags. | 2015-02-23 15:40:17 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | cae077b583 | * Work on fixing orphaned Token objects bug | 2015-02-16 15:20:31 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7572e31f5e | * Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive. | 2015-02-11 18:05:06 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 64645a1c2f | * Improve docstring on English | 2015-02-11 15:13:20 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 594e50bd45 | * Add option to download speech-parsing data set. | 2015-02-11 14:20:29 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0b7e769211 | * Add POS tags to support SWBD tag set | 2015-02-11 14:08:28 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 312b3a45f3 | * Fix issue #19: Allow parsing/pos tagging of empty strings | 2015-02-10 10:15:58 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2a0615104b | * Upd download script | 2015-02-09 10:22:59 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5c3513583d | * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. | 2015-02-09 03:57:10 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | be5536d239 | * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. | 2015-02-08 18:36:18 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0492cee8b4 | * Fix Issue #24: Lemmas are empty when the L field is missing for special-cased tokens | 2015-02-08 18:30:30 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d229fbd228 | * Give better error on out-of-bounds array access | 2015-02-07 12:59:12 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ab8bb047d0 | * Fix negative index for __getitem__ | 2015-02-07 12:58:46 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 44c7eafe44 | * Fix download.py | 2015-02-07 12:00:36 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6ca7f2eedc | * Upd download script | 2015-02-07 11:32:33 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f0e0588833 | * Fill L2 norm attribute on LexemeC struct | 2015-02-07 08:44:42 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 75f9b7d6bf | * Add L2 norm field to LexemeC struct | 2015-02-07 08:43:17 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 51b618d646 | * Add a has_repvec property to Lexeme, and a check function to check flags | 2015-02-07 08:42:44 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 321b402739 | * Store the l2 norm of the word's vector | 2015-02-07 08:42:16 -05:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c7d8644149 | * Fix regression on 'prob' attr of Token. | 2015-02-03 03:32:18 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c55a33d045 | * Catch oracle errors | 2015-02-02 23:02:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | de772088e6 | * Use parse tree for sbd in Tokens.sents | 2015-02-02 12:17:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 56c2ef2982 | * Tweak POS features for web text | 2015-02-02 11:59:36 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d68678a93e | * Add Exception class, OracleError | 2015-02-02 11:57:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a20fdbd8ee | * Upd download script | 2015-02-01 13:22:23 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 76d9394cb4 | * Fix vocab.pyx for Python3 | 2015-02-01 13:14:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 63abdf154c | * Hastily hack download file | 2015-01-31 22:48:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7de00c5a79 | * Try not holding a reference to Pool, since that seems to confuse the GC | 2015-01-31 22:10:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ce3ae8b5d9 | * Fix platform-specific lexicon bug. | 2015-01-31 16:38:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a1ed574b7b | * Fix default model path for English | 2015-01-31 16:38:27 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 018e0bfa24 | * Bug fixes to parse navigation | 2015-01-31 16:37:13 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e013555b25 | * Add option to download script | 2015-01-31 13:51:56 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 08ca5c8970 | * Add sent_end flag to TokenC struct | 2015-01-31 13:44:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 024cfd485c | * Pass tag_strings as a tuple, to support new Tokens API | 2015-01-31 13:43:37 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 77d62d0179 | * Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation. | 2015-01-31 13:42:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 88170e6295 | * Supply dep_strings as a tuple, for the changed API on Tokens | 2015-01-31 13:42:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0981d68022 | * Set a sent_end flag during parsing, for later use | 2015-01-31 13:41:46 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 251dbf24d7 | * Fix uninitialised variable error | 2015-01-30 20:46:34 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 83a4df5a1a | * Fix download script | 2015-01-30 20:40:42 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6f9ebc2f34 | * Fix download script | 2015-01-30 20:33:19 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8b85d0bb8a | * Only download small data if no data dir exists | 2015-01-30 20:27:14 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1a7a1c2771 | * Fix Issue #16: tokens recurse when printing | 2015-01-30 19:47:50 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | cb95ef6934 | * Fix download script | 2015-01-30 19:28:43 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e578bd37bd | * Fix download script | 2015-01-30 18:59:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | df52014d12 | * Fix download script | 2015-01-30 18:36:24 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0f95712189 | * Improve accuracy reporting during training | 2015-01-30 18:05:06 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b68f563c2f | * Fix Issue #14: Improve parsing API | 2015-01-30 18:04:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 998b607f65 | * Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source | 2015-01-30 18:04:01 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4ff180db74 | * Fix off-by-one error in commit 0a7fceb | 2015-01-30 12:49:33 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0a7fcebdf7 | * Fix Issue #12: Incorrect token.idx calculations for some punctuation, in the presence of token cache | 2015-01-30 12:33:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ebf7d2fab1 | * Use non-joint sbd, for more simplicity and fewer classes | 2015-01-29 06:22:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d05c5bf141 | * Remove comment | 2015-01-29 05:19:27 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 320b045daa | * Oracle now consistent over gold standard derivation | 2015-01-29 03:41:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f590382134 | * Work on sbd | 2015-01-29 03:18:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1884a7a0be | * Attach comment with paper | 2015-01-28 03:18:43 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a2d6b195db | * Add messy Break transitions, carefully following the scheme of Dd Zhang et al (2013) | 2015-01-28 03:09:45 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f9ee5d9934 | * Build a python list of word strings, for debugging | 2015-01-28 01:06:13 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d819101571 | * Improve error message on oracle failure | 2015-01-28 00:58:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | e6c3d3471f | * Tweak documentation for Tokens, and hide constructor as __cinit__ | 2015-01-27 18:57:52 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c38c62d4a3 | * Add docstring to English class | 2015-01-27 02:45:21 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d4c99f7dec | * Add attrs.pxd | 2015-01-26 22:22:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d4a493855e | * Fix error msg | 2015-01-25 23:01:30 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7f87716cf7 | * Fix download script | 2015-01-25 23:01:10 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 92fb9257dd | * Add parts-of-speech file | 2015-01-25 22:00:39 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | c1c3dba4cb | * Check whether vector files are present before trying to load them. | 2015-01-25 18:16:48 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5049d4c2e6 | * Add parts_of_speech.pyx | 2015-01-25 16:32:26 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 12b034e3ef | * Move POS tag definitions to parts_of_speech.pxd | 2015-01-25 16:31:07 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7431c133d8 | * Add error if try to access head and not is_parsed | 2015-01-25 15:33:54 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 951d06c824 | * Silently don't parse if data is not present | 2015-01-25 14:47:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4e857ab7a6 | * Fix bug in POS tagger feature | 2015-01-25 02:20:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | dd56e298e2 | * Ensure tagging is applied if parse=True | 2015-01-25 02:19:44 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 94750819cd | * Set parse=True by default --- i.e. parse unless told not to. | 2015-01-25 01:28:28 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 71b95202eb | * Add docstring to StringStore | 2015-01-24 20:49:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6d1c08dafd | * Add docstring to Lexeme | 2015-01-24 20:48:34 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a97bed9359 | * Fix POS and dependency label tag names.  Add parse and string navigation functions. | 2015-01-24 17:29:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 76cd024095 | * Add whitespace property to Token | 2015-01-24 07:41:21 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5fd72bc220 | * Have 'string' refer to the whitespace-padded string | 2015-01-24 07:32:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | fda94271af | * Rename NORM1 and NORM2 attrs to lower and norm | 2015-01-24 06:17:03 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a27b23cc8f | * Have SBD return start/end indices | 2015-01-22 22:24:44 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d460c28838 | * Rename vec to repvec | 2015-01-22 02:06:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8b9d913d97 | * Rename vec to repvec | 2015-01-22 02:05:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9cd0b6b3e9 | * Various tweaks to Tokens class | 2015-01-22 02:05:37 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5928d158ce | * Pass the string to Tokens | 2015-01-22 02:04:58 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 45264e356b | * Rename vec to repvec | 2015-01-22 02:04:24 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5e63c606ad | * Rename vec to repvec | 2015-01-22 02:03:54 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 56e6cf0672 | * Add _string attr to Tokens object | 2015-01-21 18:57:09 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | d6ac60e91c | * Bug fixes to sentences method, and improved vector transport for tokens | 2015-01-21 18:56:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f2a229136c | * Fix data_dir=None argument to English class | 2015-01-21 18:27:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ef49b8c179 | * Add stop-word flag | 2015-01-21 18:22:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6646bfc5df | * Add LOWER attr | 2015-01-21 18:19:08 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f149259bf5 | * Fix negative indices in tokens | 2015-01-20 01:16:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b65b0c07bf | * Messily hook up vector in tokens | 2015-01-19 19:59:55 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 8ff5b8bd84 | * Add attribute for POS scheme | 2015-01-17 17:33:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6c7e44140b | * Work on word vectors, and other stuff | 2015-01-17 16:21:17 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 802867e96a | * Revise interface to Token. Strings now have attribute names like norm1_ | 2015-01-15 03:51:47 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7d3c40de7d | * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme | 2015-01-15 00:33:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0930892fc1 | * Tmp. Working on refactor. Compiles, must hook up lexical feats. | 2015-01-14 00:03:48 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 46da3d74d2 | * Tmp. Refactoring, introducing a Lexeme PyObject. | 2015-01-12 11:23:44 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ce2edd6312 | * Tmp commit. Refactoring to create a Python Lexeme class. | 2015-01-12 10:26:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aacaf1a0f0 | * Fix parser | 2015-01-08 01:19:23 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 9a21127bf7 | * Fix parser, which was importing the wrong model | 2015-01-08 00:10:15 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6a3e39cdd1 | * Add typedefs.pyx | 2015-01-06 04:51:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a58920cc5e | * Import orth.word_shape as a C module | 2015-01-06 03:18:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 6b68f7ef75 | * Finally get string types right for orth function | 2015-01-06 03:17:39 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 90c143bd85 | * Fix orth import | 2015-01-05 18:49:19 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 7689dccd0f | * Remove unused import | 2015-01-05 18:48:48 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 3f1944d688 | * Make PyPy work | 2015-01-05 17:54:38 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a510d9f677 | * Another assertion removed | 2015-01-05 13:01:40 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 2856946a66 | * Remove assertion that doesn't work on Python 3 | 2015-01-05 12:51:16 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 94034f1112 | * Fix encoding in lemmatization | 2015-01-05 11:54:29 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | b132b3caa6 | * Fix unicode error in lemmatizer | 2015-01-05 11:53:54 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 477e7fbffe | * Fix data reading for lemmatizer | 2015-01-05 06:01:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 58f75abaca | * Fix unicode error in orth | 2015-01-05 05:53:08 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 4e085d5166 | * Fix lemmatizer for Python3 | 2015-01-05 05:51:26 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | ae7c811fd1 | * Use Exception instead of StandardError | 2015-01-04 01:22:12 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 0e4c2ba036 | * Fix loading of special morph words | 2015-01-03 23:13:00 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | f5d41028b5 | * Move around data files for test release | 2015-01-03 01:59:22 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | a24321b63a | * Add downloader | 2015-01-02 21:44:41 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 5d9a096e2f | * Some minor clean-up after HastyModel | 2014-12-31 19:46:04 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | aafaf58cbe | * Refactor _ml.Model, and finish implementing HastyModel. So far not worthwhile. | 2014-12-31 19:40:59 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | bcd038e7b6 | * Implement HastyModel | 2014-12-31 01:16:47 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 1a075f77ff | * Don't over-ride pre-loaded POS tags, if set by special-cases | 2014-12-30 23:26:32 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 785c7ba76a | * Embed signature on attrs | 2014-12-30 23:25:31 +11:00 |  | 
			
				
					| 
							
							
								 Matthew Honnibal | 30e5805656 | * Lazy-load tagger and parser | 2014-12-30 23:25:09 +11:00 |  |