Commit Graph

  • ed8942a096 * Add train function to fabfile Matthew Honnibal 2015-04-08 22:47:59 +0200
  • baff0f8ad8 * Add docstring explaining script a bit, and add handling of word vectors Matthew Honnibal 2015-04-08 08:20:15 +0200
  • c0a3e25b43 * Upd gitignore Matthew Honnibal 2015-04-08 07:48:04 +0200
  • 156b70ed82 * Add new script to replace make_lexicon, that does full setup of data Matthew Honnibal 2015-04-08 07:46:53 +0200
  • e775e05313 * Use merge_mwe=False in evaluation in train.py Matthew Honnibal 2015-04-08 00:35:19 +0200
  • cff2b13fef * Fix Issue #44: Broken Token.string attribute when single word sentence Matthew Honnibal 2015-04-07 06:08:25 +0200
  • 085574ccc1 * Add test for Issue #44 Matthew Honnibal 2015-04-07 06:05:18 +0200
  • 6640386b25 * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. Matthew Honnibal 2015-04-07 06:00:43 +0200
  • b64b2bd910 * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. Matthew Honnibal 2015-04-07 05:54:53 +0200
  • 6674d719a5 * Test for Issue #43: TAG attribute not working in array export Matthew Honnibal 2015-04-07 05:51:44 +0200
  • f9e510a893 * Whitespace Matthew Honnibal 2015-04-07 04:53:59 +0200
  • 66c7ccf6cc * Fix Spans.orth_ Matthew Honnibal 2015-04-07 04:53:40 +0200
  • 3b5ea3731a * Add tests for Span stuff Matthew Honnibal 2015-04-07 04:52:25 +0200
  • c2b9a61ee2 * Upd merge test Matthew Honnibal 2015-04-07 04:51:31 +0200
  • b8d34531c4 * Add support for units to English.__init__, by loading and applying regular expressions Matthew Honnibal 2015-04-07 04:02:32 +0200
  • 0ea5af88b6 * Add multi-word expression RegexMatcher Matthew Honnibal 2015-04-07 03:45:40 +0200
  • 2fee67cfa3 * Add regular expressions for English multi-word expressions Matthew Honnibal 2015-04-07 03:45:18 +0200
  • 5a075ea3fc * Ensure NER moves are available for single-word tokens Matthew Honnibal 2015-04-05 22:30:58 +0200
  • a60a366b2c * Support 'punct' dep label in conll.pyx Matthew Honnibal 2015-04-05 22:30:19 +0200
  • 021c972137 * Print parse if verbose in scorer Matthew Honnibal 2015-04-05 22:29:30 +0200
  • f26f381b0e * Add simple ner_tag script Matthew Honnibal 2015-04-03 17:26:58 +0200
  • bb27979352 * Add prepare_vecs script Matthew Honnibal 2015-04-02 06:19:39 +0200
  • fbf19049cf * Add ent_type_ property Matthew Honnibal 2015-03-31 02:01:29 +0200
  • 3f1e17bd3c * Add tests for new merge() method Matthew Honnibal 2015-03-30 01:37:57 +0200
  • e70b87efeb * Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data Matthew Honnibal 2015-03-30 01:37:41 +0200
  • 557856e84c * Allow regular expressions to specify labels for merged spans Matthew Honnibal 2015-03-27 17:40:52 +0100
  • a3af6b7c3d * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. Matthew Honnibal 2015-03-27 17:39:16 +0100
  • db5a43318c * Improve print_state debug printer Matthew Honnibal 2015-03-27 17:29:58 +0100
  • 1705eccbbe * Remove whitespace Matthew Honnibal 2015-03-27 15:22:39 +0100
  • 3feb52374c * Break apart a condition, for ease of debug printing Matthew Honnibal 2015-03-27 15:21:38 +0100
  • b32f581acb * Fix bug in ArcEager.get_labels Matthew Honnibal 2015-03-27 15:21:06 +0100
  • cd054c6c9f * Remove stray print statement Matthew Honnibal 2015-03-27 15:20:42 +0100
  • 5f2a4ff36d * Fix spans.lemma_ Matthew Honnibal 2015-03-26 03:45:11 +0100
  • f4cc222ec3 * Fix NER scoring Matthew Honnibal 2015-03-26 03:20:00 +0100
  • 1320bd19db * Move Span class to own file Matthew Honnibal 2015-03-26 03:19:07 +0100
  • 6f47a667cf * Move Span class to own file Matthew Honnibal 2015-03-26 03:18:34 +0100
  • f02c39dfaf * Compare to is not None, for more robustness Matthew Honnibal 2015-03-26 03:17:24 +0100
  • 8f68b864c4 * Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens Matthew Honnibal 2015-03-26 03:16:40 +0100
  • 056c672caf * Bug fixes to tokenization, and support for times Matthew Honnibal 2015-03-25 01:09:22 +0100
  • ee385b439a * Ensure StringStore is dumped during training Matthew Honnibal 2015-03-25 01:08:24 +0100
  • e854ba0a13 * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter Matthew Honnibal 2015-03-24 05:12:37 +0100
  • 6a6085f8b9 * Clean up GreedyParser.train function a bit Matthew Honnibal 2015-03-24 05:11:37 +0100
  • b3157927e6 * Clean up unused feature templates Matthew Honnibal 2015-03-24 05:08:35 +0100
  • 411bf377d4 * Remove dependency on ner_util module Matthew Honnibal 2015-03-24 05:03:13 +0100
  • 01c892f583 * Add comment to fill_context Matthew Honnibal 2015-03-24 04:39:58 +0100
  • 2741179aff * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. Matthew Honnibal 2015-03-24 04:32:11 +0100
  • 2b2dec95d3 * Add comment to set_parse Matthew Honnibal 2015-03-24 04:31:20 +0100
  • e770fade1e * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... Matthew Honnibal 2015-03-24 04:30:04 +0100
  • 71648205d9 * Add support for debug feature set. Just use unigrams for this. Matthew Honnibal 2015-03-24 04:29:01 +0100
  • 3b70b304b2 * Add words to gold_tuples from gold conll file Matthew Honnibal 2015-03-24 04:27:20 +0100
  • 2e12dec76e * Adjust scorer to account for tokenization mistakes Matthew Honnibal 2015-03-24 04:26:37 +0100
  • 221f43c370 * Ensure better separation between score printing and training in train.py Matthew Honnibal 2015-03-24 04:25:38 +0100
  • 6d49f8717b * Move scoring away from training. Does not support scoring on gold preproc. Matthew Honnibal 2015-03-23 17:32:55 +0100
  • 05d6065e2e * Add assertion Matthew Honnibal 2015-03-23 15:34:45 +0100
  • 377e9b29b1 * Whitespace Matthew Honnibal 2015-03-23 15:34:08 +0100
  • 670959f40c * Fix iteration order on Tokens.rights Matthew Honnibal 2015-03-22 15:35:00 +0000
  • 231ce2dae5 * Assign ROOT label by default. May be papering over another bug. Matthew Honnibal 2015-03-20 01:16:12 +0100
  • 9f4ad8fdfb * Assign root words the ROOT label via the Break transition. Something is still wrong here... Matthew Honnibal 2015-03-20 01:15:27 +0100
  • 52429625f0 * Add write_parses function Matthew Honnibal 2015-03-20 01:14:20 +0100
  • 0c91dd9e15 * Re-enable entity training Matthew Honnibal 2015-03-16 08:45:54 -0400
  • f729164c01 * Fix bug in label assignment: ensure null-label transitions receive the label 0 Matthew Honnibal 2015-03-16 08:32:23 -0400
  • ee927fbbb4 * Fix test_morph_exceptions Matthew Honnibal 2015-03-15 19:25:27 -0400
  • 7237c805c7 * Load tag for specials.json token Matthew Honnibal 2015-03-15 19:24:47 -0400
  • 13520e6cf0 * Add i.e. to specials.json Matthew Honnibal 2015-03-15 19:23:41 -0400
  • 567388e38d * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags Matthew Honnibal 2015-03-15 17:01:58 -0400
  • 3105c7f8ba * Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels Matthew Honnibal 2015-03-14 14:53:50 -0400
  • 27d9df49e7 * Upd sbd tests Matthew Honnibal 2015-03-14 11:10:42 -0400
  • 801bf14f4f * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. Matthew Honnibal 2015-03-14 11:10:27 -0400
  • 9061bbaf61 * Move to fixing up ent_strings and dep_strings passing Matthew Honnibal 2015-03-14 11:09:55 -0400
  • 31fad99518 * Use StringStore to encode label names, instead of label_ids Matthew Honnibal 2015-03-14 11:06:35 -0400
  • 64db61bff1 * Add Span class to Python API Matthew Honnibal 2015-03-13 19:21:16 -0400
  • b9b695fb1b * Remove debug word list Matthew Honnibal 2015-03-13 14:43:53 -0400
  • 8f7eeb1c2d * Add verbose flag for Scorer, for debugging, and fix ent_strings bug Matthew Honnibal 2015-03-11 02:27:22 -0400
  • f21ab2d7fb * Fix bug in ugly ent_strings hack on English class Matthew Honnibal 2015-03-11 02:26:40 -0400
  • 1c843934be * Fix oracle bug in NER. Now getting 77% F on ontonotes Matthew Honnibal 2015-03-11 02:25:08 -0400
  • 903f196b3f * Fix verbose printing for scorer Matthew Honnibal 2015-03-11 02:24:22 -0400
  • e181c051d5 * Improve features for NER Matthew Honnibal 2015-03-10 21:26:13 -0400
  • 7ecb52c0ed * Add scorer script Matthew Honnibal 2015-03-10 21:07:03 -0400
  • 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. Matthew Honnibal 2015-03-10 13:00:23 -0400
  • e99f19dd6c * Fix clean function Matthew Honnibal 2015-03-09 07:06:33 -0400
  • ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. Matthew Honnibal 2015-03-09 07:06:01 -0400
  • 4539c70542 * Work on updating train script for named entity recognition Matthew Honnibal 2015-03-09 01:46:53 -0400
  • 357dcdcc01 * Fix clean function Matthew Honnibal 2015-03-09 01:46:35 -0400
  • b3eda03c9c * Tmp Matthew Honnibal 2015-03-09 01:46:22 -0400
  • 220ce8bfed * Prepare English class for NER Matthew Honnibal 2015-03-08 19:04:00 -0400
  • f5830dc1c1 * Remove _transitions.pyx Matthew Honnibal 2015-03-08 00:26:54 -0500
  • 7a1a333f04 * Allow gold tokenization training, for debugging Matthew Honnibal 2015-03-08 00:17:12 -0500
  • 8da53cbe3c * Fix setup.py, so that when compiling, only the necessary files are compiled Matthew Honnibal 2015-03-08 00:16:32 -0500
  • 6865c2fb4d * Fix assignment of dep strings in tokens.pyx Matthew Honnibal 2015-03-08 00:16:06 -0500
  • 6b6bce9e7a * Fix label loading for transition system Matthew Honnibal 2015-03-08 00:15:20 -0500
  • 5278c7504b * Hacks to conll.pyx. Should clean these up. Matthew Honnibal 2015-03-08 00:14:48 -0500
  • f321b2b2eb * Remove TODO comment Matthew Honnibal 2015-03-08 00:14:06 -0500
  • fdabd93bfb * Ensure high loss for invalid moves, and fix label reading for arc-eager Matthew Honnibal 2015-03-08 00:13:20 -0500
  • f5f15a1ef2 * Tmp commit Matthew Honnibal 2015-02-23 14:05:04 -0500
  • 10ed738df2 * Tmp commit Matthew Honnibal 2015-02-23 14:04:53 -0500
  • 4f83c9b3d5 * Make costs label-sensitive Matthew Honnibal 2015-02-22 18:18:54 -0500
  • 179b7eb0a7 * Specify parser transition system in language Matthew Honnibal 2015-02-22 00:32:33 -0500
  • 8c883cef58 * Refactored transition system code now compiling. Still need to hook up label oracle, and test Matthew Honnibal 2015-02-22 00:32:07 -0500
  • 6e86790a4e * Add new syntax modules to setup.py Matthew Honnibal 2015-02-22 00:31:23 -0500
  • 34215de61b * Upd train script, moving lots of functionality to new GoldParse class Matthew Honnibal 2015-02-21 20:06:29 -0500