954 Commits

Author SHA1 Message Date
Matthew Honnibal
f9e510a893 * Whitespace 2015-04-07 04:53:59 +02:00
Matthew Honnibal
66c7ccf6cc * Fix Spans.orth_ 2015-04-07 04:53:40 +02:00
Matthew Honnibal
3b5ea3731a * Add tests for Span stuff 2015-04-07 04:52:25 +02:00
Matthew Honnibal
c2b9a61ee2 * Upd merge test 2015-04-07 04:51:31 +02:00
Matthew Honnibal
b8d34531c4 * Add support for units to English.__init__, by loading and applying regular expressions 2015-04-07 04:02:32 +02:00
Matthew Honnibal
0ea5af88b6 * Add multi-word expression RegexMatcher 2015-04-07 03:45:40 +02:00
Matthew Honnibal
2fee67cfa3 * Add regular expressions for English multi-word expressions 2015-04-07 03:45:18 +02:00
Matthew Honnibal
5a075ea3fc * Ensure NER moves are available for single-word tokens 2015-04-05 22:30:58 +02:00
Matthew Honnibal
a60a366b2c * Support 'punct' dep label in conll.pyx 2015-04-05 22:30:19 +02:00
Matthew Honnibal
021c972137 * Print parse if verbose in scorer 2015-04-05 22:29:30 +02:00
Matthew Honnibal
f26f381b0e * Add simple ner_tag script 2015-04-03 17:26:58 +02:00
Matthew Honnibal
bb27979352 * Add prepare_vecs script 2015-04-02 06:19:39 +02:00
Matthew Honnibal
fbf19049cf * Add ent_type_ property 2015-03-31 02:01:29 +02:00
Matthew Honnibal
3f1e17bd3c * Add tests for new merge() method 2015-03-30 01:37:57 +02:00
Matthew Honnibal
e70b87efeb * Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data 2015-03-30 01:37:41 +02:00
Matthew Honnibal
557856e84c * Allow regular expressions to specify labels for merged spans 2015-03-27 17:40:52 +01:00
Matthew Honnibal
a3af6b7c3d * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. 2015-03-27 17:39:16 +01:00
Matthew Honnibal
db5a43318c * Improve print_state debug printer 2015-03-27 17:29:58 +01:00
Matthew Honnibal
1705eccbbe * Remove whitespace 2015-03-27 15:22:39 +01:00
Matthew Honnibal
3feb52374c * Break apart a condition, for ease of debug printing 2015-03-27 15:21:38 +01:00
Matthew Honnibal
b32f581acb * Fix bug in ArcEager.get_labels 2015-03-27 15:21:06 +01:00
Matthew Honnibal
cd054c6c9f * Remove stray print statement 2015-03-27 15:20:42 +01:00
Matthew Honnibal
5f2a4ff36d * Fix spans.lemma_ 2015-03-26 16:45:38 +01:00
Matthew Honnibal
f4cc222ec3 * Fix NER scoring 2015-03-26 16:45:38 +01:00
Matthew Honnibal
1320bd19db * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal
6f47a667cf * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal
f02c39dfaf * Compare to is not None, for more robustness 2015-03-26 16:44:48 +01:00
Matthew Honnibal
8f68b864c4 * Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens 2015-03-26 16:44:48 +01:00
Matthew Honnibal
056c672caf * Bug fixes to tokenization, and support for times 2015-03-26 16:44:48 +01:00
Matthew Honnibal
ee385b439a * Ensure StringStore is dumped during training 2015-03-26 16:44:47 +01:00
Matthew Honnibal
e854ba0a13 * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter 2015-03-26 16:44:47 +01:00
Matthew Honnibal
6a6085f8b9 * Clean up GreedyParser.train function a bit 2015-03-26 16:44:47 +01:00
Matthew Honnibal
b3157927e6 * Clean up unused feature templates 2015-03-26 16:44:47 +01:00
Matthew Honnibal
411bf377d4 * Remove dependency on ner_util module 2015-03-26 16:44:47 +01:00
Matthew Honnibal
01c892f583 * Add comment to fill_context 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2741179aff * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2b2dec95d3 * Add comment to set_parse 2015-03-26 16:44:47 +01:00
Matthew Honnibal
e770fade1e * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... 2015-03-26 16:44:47 +01:00
Matthew Honnibal
71648205d9 * Add support for debug feature set. Just use unigrams for this. 2015-03-26 16:44:47 +01:00
Matthew Honnibal
3b70b304b2 * Add words to gold_tuples from gold conll file 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2e12dec76e * Adjust scorer to account for tokenization mistakes 2015-03-26 16:44:47 +01:00
Matthew Honnibal
221f43c370 * Ensure better separation between score printing and training in train.py 2015-03-26 16:44:46 +01:00
Matthew Honnibal
6d49f8717b * Move scoring away from training. Does not support scoring on gold preproc. 2015-03-26 16:44:46 +01:00
Matthew Honnibal
05d6065e2e * Add assertion 2015-03-26 16:44:46 +01:00
Matthew Honnibal
377e9b29b1 * Whitespace 2015-03-26 16:44:46 +01:00
Matthew Honnibal
670959f40c * Fix iteration order on Tokens.rights 2015-03-26 16:44:46 +01:00
Matthew Honnibal
231ce2dae5 * Assign ROOT label by default. May be papering over another bug. 2015-03-26 16:44:46 +01:00
Matthew Honnibal
9f4ad8fdfb * Assign root words the ROOT label via the Break transition. Something is still wrong here... 2015-03-26 16:44:46 +01:00
Matthew Honnibal
52429625f0 * Add write_parses function 2015-03-26 16:44:46 +01:00
Matthew Honnibal
0c91dd9e15 * Re-enable entity training 2015-03-26 16:44:46 +01:00