Commit Graph

5731 Commits

Author SHA1 Message Date
Matthew Honnibal
1320bd19db * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal
6f47a667cf * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal
f02c39dfaf * Compare to is not None, for more robustness 2015-03-26 16:44:48 +01:00
Matthew Honnibal
8f68b864c4 * Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens 2015-03-26 16:44:48 +01:00
Matthew Honnibal
056c672caf * Bug fixes to tokenization, and support for times 2015-03-26 16:44:48 +01:00
Matthew Honnibal
ee385b439a * Ensure StringStore is dumped during training 2015-03-26 16:44:47 +01:00
Matthew Honnibal
e854ba0a13 * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter 2015-03-26 16:44:47 +01:00
Matthew Honnibal
6a6085f8b9 * Clean up GreedyParser.train function a bit 2015-03-26 16:44:47 +01:00
Matthew Honnibal
b3157927e6 * Clean up unused feature templates 2015-03-26 16:44:47 +01:00
Matthew Honnibal
411bf377d4 * Remove dependency on ner_util module 2015-03-26 16:44:47 +01:00
Matthew Honnibal
01c892f583 * Add comment to fill_context 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2741179aff * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2b2dec95d3 * Add comment to set_parse 2015-03-26 16:44:47 +01:00
Matthew Honnibal
e770fade1e * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... 2015-03-26 16:44:47 +01:00
Matthew Honnibal
71648205d9 * Add support for debug feature set. Just use unigrams for this. 2015-03-26 16:44:47 +01:00
Matthew Honnibal
3b70b304b2 * Add words to gold_tuples from gold conll file 2015-03-26 16:44:47 +01:00
Matthew Honnibal
2e12dec76e * Adjust scorer to account for tokenization mistakes 2015-03-26 16:44:47 +01:00
Matthew Honnibal
221f43c370 * Ensure better separation between score printing and training in train.py 2015-03-26 16:44:46 +01:00
Matthew Honnibal
6d49f8717b * Move scoring away from training. Does not support scoring on gold preproc. 2015-03-26 16:44:46 +01:00
Matthew Honnibal
05d6065e2e * Add assertion 2015-03-26 16:44:46 +01:00
Matthew Honnibal
377e9b29b1 * Whitespace 2015-03-26 16:44:46 +01:00
Matthew Honnibal
670959f40c * Fix iteration order on Tokens.rights 2015-03-26 16:44:46 +01:00
Matthew Honnibal
231ce2dae5 * Assign ROOT label by default. May be papering over another bug. 2015-03-26 16:44:46 +01:00
Matthew Honnibal
9f4ad8fdfb * Assign root words the ROOT label via the Break transition. Something is still wrong here... 2015-03-26 16:44:46 +01:00
Matthew Honnibal
52429625f0 * Add write_parses function 2015-03-26 16:44:46 +01:00
Matthew Honnibal
0c91dd9e15 * Re-enable entity training 2015-03-26 16:44:46 +01:00
Matthew Honnibal
f729164c01 * Fix bug in label assignment: ensure null-label transitions receive the label 0 2015-03-26 16:44:46 +01:00
Matthew Honnibal
ee927fbbb4 * Fix test_morph_exceptions 2015-03-26 16:44:46 +01:00
Matthew Honnibal
7237c805c7 * Load tag for specials.json token 2015-03-26 16:44:46 +01:00
Matthew Honnibal
13520e6cf0 * Add i.e. to specials.json 2015-03-26 16:44:45 +01:00
Matthew Honnibal
567388e38d * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags 2015-03-26 16:44:45 +01:00
Matthew Honnibal
3105c7f8ba * Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels 2015-03-26 16:44:45 +01:00
Matthew Honnibal
27d9df49e7 * Upd sbd tests 2015-03-26 16:44:45 +01:00
Matthew Honnibal
801bf14f4f * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. 2015-03-26 16:44:45 +01:00
Matthew Honnibal
9061bbaf61 * Move to fixing up ent_strings and dep_strings passing 2015-03-26 16:44:45 +01:00
Matthew Honnibal
31fad99518 * Use StringStore to encode label names, instead of label_ids 2015-03-26 16:44:45 +01:00
Matthew Honnibal
64db61bff1 * Add Span class to Python API 2015-03-26 16:44:45 +01:00
Matthew Honnibal
b9b695fb1b * Remove debug word list 2015-03-26 16:44:45 +01:00
Matthew Honnibal
8f7eeb1c2d * Add verbose flag for Scorer, for debugging, and fix ent_strings bug 2015-03-26 16:44:45 +01:00
Matthew Honnibal
f21ab2d7fb * Fix bug in ugly ent_strings hack on English class 2015-03-26 16:44:45 +01:00
Matthew Honnibal
1c843934be * Fix oracle bug in NER. Now getting 77% F on ontonotes 2015-03-26 16:44:44 +01:00
Matthew Honnibal
903f196b3f * Fix verbose printing for scorer 2015-03-26 16:44:44 +01:00
Matthew Honnibal
e181c051d5 * Improve features for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal
7ecb52c0ed * Add scorer script 2015-03-26 16:44:44 +01:00
Matthew Honnibal
8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal
e99f19dd6c * Fix clean function 2015-03-26 16:44:44 +01:00
Matthew Honnibal
ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. 2015-03-26 16:44:44 +01:00
Matthew Honnibal
4539c70542 * Work on updating train script for named entity recognition 2015-03-26 16:44:44 +01:00
Matthew Honnibal
357dcdcc01 * Fix clean function 2015-03-26 16:44:44 +01:00
Matthew Honnibal
b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00