Matthew Honnibal
|
2741179aff
|
* Important bug fix: Fill token N2w, which was left unfilled after a bad edit while writing the NER features.
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
2b2dec95d3
|
* Add comment to set_parse
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
e770fade1e
|
* Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up...
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
71648205d9
|
* Add support for debug feature set. Just use unigrams for this.
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
3b70b304b2
|
* Add words to gold_tuples from gold conll file
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
2e12dec76e
|
* Adjust scorer to account for tokenization mistakes
|
2015-03-26 16:44:47 +01:00 |
|
Matthew Honnibal
|
221f43c370
|
* Ensure better separation between score printing and training in train.py
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
6d49f8717b
|
* Move scoring away from training. Does not support scoring on gold preproc.
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
05d6065e2e
|
* Add assertion
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
377e9b29b1
|
* Whitespace
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
670959f40c
|
* Fix iteration order on Tokens.rights
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
231ce2dae5
|
* Assign ROOT label by default. May be papering over another bug.
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
9f4ad8fdfb
|
* Assign root words the ROOT label via the Break transition. Something is still wrong here...
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
52429625f0
|
* Add write_parses function
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
0c91dd9e15
|
* Re-enable entity training
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
f729164c01
|
* Fix bug in label assignment: ensure null-label transitions receive the label 0
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
ee927fbbb4
|
* Fix test_morph_exceptions
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
7237c805c7
|
* Load tag for specials.json token
|
2015-03-26 16:44:46 +01:00 |
|
Matthew Honnibal
|
13520e6cf0
|
* Add i.e. to specials.json
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
567388e38d
|
* Use values encoded by StringStore in POS tagging, rather than indices into a list of tags
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
3105c7f8ba
|
* Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
27d9df49e7
|
* Update sbd tests
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
801bf14f4f
|
* Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names.
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
9061bbaf61
|
* Move to fixing up ent_strings and dep_strings passing
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
31fad99518
|
* Use StringStore to encode label names, instead of label_ids
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
64db61bff1
|
* Add Span class to Python API
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
b9b695fb1b
|
* Remove debug word list
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
8f7eeb1c2d
|
* Add verbose flag for Scorer, for debugging, and fix ent_strings bug
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
f21ab2d7fb
|
* Fix bug in ugly ent_strings hack on English class
|
2015-03-26 16:44:45 +01:00 |
|
Matthew Honnibal
|
1c843934be
|
* Fix oracle bug in NER. Now getting 77% F on OntoNotes
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
903f196b3f
|
* Fix verbose printing for scorer
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
e181c051d5
|
* Improve features for NER
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
7ecb52c0ed
|
* Add scorer script
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
8057a95f20
|
* NER seems to be working, scoring 69 F. Need to add decision-history features; currently only the current word and 2 words of context are used. Needs refactoring.
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
e99f19dd6c
|
* Fix clean function
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
ae235e07b9
|
* Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc.
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
4539c70542
|
* Work on updating train script for named entity recognition
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
357dcdcc01
|
* Fix clean function
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
b3eda03c9c
|
* Tmp
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
220ce8bfed
|
* Prepare English class for NER
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
f5830dc1c1
|
* Remove _transitions.pyx
|
2015-03-26 16:44:44 +01:00 |
|
Matthew Honnibal
|
7a1a333f04
|
* Allow gold tokenization training, for debugging
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
8da53cbe3c
|
* Fix setup.py, so that when compiling, only the necessary files are compiled
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
6865c2fb4d
|
* Fix assignment of dep strings in tokens.pyx
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
6b6bce9e7a
|
* Fix label loading for transition system
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
5278c7504b
|
* Hacks to conll.pyx. Should clean these up.
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
f321b2b2eb
|
* Remove TODO comment
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
fdabd93bfb
|
* Ensure high loss for invalid moves, and fix label reading for arc-eager
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
f5f15a1ef2
|
* Tmp commit
|
2015-03-26 16:44:43 +01:00 |
|
Matthew Honnibal
|
10ed738df2
|
* Tmp commit
|
2015-03-26 16:44:43 +01:00 |
|