Matthew Honnibal | 378c2a6435 | * Fix POS model: make it use tag instead of pos in history features | 2015-04-29 00:02:53 +02:00
Matthew Honnibal | 763ef01575 | * Fix two bugs in feature calculation | 2015-04-28 23:25:09 +02:00
Matthew Honnibal | b3fd48c97b | * Fix missing root labels bug identified in Issue #57 | 2015-04-28 20:45:51 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Jordan Suchow | 5f0f940a1f | Remove unused imports | 2015-04-19 01:05:22 -07:00
Matthew Honnibal | cc4e395927 | * Add some ad hoc regexes, for multi-word location prepositions | 2015-04-17 04:44:24 +02:00
Matthew Honnibal | f7ffd94e6a | * Add Token.conjuncts property | 2015-04-17 01:40:53 +02:00
Matthew Honnibal | 684d0e5e85 | * Download updated data | 2015-04-16 04:29:15 +02:00
Matthew Honnibal | 2ef170a991 | * Fix Issue #54: Error merging multi-word token when there's a mid-token match. | 2015-04-16 04:28:06 +02:00
Matthew Honnibal | 42617548af | * Disable merge_mwes by default | 2015-04-16 04:20:31 +02:00
Matthew Honnibal | 99dbf8a38c | * Fix error type in lookup_transition | 2015-04-16 01:36:22 +02:00
Matthew Honnibal | 77d0700caf | * Add on X way regexes | 2015-04-16 01:35:46 +02:00
Matthew Honnibal | 9f16848b60 | * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' | 2015-04-15 06:01:18 +02:00
Matthew Honnibal | c6707778dd | * Fix Issue #51: Handle non-ascii lemmas correctly | 2015-04-13 22:28:59 +02:00
Matthew Honnibal | bf0aff5124 | * Fix bug in Tokens.ents where entity wasn't being emitted if another started immediately after | 2015-04-13 21:34:33 +02:00
Matthew Honnibal | 2b84a90bbb | * Fix Issue #50: Python 3 compatibility of v0.80 | 2015-04-13 05:59:43 +02:00
Matthew Honnibal | fbd48c571d | * Rearrange code in tokens.pyx | 2015-04-13 05:41:25 +02:00
Matthew Honnibal | 507048dc45 | * Rename StandardError to Exception, for Python 3 compatibility | 2015-04-12 07:28:34 +02:00
Matthew Honnibal | 761a19113a | * Fix /tmp moving thing in download.py | 2015-04-12 07:04:10 +02:00
Matthew Honnibal | 248a2b4b0f | * Remove Spans class | 2015-04-12 04:07:29 +02:00
Matthew Honnibal | 1d05e6da00 | * Add ne_iob and ne_type features to NER | 2015-04-10 19:07:08 +02:00
Matthew Honnibal | 4df8a3d90f | * Add ne_iob and ne_type attributes to context vector | 2015-04-10 05:02:15 +02:00
Matthew Honnibal | 8c354c432b | * Add ValueError condition to ner_tag reading | 2015-04-10 04:59:59 +02:00
Matthew Honnibal | 435cccf098 | * Add read_conll03_file function to conll.pyx | 2015-04-10 04:59:11 +02:00
Matthew Honnibal | 99c9ecfc18 | * Fix bug in prefix, suffix and word shape features in parser and NER | 2015-04-10 03:53:33 +02:00
Matthew Honnibal | cff2b13fef | * Fix Issue #44: Broken Token.string attribute when single word sentence | 2015-04-07 06:08:25 +02:00
Matthew Honnibal | 6640386b25 | * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. | 2015-04-07 06:00:57 +02:00
Matthew Honnibal | b64b2bd910 | * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. | 2015-04-07 06:00:30 +02:00
Matthew Honnibal | f9e510a893 | * Whitespace | 2015-04-07 04:53:59 +02:00
Matthew Honnibal | 66c7ccf6cc | * Fix Spans.orth_ | 2015-04-07 04:53:40 +02:00
Matthew Honnibal | b8d34531c4 | * Add support for units to English.__init__, by loading and applying regular expressions | 2015-04-07 04:02:32 +02:00
Matthew Honnibal | 0ea5af88b6 | * Add multi-word expression RegexMatcher | 2015-04-07 03:45:40 +02:00
Matthew Honnibal | 2fee67cfa3 | * Add regular expressions for English multi-word expressions | 2015-04-07 03:45:18 +02:00
Matthew Honnibal | 5a075ea3fc | * Ensure NER moves are available for single-word tokens | 2015-04-05 22:30:58 +02:00
Matthew Honnibal | a60a366b2c | * Support 'punct' dep label in conll.pyx | 2015-04-05 22:30:19 +02:00
Matthew Honnibal | 021c972137 | * Print parse if verbose in scorer | 2015-04-05 22:29:30 +02:00
Matthew Honnibal | fbf19049cf | * Add ent_type_ property | 2015-03-31 02:01:29 +02:00
Matthew Honnibal | e70b87efeb | * Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data | 2015-03-30 01:37:41 +02:00
Matthew Honnibal | 557856e84c | * Allow regular expressions to specify labels for merged spans | 2015-03-27 17:40:52 +01:00
Matthew Honnibal | a3af6b7c3d | * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. | 2015-03-27 17:39:16 +01:00
Matthew Honnibal | db5a43318c | * Improve print_state debug printer | 2015-03-27 17:29:58 +01:00
Matthew Honnibal | 1705eccbbe | * Remove whitespace | 2015-03-27 15:22:39 +01:00
Matthew Honnibal | 3feb52374c | * Break apart a condition, for ease of debug printing | 2015-03-27 15:21:38 +01:00
Matthew Honnibal | b32f581acb | * Fix bug in ArcEager.get_labels | 2015-03-27 15:21:06 +01:00
Matthew Honnibal | 5f2a4ff36d | * Fix spans.lemma_ | 2015-03-26 16:45:38 +01:00
Matthew Honnibal | f4cc222ec3 | * Fix NER scoring | 2015-03-26 16:45:38 +01:00
Matthew Honnibal | 1320bd19db | * Move Span class to own file | 2015-03-26 16:45:38 +01:00
Matthew Honnibal | 6f47a667cf | * Move Span class to own file | 2015-03-26 16:45:38 +01:00
Matthew Honnibal | f02c39dfaf | * Compare to is not None, for more robustness | 2015-03-26 16:44:48 +01:00
Matthew Honnibal | 8f68b864c4 | * Move Span/Spans to separate files. Currently duplicates lots of Tokens functionality. Should probably be integrated into Tokens | 2015-03-26 16:44:48 +01:00
Matthew Honnibal | e854ba0a13 | * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 6a6085f8b9 | * Clean up GreedyParser.train function a bit | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | b3157927e6 | * Clean up unused feature templates | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 411bf377d4 | * Remove dependency on ner_util module | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 01c892f583 | * Add comment to fill_context | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 2741179aff | * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 2b2dec95d3 | * Add comment to set_parse | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | e770fade1e | * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 71648205d9 | * Add support for debug feature set. Just use unigrams for this. | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 3b70b304b2 | * Add words to gold_tuples from gold conll file | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 2e12dec76e | * Adjust scorer to account for tokenization mistakes | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 05d6065e2e | * Add assertion | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 377e9b29b1 | * Whitespace | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 670959f40c | * Fix iteration order on Tokens.rights | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 231ce2dae5 | * Assign ROOT label by default. May be papering over another bug. | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 9f4ad8fdfb | * Assign root words the ROOT label via the Break transition. Something is still wrong here... | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | f729164c01 | * Fix bug in label assignment: ensure null-label transitions receive the label 0 | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 7237c805c7 | * Load tag for specials.json token | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 567388e38d | * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 3105c7f8ba | * Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 801bf14f4f | * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 31fad99518 | * Use StringStore to encode label names, instead of label_ids | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 64db61bff1 | * Add Span class to Python API | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | b9b695fb1b | * Remove debug word list | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | f21ab2d7fb | * Fix bug in ugly ent_strings hack on English class | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 1c843934be | * Fix oracle bug in NER. Now getting 77% F on ontonotes | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 903f196b3f | * Fix verbose printing for scorer | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | e181c051d5 | * Improve features for NER | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 7ecb52c0ed | * Add scorer script | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | ae235e07b9 | * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | b3eda03c9c | * Tmp | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 220ce8bfed | * Prepare English class for NER | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | f5830dc1c1 | * Remove _transitions.pyx | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 6865c2fb4d | * Fix assignment of dep strings in tokens.pyx | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 6b6bce9e7a | * Fix label loading for transition system | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 5278c7504b | * Hacks to conll.pyx. Should clean these up. | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | f321b2b2eb | * Remove TODO comment | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | fdabd93bfb | * Ensure high loss for invalid moves, and fix label reading for arc-eager | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 10ed738df2 | * Tmp commit | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 4f83c9b3d5 | * Make costs label-sensitive | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 179b7eb0a7 | * Specify parser transition system in language | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 8c883cef58 | * Refactored transition system code now compiling. Still need to hook up label oracle, and test | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | f0159ab4b6 | * Add file to hold GoldParse class | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 8eadb984cb | * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | b063001596 | * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 01bc4d6815 | * Add set_parse method, to assign parse to tokens in a less hacky way. | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | dc986dbc0b | * Work on refactored parser, where TransitionSystem can be easily subclassed | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 1cc6329b18 | * Add base class to do transitions | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 135756ac3d | * Tmp commit of NER refactoring | 2015-03-26 16:44:42 +01:00