Matthew Honnibal | d1b55310a1 | * Refactor _advance_beam function | 2015-06-02 18:38:41 +02:00
Matthew Honnibal | 0786d9b3c7 | * Refactor TransitionSystem, adding set_valid method | 2015-06-02 18:38:07 +02:00
Matthew Honnibal | a3964957f6 | * Add profiling for _state.pyx | 2015-06-02 18:36:27 +02:00
Matthew Honnibal | e822df0867 | * Fix bugs in new greedy/beam parser | 2015-06-02 02:01:33 +02:00
Matthew Honnibal | 66dfa95847 | * Revise greedy_parse/beam_parse ownership goof | 2015-06-02 01:34:19 +02:00
Matthew Honnibal | 75658b2ed3 | * Remove use of new beam.loss property, to maintain compatibility with older versions of thinc for now. | 2015-06-02 00:57:09 +02:00
Matthew Honnibal | 7c29362d60 | * Rename parser class in parser.pxd, now that beam parsing is supported | 2015-06-02 00:53:49 +02:00
Matthew Honnibal | 58d5ac0944 | * Add beam search capabilities to Parser. Rename GreedyParser to Parser. | 2015-06-02 00:28:02 +02:00
Matthew Honnibal | e09a08bd00 | * Add copy_state function | 2015-06-01 23:06:30 +02:00
Matthew Honnibal | c7876aa8b6 | * Add get_valid method | 2015-06-01 23:06:00 +02:00
Matthew Honnibal | 5e99ff94c8 | * Edits to arc eager oracle. Couldn't figure out how the non-monotonic lines made sense. They seem covered by children_in_stack | 2015-05-31 15:14:37 +02:00
Matthew Honnibal | 6c5632b71c | * Roll back proposed change to Break transition while investigating the effect | 2015-05-31 06:49:52 +02:00
Matthew Honnibal | e77940565d | * Add length cap to distance feature | 2015-05-31 05:25:30 +02:00
Matthew Honnibal | fd596351ba | * Fix valency features | 2015-05-31 05:24:33 +02:00
Matthew Honnibal | 76300bbb1b | * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. | 2015-05-30 01:25:46 +02:00
Matthew Honnibal | 8f31d3b864 | * Relax constraint on Break transition for non-monotonic parsing. | 2015-05-28 23:39:52 +02:00
Matthew Honnibal | 4010b9b6d9 | * Pass parameter for regularization in parser.pyx | 2015-05-27 03:18:50 +02:00
Matthew Honnibal | fc75210941 | * Move spacy.syntax.conll to spacy.gold | 2015-05-24 21:35:02 +02:00
Matthew Honnibal | efe7a7d7d6 | * Clean unused functions from spacy.syntax.conll | 2015-05-24 20:06:46 +02:00
Matthew Honnibal | 78487f3e66 | * Update parser oracle for missing heads | 2015-05-24 20:05:58 +02:00
Matthew Honnibal | acd1245ad4 | * Remove cruft from conll.pyx --- unused stuff about evaluation, which now lives in spacy.scorer | 2015-05-24 17:35:49 +02:00
Matthew Honnibal | 20f1d868a3 | * Tmp commit. Working on whole document parsing | 2015-05-24 02:49:56 +02:00
Matthew Honnibal | f2ee9c4feb | * Comment out constituency parsing stuff, so that code compiles | 2015-05-20 16:55:05 +02:00
Matthew Honnibal | 9dfc9c039c | * Work on constituency parsing. | 2015-05-20 16:02:51 +02:00
Matthew Honnibal | ba07b925a7 | * Fix compile error in conll.pyx | 2015-05-12 22:33:47 +02:00
Matthew Honnibal | f1e0272b18 | * Disable c-parsing transitions | 2015-05-12 22:33:25 +02:00
Matthew Honnibal | 03a6626545 | * Tmp commit | 2015-05-12 20:27:56 +02:00
Matthew Honnibal | 9568ebed08 | * Fix off-by-one in head reading | 2015-05-12 20:27:56 +02:00
Matthew Honnibal | d2ac8d8007 | * Add ctnt field to State, in preparation for constituency parsing | 2015-05-12 20:27:56 +02:00
Matthew Honnibal | ab67693393 | * Add read_json_file to conll.pyx | 2015-05-12 20:27:55 +02:00
Matthew Honnibal | aff9359a8d | * Update ner.pyx to expect brackets from gold_tuples | 2015-05-12 20:27:55 +02:00
Matthew Honnibal | 53cf77e1c8 | * Bug fix: when non-monotonically correcting a dependency, make sure to delete the old one from the child list | 2015-05-12 20:26:41 +02:00
Matthew Honnibal | a4e2af54f9 | * Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible | 2015-05-12 20:26:41 +02:00
Matthew Honnibal | fb8d50b3d5 | Merge branch 'master' of ssh://github.com/honnibal/spaCy | 2015-04-30 12:45:15 +02:00
Matthew Honnibal | ed8e8c3bd0 | * Whitespace | 2015-04-29 14:22:47 +02:00
Matthew Honnibal | 763ef01575 | * Fix two bugs in feature calculation | 2015-04-28 23:25:09 +02:00
Matthew Honnibal | b3fd48c97b | * Fix missing root labels bug identified in Issue #57 | 2015-04-28 20:45:51 +02:00
Jordan Suchow | 3a8d9b37a6 | Remove trailing whitespace | 2015-04-19 13:01:38 -07:00
Matthew Honnibal | 99dbf8a38c | * Fix error type in lookup_transition | 2015-04-16 01:36:22 +02:00
Matthew Honnibal | 9f16848b60 | * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' | 2015-04-15 06:01:18 +02:00
Matthew Honnibal | 507048dc45 | * Rename StandardError to Exception, for Python 3 compatibility | 2015-04-12 07:28:34 +02:00
Matthew Honnibal | 1d05e6da00 | * Add ne_iob and ne_type features to NER | 2015-04-10 19:07:08 +02:00
Matthew Honnibal | 4df8a3d90f | * Add ne_iob and ne_type attributes to context vector | 2015-04-10 05:02:15 +02:00
Matthew Honnibal | 8c354c432b | * Add ValueError condition to ner_tag reading | 2015-04-10 04:59:59 +02:00
Matthew Honnibal | 435cccf098 | * Add read_conll03_file function to conll.pyx | 2015-04-10 04:59:11 +02:00
Matthew Honnibal | 99c9ecfc18 | * Fix bug in prefix, suffix and word shape features in parser and NER | 2015-04-10 03:53:33 +02:00
Matthew Honnibal | 5a075ea3fc | * Ensure NER moves are available for single-word tokens | 2015-04-05 22:30:58 +02:00
Matthew Honnibal | a60a366b2c | * Support 'punct' dep label in conll.pyx | 2015-04-05 22:30:19 +02:00
Matthew Honnibal | a3af6b7c3d | * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. | 2015-03-27 17:39:16 +01:00
Matthew Honnibal | db5a43318c | * Improve print_state debug printer | 2015-03-27 17:29:58 +01:00
Matthew Honnibal | 1705eccbbe | * Remove whitespace | 2015-03-27 15:22:39 +01:00
Matthew Honnibal | 3feb52374c | * Break apart a condition, for ease of debug printing | 2015-03-27 15:21:38 +01:00
Matthew Honnibal | b32f581acb | * Fix bug in ArcEager.get_labels | 2015-03-27 15:21:06 +01:00
Matthew Honnibal | 1320bd19db | * Move Span class to own file | 2015-03-26 16:45:38 +01:00
Matthew Honnibal | e854ba0a13 | * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 6a6085f8b9 | * Clean up GreedyParser.train function a bit | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | b3157927e6 | * Clean up unused feature templates | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 411bf377d4 | * Remove dependency on ner_util module | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 01c892f583 | * Add comment to fill_context | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 2741179aff | * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 71648205d9 | * Add support for debug feature set. Just use unigrams for this. | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 3b70b304b2 | * Add words to gold_tuples from gold conll file | 2015-03-26 16:44:47 +01:00
Matthew Honnibal | 05d6065e2e | * Add assertion | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 377e9b29b1 | * Whitespace | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 9f4ad8fdfb | * Assign root words the ROOT label via the Break transition. Something is still wrong here... | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | f729164c01 | * Fix bug in label assignment: ensure null-label transitions receive the label 0 | 2015-03-26 16:44:46 +01:00
Matthew Honnibal | 31fad99518 | * Use StringStore to encode label names, instead of label_ids | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | b9b695fb1b | * Remove debug word list | 2015-03-26 16:44:45 +01:00
Matthew Honnibal | 1c843934be | * Fix oracle bug in NER. Now getting 77% F on ontonotes | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | e181c051d5 | * Improve features for NER | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 8057a95f20 | * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | ae235e07b9 | * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | b3eda03c9c | * Tmp | 2015-03-26 16:44:44 +01:00
Matthew Honnibal | 6b6bce9e7a | * Fix label loading for transition system | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 5278c7504b | * Hacks to conll.pyx. Should clean these up. | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | f321b2b2eb | * Remove TODO comment | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | fdabd93bfb | * Ensure high loss for invalid moves, and fix label reading for arc-eager | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 10ed738df2 | * Tmp commit | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 4f83c9b3d5 | * Make costs label-sensitive | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | 8c883cef58 | * Refactored transition system code now compiling. Still need to hook up label oracle, and test | 2015-03-26 16:44:43 +01:00
Matthew Honnibal | f0159ab4b6 | * Add file to hold GoldParse class | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 8eadb984cb | * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | b063001596 | * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | dc986dbc0b | * Work on refactored parser, where TransitionSystem can be easily subclassed | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 135756ac3d | * Tmp commit of NER refactoring | 2015-03-26 16:44:42 +01:00
Matthew Honnibal | 0ff078876a | * Commit some work on ner.pyx done on the plane | 2015-03-26 16:44:41 +01:00
Matthew Honnibal | d81b7be6a2 | * Merge train.py | 2015-03-26 16:44:41 +01:00
Matthew Honnibal | 3d0570685c | * Add NER transition system | 2015-03-26 16:44:41 +01:00
Matthew Honnibal | ea90d136e8 | * Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy. | 2015-02-27 03:56:10 -05:00
Matthew Honnibal | 312b3a45f3 | * Fix issue #19: Allow parsing/pos tagging of empty strings | 2015-02-10 10:15:58 -05:00
Matthew Honnibal | 5c3513583d | * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. | 2015-02-09 03:57:10 -05:00
Matthew Honnibal | c55a33d045 | * Catch oracle errors | 2015-02-02 23:02:04 +11:00
Matthew Honnibal | d68678a93e | * Add Exception class, OracleError | 2015-02-02 11:57:32 +11:00
Matthew Honnibal | 88170e6295 | * Supply dep_strings as a tuple, for the changed API on Tokens | 2015-01-31 13:42:09 +11:00
Matthew Honnibal | 0981d68022 | * Set a sent_end flag during parsing, for later use | 2015-01-31 13:41:46 +11:00
Matthew Honnibal | 0f95712189 | * Improve accuracy reporting during training | 2015-01-30 18:05:06 +11:00
Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00
Matthew Honnibal | ebf7d2fab1 | * Use non-joint sbd, for more simplicity and fewer classes | 2015-01-29 06:22:03 +11:00
Matthew Honnibal | d05c5bf141 | * Remove comment | 2015-01-29 05:19:27 +11:00
Matthew Honnibal | 320b045daa | * Oracle now consistent over gold standard derivation | 2015-01-29 03:41:58 +11:00