Matthew Honnibal
0fb188c76c
Minibatch beam candidates, for faster decoding
2016-08-08 01:38:50 +02:00
Matthew Honnibal
eb145dc1b8
Relax minimum gradient by factor 10. Important for learning
2016-08-08 01:38:06 +02:00
Matthew Honnibal
d3b0447898
Fix minimum gradient and beam density
2016-08-06 16:23:42 +02:00
Matthew Honnibal
2db43e9662
Pass parameter for gradient noise
2016-08-05 18:25:38 +02:00
Matthew Honnibal
d1511e816a
Shuffle histories in beam parser
2016-08-05 18:25:16 +02:00
Matthew Honnibal
de82552a13
Add config for beam density
2016-08-05 18:24:54 +02:00
Matthew Honnibal
a664aa8180
Fix beam_parser for new API
2016-07-31 19:03:10 +02:00
Matthew Honnibal
2f09b041d1
Reset is_valid and costs during beam training
2016-07-31 19:02:45 +02:00
Matthew Honnibal
3e46b491b9
Update call to beam_parser for new thinc API
2016-07-31 11:43:23 +02:00
Matthew Honnibal
86862f3586
Update parser.pyx for new thinc API
2016-07-31 11:43:04 +02:00
Matthew Honnibal
ff36cd43df
Fix call to updateC
2016-07-31 11:42:44 +02:00
Matthew Honnibal
25513b8389
Remove use of ExampleC from beam parser
2016-07-29 19:58:49 +02:00
Matthew Honnibal
6b912731f8
Refactor model for beam parser, to avoid conditionals on model type
2016-07-29 19:33:01 +02:00
Matthew Honnibal
eb8234181c
Tmp
2016-07-27 02:56:50 +02:00
Matthew Honnibal
ac63274e15
Tmp
2016-07-27 02:56:36 +02:00
Matthew Honnibal
6a98a3142f
More work on beam parser.
2016-07-26 19:13:39 +02:00
Matthew Honnibal
1ee6b468a9
* Adjust arc_eager oracle, so that recovering errors via non-monotonic actions gives negative cost. Need to test this with greedy parser.
2016-07-26 19:12:00 +02:00
Matthew Honnibal
0bf448461e
Work on beam parser, with max violation
2016-07-24 14:26:52 +02:00
Matthew Honnibal
a1281835a8
Clean up commented out code from beam parser.
2016-07-24 11:02:39 +02:00
Matthew Honnibal
476977ef62
Start work on max violation update. About to clean up commented out code.
2016-07-24 11:01:54 +02:00
Matthew Honnibal
8b4abc24e3
Fix beam parsing. Starting to work with early update.
2016-07-24 10:45:50 +02:00
Matthew Honnibal
27176c3d2f
Fix beam parser. Starting to work
2016-07-24 01:14:56 +02:00
Matthew Honnibal
e2a9a68b66
* Work on beam parser
2016-07-23 06:07:09 +02:00
Matthew Honnibal
de7c6c48d8
Working NN, but very messy. Relies on BLIS.
2016-07-20 16:28:02 +02:00
Matthew Honnibal
7c2f1a673b
* Working neural net, but features hacky. Switching to extractor.
2016-05-26 19:06:10 +02:00
Matthew Honnibal
13fad36e49
* Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop
2016-05-20 10:11:05 +02:00
Wolfgang Seeker
7b78239436
add fix for German noun chunk iterator (issue #365)
2016-05-06 01:41:26 +02:00
Matthew Honnibal
bb94022975
* Fix Issue #365: Error introduced during noun phrase chunking, due to use of corrected PRON/PROPN/etc tags.
2016-05-06 00:21:05 +02:00
Wolfgang Seeker
dbf8f5f3ec
fix bug in StateC.set_break()
2016-05-05 15:15:34 +02:00
Wolfgang Seeker
3c44b5dc1a
call deprojectivization after parsing
2016-05-05 15:10:36 +02:00
Matthew Honnibal
472f576b82
* Deprojectivize German parses
2016-05-05 15:01:10 +02:00
Wolfgang Seeker
e4ea2bea01
fix whitespace
2016-05-04 07:40:38 +02:00
Wolfgang Seeker
5bf2fd1f78
make the code less cryptic
2016-05-03 17:19:05 +02:00
Wolfgang Seeker
a06fca9fdf
German noun chunk iterator now doesn't return tokens more than once
2016-05-03 16:58:59 +02:00
Wolfgang Seeker
7b246c13cb
reformulate noun chunk tests for English
2016-05-03 14:24:35 +02:00
Matthew Honnibal
1f1532142f
* Fix cost calculation on non-monotonic oracle
2016-05-03 00:21:08 +02:00
Matthew Honnibal
508fd1f6dc
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
2016-05-02 14:25:10 +02:00
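Note on the commit above: the design it describes (a plain function installed on the Doc at creation, overridable by the user, yielding (start, end, label) triples) can be illustrated with a toy sketch. Everything here is hypothetical and not the project's code: `ToyDoc`, `toy_noun_chunks`, the `NP_LABEL` id, the "runs of NOUN tags" chunking rule, and the exclusive `end` index are all assumptions made for illustration only.

```python
from typing import Callable, Iterator, List, Tuple

Triple = Tuple[int, int, int]  # (start, end, label); end assumed exclusive
NP_LABEL = 0  # hypothetical integer label id


def toy_noun_chunks(doc: "ToyDoc") -> Iterator[Triple]:
    """Toy rule: yield each maximal run of NOUN tags as one chunk."""
    start = None
    for i, tag in enumerate(doc.tags):
        if tag == "NOUN":
            if start is None:
                start = i
        elif start is not None:
            yield start, i, NP_LABEL
            start = None
    if start is not None:
        yield start, len(doc.tags), NP_LABEL


class ToyDoc:
    """Stand-in for a document object; holds only coarse tags."""

    def __init__(self, tags: List[str]):
        self.tags = tags
        # Installed at creation time; users may overwrite it afterwards.
        self.noun_chunk_iterator: Callable[["ToyDoc"], Iterator[Triple]] = toy_noun_chunks

    @property
    def noun_chunks(self) -> List[Triple]:
        return list(self.noun_chunk_iterator(self))


doc = ToyDoc(["DET", "NOUN", "NOUN", "VERB", "NOUN"])
default_chunks = doc.noun_chunks

# Swap in a user-supplied iterator function.
doc.noun_chunk_iterator = lambda d: iter([(0, 1, 7)])
custom_chunks = doc.noun_chunks
```

Because the iterator is an ordinary function stored in an attribute, replacing it needs no subclassing, which appears to be the point of the refactor.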
Matthew Honnibal
77609588b6
* Fix assignment of root label to words left as root implicitly, after parsing ends.
2016-04-25 19:41:59 +00:00
Matthew Honnibal
7c2d2deaa7
* Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322
2016-04-25 19:41:59 +00:00
Wolfgang Seeker
12024b0b0a
bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Wolfgang Seeker
b98cc3266d
bugfix: iterators now reset properly when called a second time
2016-04-15 17:49:16 +02:00
Wolfgang Seeker
289b10f441
remove some comments
2016-04-14 15:37:51 +02:00
Wolfgang Seeker
d99a9cbce9
different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
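Note on the commit above: the attachment rule it states can be written out as a small standalone function. This is a minimal sketch, not the parser's implementation. The function name `space_token_heads`, the boolean-list input, and the convention that the last token points at itself in all-space input are assumptions made for illustration.

```python
from typing import List, Optional


def space_token_heads(is_space: List[bool]) -> List[Optional[int]]:
    """Return a head index for each space token, None for non-space tokens."""
    n = len(is_space)
    non_space = [i for i, flag in enumerate(is_space) if not flag]
    if not non_space:
        # Exception 2: input is all spaces, so the last space token heads
        # all others. Assumption: the last token points at itself (root).
        return [n - 1] * n

    heads: List[Optional[int]] = [None] * n
    first = non_space[0]
    prev = None  # index of the most recent non-space token seen so far
    for i, flag in enumerate(is_space):
        if not flag:
            prev = i
        elif prev is None:
            # Exception 1: leading spaces attach to the first following
            # non-space token.
            heads[i] = first
        else:
            # General rule: attach to the previous non-space token.
            heads[i] = prev
    return heads


mixed = space_token_heads([True, False, True, True, False])
all_spaces = space_token_heads([True, True, True])
```

In the mixed example the leading space attaches forward to token 1, and the two spaces after token 1 attach back to it.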
Wolfgang Seeker
d328e0b4a8
Merge branch 'master' into space_head_bug
2016-04-11 12:11:01 +02:00
Wolfgang Seeker
80bea62842
bugfix in unit test
2016-04-08 16:46:44 +02:00
Wolfgang Seeker
1fe911cdb0
bugfix
2016-04-07 18:19:51 +02:00
Matthew Honnibal
872695759d
Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Wolfgang Seeker
7195b6742d
add restrictions to L-arc and R-arc to prevent space heads
2016-03-28 10:40:52 +02:00
Wolfgang Seeker
5e2e8e951a
add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks
the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Wolfgang Seeker
46e3f979f1
add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00