389 Commits

Author SHA1 Message Date
Matthew Honnibal
3e46b491b9 Update call to beam_parser for new thinc API 2016-07-31 11:43:23 +02:00
Matthew Honnibal
86862f3586 Update parser.pyx for new thinc API 2016-07-31 11:43:04 +02:00
Matthew Honnibal
ff36cd43df Fix call to updateC 2016-07-31 11:42:44 +02:00
Matthew Honnibal
25513b8389 Remove use of ExampleC from beam parser 2016-07-29 19:58:49 +02:00
Matthew Honnibal
6b912731f8 Refactor model for beam parser, to avoid conditionals on model type 2016-07-29 19:33:01 +02:00
Matthew Honnibal
eb8234181c Tmp 2016-07-27 02:56:50 +02:00
Matthew Honnibal
ac63274e15 Tmp 2016-07-27 02:56:36 +02:00
Matthew Honnibal
6a98a3142f More work on beam parser. 2016-07-26 19:13:39 +02:00
Matthew Honnibal
1ee6b468a9 * Adjust arc_eager oracle, so that recovering errors via non-monotonic actions gives negative cost. Need to test this with greedy parser. 2016-07-26 19:12:00 +02:00
Matthew Honnibal
0bf448461e Work on beam parser, with max violation 2016-07-24 14:26:52 +02:00
Matthew Honnibal
a1281835a8 Clean up commented out code from beam parser. 2016-07-24 11:02:39 +02:00
Matthew Honnibal
476977ef62 Start work on max violation update. About to clean up commented out code. 2016-07-24 11:01:54 +02:00
Matthew Honnibal
8b4abc24e3 Fix beam parsing. Starting to work with early update. 2016-07-24 10:45:50 +02:00
Matthew Honnibal
27176c3d2f Fix beam parser. Starting to work 2016-07-24 01:14:56 +02:00
Matthew Honnibal
e2a9a68b66 * Work on beam parser 2016-07-23 06:07:09 +02:00
Matthew Honnibal
de7c6c48d8 Working NN, but very messy. Relies on BLIS. 2016-07-20 16:28:02 +02:00
Matthew Honnibal
7c2f1a673b * Working neural net, but features hacky. Switching to extractor. 2016-05-26 19:06:10 +02:00
Matthew Honnibal
13fad36e49 * Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop 2016-05-20 10:11:05 +02:00
Wolfgang Seeker
7b78239436 add fix for German noun chunk iterator (issue #365) 2016-05-06 01:41:26 +02:00
Matthew Honnibal
bb94022975 * Fix Issue #365: Error introduced during noun phrase chunking, due to use of corrected PRON/PROPN/etc tags. 2016-05-06 00:21:05 +02:00
Wolfgang Seeker
dbf8f5f3ec fix bug in StateC.set_break() 2016-05-05 15:15:34 +02:00
Wolfgang Seeker
3c44b5dc1a call deprojectivization after parsing 2016-05-05 15:10:36 +02:00
Matthew Honnibal
472f576b82 * Deprojectivize German parses 2016-05-05 15:01:10 +02:00
Wolfgang Seeker
e4ea2bea01 fix whitespace 2016-05-04 07:40:38 +02:00
Wolfgang Seeker
5bf2fd1f78 make the code less cryptic 2016-05-03 17:19:05 +02:00
Wolfgang Seeker
a06fca9fdf German noun chunk iterator now doesn't return tokens more than once 2016-05-03 16:58:59 +02:00
Wolfgang Seeker
7b246c13cb reformulate noun chunk tests for English 2016-05-03 14:24:35 +02:00
Matthew Honnibal
1f1532142f * Fix cost calculation on non-monotonic oracle 2016-05-03 00:21:08 +02:00
Matthew Honnibal
508fd1f6dc * Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples. 2016-05-02 14:25:10 +02:00
Matthew Honnibal
77609588b6 * Fix assignment of root label to words left as root implicitly, after parsing ends. 2016-04-25 19:41:59 +00:00
Matthew Honnibal
7c2d2deaa7 * Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322 2016-04-25 19:41:59 +00:00
Wolfgang Seeker
12024b0b0a bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Wolfgang Seeker
b98cc3266d bugfix: iterators now reset properly when called a second time 2016-04-15 17:49:16 +02:00
Wolfgang Seeker
289b10f441 remove some comments 2016-04-14 15:37:51 +02:00
Wolfgang Seeker
d99a9cbce9 different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Wolfgang Seeker
d328e0b4a8 Merge branch 'master' into space_head_bug 2016-04-11 12:11:01 +02:00
Wolfgang Seeker
80bea62842 bugfix in unit test 2016-04-08 16:46:44 +02:00
Wolfgang Seeker
1fe911cdb0 bigfix 2016-04-07 18:19:51 +02:00
Matthew Honnibal
872695759d Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Wolfgang Seeker
7195b6742d add restrictions to L-arc and R-arc to prevent space heads 2016-03-28 10:40:52 +02:00
Wolfgang Seeker
5e2e8e951a add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks

the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Wolfgang Seeker
46e3f979f1 add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Wolfgang Seeker
7adbd7a785 replace Counter with normal dict 2016-03-03 21:36:27 +01:00
Wolfgang Seeker
1ae487a4f6 add backwards compatibility with python 2.6 2016-03-03 21:18:12 +01:00
Wolfgang Seeker
72b8df0684 turned PseudoProjectivity into a normal python class 2016-03-03 19:05:08 +01:00
Wolfgang Seeker
690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker
3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Wolfgang Seeker
56b7210e82 moved nonproj.py to syntax/nonproj.pyx 2016-02-25 15:08:49 +01:00
Matthew Honnibal
1b83cb9dfa * Fix Issue #251: Incorrect right edge calculation on left-clobber low in the tree 2016-02-07 00:00:42 +01:00
Matthew Honnibal
4412a70dc5 * Initialize StateC._empty_token to 0, to avoid undefined behaviour. 2016-02-06 13:34:38 +01:00
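Several commits (0bf448461e, 476977ef62, 8b4abc24e3) mention early update and max-violation update for the beam parser. Both are standard structured-perceptron strategies for training with inexact beam search. Each picks a prefix length at which the gold and predicted action sequences are compared. The sketch below shows only how that step could be chosen from per-step cumulative scores. It assumes those scores are already computed, the function names are hypothetical, and it is not spaCy's implementation.

```python
from typing import List, Optional


def early_update_step(gold_in_beam: List[bool]) -> Optional[int]:
    """Early update: the first step at which the gold prefix falls out of the beam.

    gold_in_beam[t] is assumed to say whether the gold prefix of length t+1
    survived in the beam. Returns None if it never fell out.
    """
    for t, inside in enumerate(gold_in_beam):
        if not inside:
            return t
    return None


def max_violation_step(gold_scores: List[float],
                       best_scores: List[float]) -> Optional[int]:
    """Max-violation update: the step where the best beam prefix outscores the
    gold prefix by the largest margin. Returns None if it never outscores it.

    Both lists are assumed to hold cumulative model scores per step.
    """
    best_t: Optional[int] = None
    best_margin = 0.0
    for t, (gold, best) in enumerate(zip(gold_scores, best_scores)):
        margin = best - gold
        if margin > best_margin:
            best_t, best_margin = t, margin
    return best_t
```

Early update discards everything after the first error. Max violation can use a longer prefix, provided the gold prefix is still outscored there.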
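Commit 508fd1f6dc describes noun chunk iterators as plain functions that accept an object and yield `(start, end, label)` integer triples. Commit a06fca9fdf says the German iterator no longer returns tokens more than once. The following is a minimal sketch of that function shape under stated assumptions. It does not use spaCy's `Doc`. The `Tok` tuple, the `NP_LABEL` id, and the POS and dependency sets are hypothetical stand-ins.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a parsed token (not spaCy's Token)."""
    pos: str
    dep: str
    head: int  # index of the syntactic head; -1 marks the root


NP_LABEL = 1  # hypothetical integer id for the "NP" label
NOUN_POS = {"NOUN", "PROPN", "PRON"}  # illustrative choice
NP_DEPS = {"nsubj", "dobj", "pobj", "ROOT"}  # illustrative choice


def left_edge(i: int, children: List[List[int]]) -> int:
    """Index of the leftmost token in the subtree rooted at i."""
    edge = i
    for c in children[i]:
        edge = min(edge, left_edge(c, children))
    return edge


def noun_chunks(doc: List[Tok]) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, label) triples, end exclusive, never repeating a token."""
    children: List[List[int]] = [[] for _ in doc]
    for i, tok in enumerate(doc):
        if tok.head >= 0:
            children[tok.head].append(i)
    prev_end = 0  # end of the previously yielded chunk
    for i, tok in enumerate(doc):
        if tok.pos in NOUN_POS and tok.dep in NP_DEPS:
            # Clip the start so no token is emitted in two chunks.
            start = max(left_edge(i, children), prev_end)
            yield (start, i + 1, NP_LABEL)
            prev_end = i + 1
```

For "the cat sat on the mat" (with "sat" as root), this yields `(0, 2, 1)` for "the cat" and `(4, 6, 1)` for "the mat".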
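Commit d99a9cbce9 states concrete rules for attaching whitespace tokens. The following is an illustrative Python function over a plain list of booleans that applies those three rules. The function name and the input representation are hypothetical, and this is not the parser's actual code.

```python
from typing import Dict, List


def space_token_heads(is_space: List[bool]) -> Dict[int, int]:
    """Map each whitespace token's index to the index of its head.

    Rules as stated in commit d99a9cbce9:
    - a space token attaches to the previous non-space token;
    - leading space tokens attach to the first following non-space token;
    - if every token is a space token, the last one heads all the others.
    """
    non_space = [i for i, s in enumerate(is_space) if not s]
    if not non_space:
        last = len(is_space) - 1
        return {i: last for i in range(last)}
    heads: Dict[int, int] = {}
    prev = None  # most recent non-space token seen so far
    for i, s in enumerate(is_space):
        if not s:
            prev = i
        else:
            heads[i] = non_space[0] if prev is None else prev
    return heads
```

For example, `space_token_heads([True, False, True, True, False, True])` returns `{0: 1, 2: 1, 3: 1, 5: 4}`.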
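Commit 3448cb40a4 integrates Nivre & Nilsson 2005's pseudo-projective parsing using the HEAD decoration scheme. Below is a simplified sketch of the projectivization step. It repeatedly lifts the shortest non-projective arc to the head's own head. On the first lift, it appends the original head's label to the dependent's label. This is not the code in `nonproj.pyx`. The `-1` root convention, the `"||"` delimiter and the function names are assumptions for illustration. The inverse step, deprojectivization, is omitted. That step searches below the current head for a token whose label matches the decoration.

```python
from typing import List, Tuple

DELIMITER = "||"  # assumed separator between a label and its decoration


def dominates(h: int, k: int, heads: List[int]) -> bool:
    """True if h is k itself or an ancestor of k (the root has head -1)."""
    while k != -1:
        if k == h:
            return True
        k = heads[k]
    return False


def is_nonproj_arc(d: int, heads: List[int]) -> bool:
    """Arc heads[d] -> d is non-projective if some token strictly between
    the head and d is not dominated by the head."""
    h = heads[d]
    if h == -1:
        return False
    lo, hi = sorted((h, d))
    return any(not dominates(h, k, heads) for k in range(lo + 1, hi))


def projectivize(heads: List[int],
                 labels: List[str]) -> Tuple[List[int], List[str]]:
    """Lift non-projective arcs until the tree is projective (HEAD-style labels)."""
    heads, labels = list(heads), list(labels)
    while True:
        nonproj = [d for d in range(len(heads)) if is_nonproj_arc(d, heads)]
        if not nonproj:
            return heads, labels
        # Lift the shortest non-projective arc first.
        d = min(nonproj, key=lambda t: abs(heads[t] - t))
        h = heads[d]
        if DELIMITER not in labels[d]:  # decorate only on the first lift
            labels[d] += DELIMITER + labels[h].split(DELIMITER)[0]
        heads[d] = heads[h]  # reattach d to its grandparent
```

With heads `[2, -1, 1, 1]`, the arc 2 -> 0 crosses the root at position 1. The sketch lifts token 0 to token 1 and relabels it `"a||c"`.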