It seems very appealing to restrict Break so that it only works when
there's one word on the stack. Then we can pop that word, mark it as the
root, and continue.
However, results suggest it's better to be able to predict Break when the
last word of the previous sentence is on the stack and the first word of
the next sentence is at the front of the buffer. This does make sense!
The last word is often a period or similar --- a strong clue that we'd
otherwise have to go out of our way to get in as a feature.
The really decisive thing is that we have to handle upcoming sentence
breaks anyway, because we need to conform to preset SBD constraints. So we
may as well let the parser predict Break when it's at the stack/buffer
position that is most revealing.
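To make that configuration concrete, here's a tiny sketch (illustrative
only, not spaCy's actual validity check) of when Break would be allowed;
sent_starts stands in for the preset SBD constraints:

    def break_is_valid(stack, buffer, sent_starts):
        # Break is most revealing when the previous sentence's last word is
        # on the stack and the next sentence's first word heads the buffer.
        if not stack or not buffer:
            return False
        # The front-of-buffer word must begin a new sentence according to
        # the preset sentence-boundary (SBD) constraints.
        return buffer[0] in sent_starts

    # "It rains . It pours", with a preset boundary before token 3:
    print(break_is_valid(stack=[2], buffer=[3, 4], sent_starts={0, 3}))  # True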
This patch starts getting the StateC object ready to split tokens. The
split function works by pushing indices onto the buffer that refer to
out-of-length tokens, i.e. positions past the original document length.
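Roughly, the idea looks like this (a minimal sketch with made-up names,
not the actual StateC code): the new subtokens get indices past the
original document length, and those indices are pushed onto the buffer.

    class ParseState:
        # Toy stand-in for StateC: `buffer` holds token indices still to
        # be processed, and `length` counts the tokens seen so far.
        def __init__(self, n_tokens):
            self.length = n_tokens
            self.buffer = list(range(n_tokens))

        def split(self, i, n_pieces):
            # Represent a split by creating out-of-length indices for the
            # extra pieces and queuing them right after token i.
            extra = [self.length + k for k in range(n_pieces - 1)]
            self.length += len(extra)
            pos = self.buffer.index(i) + 1
            self.buffer[pos:pos] = extra
            return extra

    state = ParseState(3)
    state.split(1, 2)        # split token 1 into two pieces
    print(state.buffer)      # [0, 1, 3, 2] -- index 3 is past the original length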
Still to do:
* Update the oracles
* Update GoldParseC
* Interpret the parse once it's complete
* Add retokenizer.split() method
If a Doc object had been previously parsed, it was possible for
invalid parses to be added. There were two problems:
1) The parse was only being partially erased.
2) The RightArc action was able to create a 1-cycle.
This patch fixes both errors, and avoids resetting the parse if one is
present. In theory this might allow a better parse to be predicted by
running the parser twice.
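For illustration, here's what the two problems look like on a toy heads
array, where heads[i] is the index of token i's head and the root points
at itself (a sketch, not the actual patch):

    def fully_erase_parse(heads):
        # Problem 1: the old parse must be erased completely, not partially.
        return [None] * len(heads)

    def has_one_cycle(heads, root):
        # Problem 2: a 1-cycle means a non-root token is its own head.
        return any(h == i and i != root for i, h in enumerate(heads))

    heads = [1, 1, 1, 2]                        # token 1 is the root
    print(has_one_cycle(heads, root=1))         # False
    print(has_one_cycle([1, 1, 2, 2], root=1))  # True: token 2 heads itself
    print(fully_erase_parse(heads))             # [None, None, None, None]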
Closes #1253.
Currently, when a new label is introduced to NER during training, the
labels end up being read back in an unexpected order. This invalidates
the model.
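To see why the order matters, consider a toy label-to-index mapping (a
sketch, not spaCy's NER internals): if class weights are indexed by label
position, reading the labels back in a different order misaligns every
weight row.

    def label_index(labels):
        return {label: i for i, label in enumerate(labels)}

    trained = ["PERSON", "ORG", "GPE"]           # order used during training
    reloaded = ["GPE", "ORG", "PERSON", "LOC"]   # new label shifts the order

    print(label_index(trained))    # {'PERSON': 0, 'ORG': 1, 'GPE': 2}
    print(label_index(reloaded))   # {'GPE': 0, 'ORG': 1, 'PERSON': 2, 'LOC': 3}
    # PERSON's weights, learned at index 0, would now be looked up at index 2.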
Previously the deprojectivize() call was attached to the transition
system and only invoked for German. Instead it should be a separate step,
called after the parser. This makes it available for any language.
Closes #898.
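The intended pipeline shape is roughly the following, with a hypothetical
deprojectivize(doc) helper standing in for the real post-processing step:

    def deprojectivize(doc):
        # Placeholder: recover non-projective arcs from projectivized labels.
        return doc

    def parse(doc, parser):
        doc = parser(doc)          # language-independent parsing step
        doc = deprojectivize(doc)  # separate post-process, no longer German-only
        return doc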