Commit Graph

97 Commits

Author SHA1 Message Date
Matthew Honnibal
29b77fd0eb Add tests for gold alignment and parser state 2018-04-01 17:26:37 +02:00
Matthew Honnibal
19ac03ce09 Go back to letting Break work with deeper stacks
It seems very appealing to restrict Break so that it only works when
there's one word on the stack. Then we can pop that word, mark it as the
root, and continue.

However, results are suggesting it's nice to be able to predict Break
when the last word of the previous sentence is on the stack, and the
first word of the next sentence is at the buffer. This does make sense!
Consider that the last word is often a period or something --- a pretty
huge clue. We otherwise have to go out of our way to get that feature
in.

The really decisive thing is we have to handle upcoming sentence breaks
anyway, because we need to conform to preset SBD constraints. So, we may
as well let the parser predict the Break when it's at a stack/queue
position that is most revealing.
2018-04-01 14:32:15 +02:00
Matthew Honnibal
d8dec1134c Simplify Break transition to require stack depth 1. Hopefully as accurate 2018-04-01 12:53:25 +02:00
Matthew Honnibal
e887b2330e Rewrite oracle to not use fast-forward. Seems to work? 2018-04-01 10:43:11 +02:00
Matthew Honnibal
c5574f48c7 Add better arc-eager oracle tests 2018-04-01 10:41:52 +02:00
Matthew Honnibal
e5ad35787c WIP on adding split-token actions to parser
This patch starts getting the StateC object ready to split tokens. The
split function is implemented by pushing indices into the buffer that
indicate an out-of-length token.

Still todo:

* Update the oracles
* Update GoldParseC
* Interpret the parse once it's complete
* Add retokenizer.split() method
2018-03-31 20:05:27 +02:00
Matthew Honnibal
e0375132bd Add state tests, esp. for split function 2018-03-30 13:25:46 +02:00
Matthew Honnibal
1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal
f5b1ad4100 Limit parser model size, to hopefully reduce memory during CI tests 2018-01-28 21:00:32 +01:00
Matthew Honnibal
00435d8f0c Add extra beam parsing test 2017-11-05 14:39:57 +01:00
Matthew Honnibal
711278b667 Make test less flakey 2017-11-03 14:36:08 +01:00
Matthew Honnibal
64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Matthew Honnibal
b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal
4174477161 Fix equality check in test 2017-10-16 19:50:35 +02:00
Matthew Honnibal
462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Matthew Honnibal
fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal
d84136b4a9 Update add label test 2017-10-10 22:57:41 +02:00
Matthew Honnibal
09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal
d8a2506023 Merge pull request #1401 from explosion/feature/add-parser-action
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f Merge pull request #1400 from explosion/feature/sentence-parsing
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
fad2b8315f Merge branch 'develop' into feature/add-parser-action 2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d Fix tests for history features 2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d Add tests for adding parser actions 2017-10-09 03:42:35 +02:00
Matthew Honnibal
81a64119db Fix string-to-unicode problem 2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119 Fix test 2017-10-09 00:29:37 +02:00
Matthew Honnibal
5a67efeccc Add tests for sentence segmentation presetting 2017-10-09 00:02:23 +02:00
ines
0adadcb3f0 Fix beam parse model test 2017-10-07 02:15:15 +02:00
Matthew Honnibal
20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal
c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
Matthew Honnibal
2da96a0ec7 Fix beam test 2017-08-19 04:15:46 +02:00
Matthew Honnibal
de7e8703e3 Restore tests for beam parser 2017-08-18 22:27:42 +02:00
Matthew Honnibal
52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
92ebab6073 Update beam-update tests 2017-08-13 08:56:02 +02:00
Matthew Honnibal
24b45b45c6 Add test for beam update 2017-08-12 17:15:28 -05:00
Matthew Honnibal
b353e4d843 Work on parser beam training 2017-08-12 14:47:45 -05:00
Matthew Honnibal
d6a5c2c85a Add test for NER 2017-07-22 01:48:58 +02:00
Matthew Honnibal
28244df4da Add test for beam parsing 2017-07-22 01:48:35 +02:00
Matthew Honnibal
2424493970 Remove unnecessary import of Mock 2017-07-22 01:13:54 +02:00
Matthew Honnibal
289f23df51 Test beam parsing 2017-07-20 15:03:10 +02:00
Matthew Honnibal
f014138c11 Fix parser tests 2017-07-20 00:16:52 +02:00
ines
20a7003c0d Update model fixtures and reorganise tests 2017-05-29 22:14:31 +02:00
Matthew Honnibal
ff26aa6c37 Work on to/from bytes/disk serialization methods 2017-05-29 11:45:45 +02:00
ines
fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
ines
b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal
2f78413a02 PseudoProjectivity->nonproj 2017-05-22 05:39:03 -05:00
Matthew Honnibal
836fe1d880 Update neural net tests 2017-05-19 18:11:29 -05:00
Matthew Honnibal
c9a5d5d24b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-16 16:22:05 +02:00
Matthew Honnibal
8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal
221b4c1ee8 Fix test for Python 3 2017-05-16 13:06:30 +02:00
Matthew Honnibal
a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00