spaCy/spacy
Matthew Honnibal 19ac03ce09 Go back to letting Break work with deeper stacks
It seems very appealing to restrict Break so that it only works when
there's one word on the stack. Then we can pop that word, mark it as the
root, and continue.

However, results suggest it helps to be able to predict Break when the
last word of the previous sentence is on top of the stack and the first
word of the next sentence is at the front of the buffer. This does make sense!
Consider that the last word is often a period or something --- a pretty
huge clue. We otherwise have to go out of our way to get that feature
in.

The really decisive thing is we have to handle upcoming sentence breaks
anyway, because we need to conform to preset SBD constraints. So, we may
as well let the parser predict the Break when it's at the stack/buffer
position that is most revealing.
2018-04-01 14:32:15 +02:00
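
The trade-off described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in spacy/syntax: ToyState, break_valid_strict, break_valid_deep, break_required and apply_break are all hypothetical names, and the state is reduced to lists of token indices.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyState:
    """Hypothetical parser state: token indices on a stack and in a buffer."""
    stack: List[int] = field(default_factory=list)
    buffer: List[int] = field(default_factory=list)
    sent_starts: Set[int] = field(default_factory=set)


def break_valid_strict(state: ToyState) -> bool:
    # Restricted variant: Break is only allowed with exactly one word
    # on the stack (that word could then be popped and marked as root).
    return len(state.stack) == 1 and len(state.buffer) > 0


def break_valid_deep(state: ToyState) -> bool:
    # Relaxed variant: Break is allowed with any non-empty stack, so it can
    # be predicted while the previous sentence's last word is still on top.
    return len(state.stack) >= 1 and len(state.buffer) > 0


def break_required(state: ToyState, preset_starts: Set[int]) -> bool:
    # Assumed model of preset SBD constraints: if the next buffer word was
    # already marked as a sentence start, the parser has to break before it.
    return len(state.buffer) > 0 and state.buffer[0] in preset_starts


def apply_break(state: ToyState) -> None:
    # Record the first buffer word as the start of a new sentence.
    state.sent_starts.add(state.buffer[0])


# Tokens: 0 "She", 1 "left", 2 ".", 3 "He", 4 "stayed", 5 "."
# "left" and "." are still on the stack; "He" is next in the buffer.
example = ToyState(stack=[1, 2], buffer=[3, 4, 5])
strict_ok = break_valid_strict(example)
deep_ok = break_valid_deep(example)
```

In this configuration the period is on top of the stack and the next sentence's first word is at the front of the buffer, which is the position the message calls most revealing. The strict check rejects Break here because two words remain on the stack, while the relaxed check accepts it.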
cli Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Don't use deprecated Doc.merge call in displaCy 2018-01-27 11:25:05 +01:00
lang Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
syntax Go back to letting Break work with deeper stacks 2018-04-01 14:32:15 +02:00
tests Go back to letting Break work with deeper stacks 2018-04-01 14:32:15 +02:00
tokens WIP on adding split-token actions to parser 2018-03-31 20:05:27 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Remove dummy variable from function calls 2018-01-05 09:37:05 +01:00
__main__.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_align.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_matcher2_notes.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
_ml.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
about.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
glossary.py Fix typo in glossary (resolves #1964) 2018-02-10 11:58:41 +01:00
gold.pxd Allocate fused tokens array in GoldParseC 2018-04-01 13:43:56 +02:00
gold.pyx Allocate fused tokens array in GoldParseC 2018-04-01 13:43:56 +02:00
language.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
lemmatizer.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
lexeme.pxd WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
lexeme.pyx added new lexical feat to lexeme 2018-02-11 18:51:48 +01:00
matcher.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
scorer.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
structs.pxd Make TokenC.sent_start an int, to allow ternary value 2017-10-08 19:58:54 +02:00
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
tokenizer.pyx Merge pull request #1611 from fsonntag/master 2017-11-29 23:11:23 +01:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
vectors.pyx Fix Vectors pickling 2018-03-14 16:59:37 +01:00
vocab.pxd Add Vocab.cfg attr, to hold stuff like oov probs 2017-10-30 16:08:50 +01:00
vocab.pyx Make Vocab.__contains__ work with ints. Fixes #1868 2018-01-23 23:26:47 +01:00