spaCy/spacy/tokens
Matthew Honnibal a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
..
__init__.pxd * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
__init__.py Modify morphology to support arbitrary features (#4932) 2020-01-23 22:01:54 +01:00
_retokenize.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
_serialize.py Record whether Doc objects are built from known spacing (#5697) 2020-07-03 12:58:16 +02:00
doc.pxd Record whether Doc objects are built from known spacing (#5697) 2020-07-03 12:58:16 +02:00
doc.pyx Record whether Doc objects are built from known spacing (#5697) 2020-07-03 12:58:16 +02:00
morphanalysis.pxd Modify morphology to support arbitrary features (#4932) 2020-01-23 22:01:54 +01:00
morphanalysis.pyx refactor fixes (#5664) 2020-06-29 14:33:00 +02:00
span.pxd annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
span.pyx Remove deprecated methods 2020-07-01 22:33:39 +02:00
token.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
token.pyx Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
underscore.py Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00