# coding: utf8
from __future__ import unicode_literals

from spacy.lang.en import English
from spacy.tokens import Doc


def test_issue3468():
    """Test that sentence boundaries are set correctly so Doc.is_sentenced can
    be restored after serialization."""
    nlp = English()
    nlp.add_pipe(nlp.create_pipe("sentencizer"))
    doc = nlp("Hello world")
    # The sentencizer should have set sentence boundaries on the doc
    assert doc[0].is_sent_start
    assert doc.is_sentenced
    assert len(list(doc.sents)) == 1
    # Round-trip the doc through serialization and check the boundaries survive
    doc_bytes = doc.to_bytes()
    new_doc = Doc(nlp.vocab).from_bytes(doc_bytes)
    assert new_doc[0].is_sent_start
    assert new_doc.is_sentenced
    assert len(list(new_doc.sents)) == 1