spaCy/spacy/training
Paul O'Leary McCann c362006cb9
Fix is_sent_start when converting from JSON (fix #7635) (#7655)
Data in the JSON format is split into sentences, and each sentence is
saved with is_sent_start flags. Currently the flags are 1 for the first
token and 0 for the others. When deserialized, this results in a pattern
of True, None, None, None... which makes single-sentence documents look
as though they haven't had sentence boundaries set.

Since items saved in JSON format have been split into sentences already,
the is_sent_start values should all be True or False.
2021-04-08 18:24:52 +10:00
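The commit message describes an encoding problem. A minimal sketch of it follows, under stated assumptions. The patch itself is not shown here, so the sketch assumes a three-valued flag (1 = sentence start, -1 = not a start, 0 = unknown/None). This is how spaCy's `Token.is_sent_start` is usually described, and the sketch assumes the fix emits an explicit "not a start" value instead of 0. The helpers `decode_sent_start`, `encode_sentence` and `boundaries_set` are hypothetical illustrations, not spaCy API. `boundaries_set` is a simplified check that ignores the first token, because the first token is always a sentence start. This shows why a single-sentence document looks unannotated while a multi-sentence one does not.

```python
from typing import List, Optional


def decode_sent_start(flag: int) -> Optional[bool]:
    """Map an integer flag to True/False/None (assumed 1 / -1 / 0 encoding)."""
    if flag == 0:
        return None  # boundary status unknown
    return flag > 0


def encode_sentence(n_tokens: int, fixed: bool) -> List[int]:
    """Hypothetical encoder for one already-split sentence.

    Old behaviour: 1 for the first token, 0 (unknown) for the rest.
    Fixed behaviour (assumed): 1 for the first token, -1 (explicitly not a
    sentence start) for the rest.
    """
    if n_tokens == 0:
        return []
    rest = -1 if fixed else 0
    return [1] + [rest] * (n_tokens - 1)


def boundaries_set(values: List[Optional[bool]]) -> bool:
    """Hypothetical, simplified check: the first token is always a start,
    so only later tokens show whether boundaries were annotated."""
    return any(v is not None for v in values[1:])


# Single-sentence document, old vs. fixed encoding
old = [decode_sent_start(f) for f in encode_sentence(4, fixed=False)]
new = [decode_sent_start(f) for f in encode_sentence(4, fixed=True)]
print(old)  # [True, None, None, None]
print(new)  # [True, False, False, False]
print(boundaries_set(old), boundaries_set(new))  # False True

# Two sentences with the old encoding: the second sentence start is an
# explicit True, so the document does look annotated.
two = [decode_sent_start(f)
       for f in encode_sentence(2, fixed=False) + encode_sentence(2, fixed=False)]
print(two, boundaries_set(two))  # [True, None, True, None] True
```

Under the old encoding, only the single-sentence case is ambiguous. Emitting an explicit False for every non-initial token removes that ambiguity regardless of sentence count.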
converters Switch converters to generator functions (#6547) 2020-12-15 16:47:16 +08:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
align.pyx Fix alignment for 1-to-1 tokens and lowercasing (#6476) 2020-12-08 14:25:16 +08:00
alignment.py Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
augment.py Fix lowercase augmentation (#7336) 2021-03-09 14:02:32 +11:00
batchers.py Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
corpus.py Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Support doc.spans in Example.from_dict (#7197) 2021-03-03 01:12:54 +11:00
gold_io.pyx Fix is_sent_start when converting from JSON (fix #7635) (#7655) 2021-04-08 18:24:52 +10:00
initialize.py Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
iob_utils.py Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
loggers.py W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429) 2021-04-01 19:36:23 +02:00
loop.py Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
pretrain.py replace "is not" with != 2021-03-18 21:09:11 +01:00