spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-05 20:31:30 +03:00

Paul O'Leary McCann c362006cb9 Fix is_sent_start when converting from JSON (fix #7635) (#7655) Data in the JSON format is split into sentences, and each sentence is saved with is_sent_start flags. Currently the flags are 1 for the first token and 0 for the others. When deserialized this results in a pattern of True, None, None, None... which makes single-sentence documents look as though they haven't had sentence boundaries set. Since items saved in JSON format have been split into sentences already, the is_sent_start values should all be True or False.		2021-04-08 18:24:52 +10:00
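The flag handling described in the commit message above can be illustrated with a minimal sketch. This is not the code from gold_io.pyx; the function names below are hypothetical, and the only assumption taken from the commit message is the mapping itself (1/0 flags becoming True/None before the fix and True/False after it).

```python
from typing import List, Optional


def sent_starts_before_fix(flags: List[int]) -> List[Optional[bool]]:
    """Mapping described as the old behavior: 1 -> True, 0 -> None (unset)."""
    return [True if flag == 1 else None for flag in flags]


def sent_starts_after_fix(flags: List[int]) -> List[bool]:
    """Mapping described as the fixed behavior: 1 -> True, 0 -> False."""
    return [flag == 1 for flag in flags]


def has_unset_boundaries(sent_starts: List[Optional[bool]]) -> bool:
    """Hypothetical check: any token after the first with an unset (None) value."""
    return any(value is None for value in sent_starts[1:])


# Flags for a single four-token sentence as stored in the JSON format.
flags = [1, 0, 0, 0]

before = sent_starts_before_fix(flags)
after = sent_starts_after_fix(flags)

print(before, has_unset_boundaries(before))
print(after, has_unset_boundaries(after))
```

With the old mapping, every token after the first is left unset, which is why a single-sentence document could look as if no sentence boundaries had been set. With the fixed mapping, every token carries an explicit True or False.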
converters	Switch converters to generator functions (#6547)	2020-12-15 16:47:16 +08:00
__init__.pxd	Renaming gold & annotation_setter (#6042)	2020-09-09 10:31:03 +02:00
__init__.py	Replace pytokenizations with internal alignment (#6293)	2020-11-03 16:24:38 +01:00
align.pyx	Fix alignment for 1-to-1 tokens and lowercasing (#6476)	2020-12-08 14:25:16 +08:00
alignment.py	Replace pytokenizations with internal alignment (#6293)	2020-11-03 16:24:38 +01:00
augment.py	Fix lowercase augmentation (#7336)	2021-03-09 14:02:32 +11:00
batchers.py	Renaming gold & annotation_setter (#6042)	2020-09-09 10:31:03 +02:00
corpus.py	Support large/infinite training corpora (#7208)	2021-04-08 18:08:04 +10:00
example.pxd	Make a pre-check to speed up alignment cache (#6139)	2020-09-24 18:13:39 +02:00
example.pyx	Support doc.spans in Example.from_dict (#7197)	2021-03-03 01:12:54 +11:00
gold_io.pyx	Fix is_sent_start when converting from JSON (fix #7635) (#7655)	2021-04-08 18:24:52 +10:00
initialize.py	Support large/infinite training corpora (#7208)	2021-04-08 18:08:04 +10:00
iob_utils.py	Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2	2020-09-24 14:44:42 +02:00
loggers.py	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)	2021-04-01 19:36:23 +02:00
loop.py	Support large/infinite training corpora (#7208)	2021-04-08 18:08:04 +10:00
pretrain.py	replace "is not" with !=	2021-03-18 21:09:11 +01:00