spaCy/spacy/training
Adriane Boyd 1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment

Replace pytokenizations with an internal alignment algorithm that only
allows differences in whitespace and capitalization.

* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`

* Refactor trailing whitespace handling

* Remove unnecessary exception for empty docs

Allow a non-empty whitespace-only doc to be aligned with an empty doc

* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
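The commit above replaces pytokenizations with an alignment that tolerates only whitespace and capitalization differences. The Python sketch below shows one way such an alignment can work. It is not the Cython code in `align.pyx`. The names `align_tokens` and `_char_index` are hypothetical. The approach assumes that lowercasing each character and dropping all whitespace produces identical strings for both tokenizations. Under that assumption, two tokens are aligned when they share at least one kept character.

```python
from typing import List, Tuple


def _char_index(tokens: List[str]) -> Tuple[str, List[int]]:
    """Hypothetical helper: join tokens into one lowercased string with all
    whitespace removed, recording the owning token index of each kept char."""
    chars: List[str] = []
    owners: List[int] = []
    for tok_idx, token in enumerate(tokens):
        for ch in token:
            if ch.isspace():
                continue  # whitespace differences are tolerated
            for lowered in ch.lower():  # lower() can yield more than one char
                chars.append(lowered)
                owners.append(tok_idx)
    return "".join(chars), owners


def align_tokens(
    a: List[str], b: List[str]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical sketch: return (a2b, b2a), listing for each token the
    indices of the overlapping tokens in the other tokenization."""
    text_a, owners_a = _char_index(a)
    text_b, owners_b = _char_index(b)
    if text_a != text_b:
        # Anything beyond whitespace/case differences is rejected.
        raise ValueError("Tokenizations differ in more than whitespace and case")
    a2b: List[List[int]] = [[] for _ in a]
    b2a: List[List[int]] = [[] for _ in b]
    # Both owner lists are non-decreasing, so a duplicate pair can only
    # repeat the most recently appended index.
    for i, j in zip(owners_a, owners_b):
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
        if not b2a[j] or b2a[j][-1] != i:
            b2a[j].append(i)
    return a2b, b2a


a2b, b2a = align_tokens(["I", "can't", " "], ["i", "ca", "n't"])
print(a2b)  # [[0], [1, 2], []]
print(b2a)  # [[0], [1], [1]]
```

Whitespace-only tokens end up with empty alignment lists. For the same reason, a non-empty whitespace-only doc aligns with an empty doc without any special-case exception: `align_tokens(["  "], [])` returns `([[]], [])`.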
converters fix E902 and E903 numbering 2020-10-05 13:43:32 +02:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
align.pyx Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
alignment.py Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
augment.py Auto-format [ci skip] 2020-10-05 21:58:18 +02:00
batchers.py Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
corpus.py Integrate file readers 2020-10-02 01:36:06 +02:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
gold_io.pyx Use null raw for has_unknown_spaces in docs_to_json 2020-10-15 09:57:54 +02:00
initialize.py TextCat updates and fixes (#6263) 2020-10-18 14:50:41 +02:00
iob_utils.py Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
loggers.py Make console logger table more compact 2020-10-11 12:55:46 +02:00
loop.py Fix success message [ci skip] 2020-10-15 14:42:08 +02:00
pretrain.py fix resolving of dot notation (#6326) 2020-10-31 12:17:06 +01:00