* factor out the WandB logger into spacy-loggers
Signed-off-by: Elia Robyn Speer <gh@arborelia.net>
* depend on spacy-loggers so they are available
Signed-off-by: Elia Robyn Speer <gh@arborelia.net>
* remove docs of spacy.WandbLogger.v2 (moved to spacy-loggers)
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* Version number suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* update references to WandbLogger
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* make order of deps more consistent
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add callback to copy vocab/tokenizer from model
Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer
settings and/or vocab (including vectors) from a base model.
* Move spacy.copy_from_base_model.v1 to spacy.training.callbacks
* Add documentation
* Modify to specify model as tokenizer and vocab params
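
A minimal sketch of the callback shape described above, assuming a factory that takes optional `tokenizer` and `vocab` model names and returns a function applied to the pipeline before initialization. `BASE_MODELS` and the `SimpleNamespace` objects are hypothetical stand-ins for loading a real pipeline and are not part of spaCy. The actual callback is registered as `spacy.copy_from_base_model.v1`.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Optional

# Hypothetical stand-in for loading a packaged pipeline by name.
BASE_MODELS: Dict[str, SimpleNamespace] = {
    "base_model": SimpleNamespace(
        tokenizer={"rules": {"don't": ["do", "n't"]}},
        vocab={"strings": ["apple", "pear"], "vectors": {"apple": [0.1, 0.2]}},
    )
}


def create_copy_from_base_model(
    tokenizer: Optional[str] = None,
    vocab: Optional[str] = None,
) -> Callable[[SimpleNamespace], None]:
    """Return a callback that copies settings from the named base models."""

    def copy_from_base_model(nlp: SimpleNamespace) -> None:
        # Each source is optional, so either one can be copied on its own.
        if tokenizer:
            nlp.tokenizer = dict(BASE_MODELS[tokenizer].tokenizer)
        if vocab:
            # In this sketch the vocab carries its vectors along with it.
            nlp.vocab = dict(BASE_MODELS[vocab].vocab)

    return copy_from_base_model


nlp = SimpleNamespace(tokenizer={}, vocab={})
callback = create_copy_from_base_model(tokenizer="base_model", vocab="base_model")
callback(nlp)
```

Taking the tokenizer and vocab sources as separate parameters lets a config copy only one of them, or take each from a different base model.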
* Replace pytokenizations with internal alignment
Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.
* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`
* Refactor trailing whitespace handling
* Remove unnecessary exception for empty docs
Allow a non-empty whitespace-only doc to be aligned with an empty doc.
* Remove empty docs exceptions completely
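
To illustrate the restriction described above, here is a pure-Python sketch of an alignment that only tolerates differences in whitespace and capitalization. It is not the implementation from this change. The name `get_alignments` and the `(a2b, b2a)` return shape follow the commit message, while the character-by-character approach is an assumption made for illustration.

```python
from typing import List, Tuple


def get_alignments(A: List[str], B: List[str]) -> Tuple[List[List[int]], List[List[int]]]:
    """Align two tokenizations that differ only in whitespace and case."""

    def normalize(tokens: List[str]) -> List[str]:
        # Drop all whitespace and lowercase, so only those differences are ignored.
        return ["".join(tok.split()).lower() for tok in tokens]

    norm_a, norm_b = normalize(A), normalize(B)
    if "".join(norm_a) != "".join(norm_b):
        raise ValueError("Tokenizations differ in more than whitespace or case")

    def char_to_token(tokens: List[str]) -> List[int]:
        # One entry per character, holding the index of the token it belongs to.
        return [i for i, tok in enumerate(tokens) for _ in tok]

    a2b: List[List[int]] = [[] for _ in A]
    b2a: List[List[int]] = [[] for _ in B]
    # Both character sequences are identical, so walk them in lockstep.
    for i, j in zip(char_to_token(norm_a), char_to_token(norm_b)):
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
        if not b2a[j] or b2a[j][-1] != i:
            b2a[j].append(i)
    return a2b, b2a


a2b, b2a = get_alignments(
    ["I", "listened", "to", "obama", "'s", "podcasts", "."],
    ["i", "listened", "to", "obama's", "podcasts."],
)
```

In this sketch a whitespace-only token normalizes to the empty string and gets an empty alignment, so a whitespace-only doc aligns with an empty doc without raising.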
* rename Pipe to TrainablePipe
* split functionality between Pipe and TrainablePipe
* remove unnecessary methods from certain components
* cleanup
* hasattr(component, "pipe") should be sufficient again
* remove serialization and vocab/cfg from Pipe
* unify _ensure_examples and validate_examples
* small fixes
* hasattr checks for self.cfg and self.vocab
* make is_resizable and is_trainable properties
* serialize strings.json instead of vocab
* fix KB IO + tests
* fix typos
* more typos
* store `_added_strings` as a set
* add a few more tests specifically for the `_added_strings` field
* bump to 3.0.0a36
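
The split between `Pipe` and `TrainablePipe` can be sketched roughly as follows. This is a simplified illustration, not the code from this change: the docs are plain strings, and `Lowercaser`, `ToyTagger`, and `describe` are hypothetical names. It shows why callers check `hasattr` for `vocab` and `cfg` once the base class no longer carries them, and how `is_trainable` and `is_resizable` read as properties.

```python
from typing import Any, Dict, Iterable, Iterator


class Pipe:
    """Minimal component base: no model, vocab, cfg, or serialization."""

    def __call__(self, doc: str) -> str:
        raise NotImplementedError

    def pipe(self, stream: Iterable[str]) -> Iterator[str]:
        # Default streaming behaviour: apply the component one doc at a time.
        for doc in stream:
            yield self(doc)

    @property
    def is_trainable(self) -> bool:
        return False


class TrainablePipe(Pipe):
    """Adds the state that only trainable components need."""

    def __init__(self, vocab: Dict[str, int], model: Any, **cfg: Any) -> None:
        self.vocab = vocab
        self.model = model
        self.cfg = dict(cfg)

    @property
    def is_trainable(self) -> bool:
        return True

    @property
    def is_resizable(self) -> bool:
        return False


class Lowercaser(Pipe):
    # Hypothetical rule-based component with no vocab or cfg.
    def __call__(self, doc: str) -> str:
        return doc.lower()


class ToyTagger(TrainablePipe):
    # Hypothetical trainable component that leaves docs unchanged.
    def __call__(self, doc: str) -> str:
        return doc


def describe(component: Any) -> Dict[str, bool]:
    # Attribute checks instead of isinstance, so any callable can be a component.
    return {
        "trainable": getattr(component, "is_trainable", False),
        "resizable": getattr(component, "is_resizable", False),
        "has_vocab": hasattr(component, "vocab"),
        "has_cfg": hasattr(component, "cfg"),
    }
```

For example, `describe(Lowercaser())` reports no vocab and no cfg, while `describe(ToyTagger({}, None, labels=[]))` reports both.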
* Support data augmentation in Corpus
* Note initial docs for data augmentation
* Add augmenter to quickstart
* Fix flake8
* Format
* Fix test
* Update spacy/tests/training/test_training.py
* Improve data augmentation arguments
* Update templates
* Move randomization out into caller
* Refactor
* Update spacy/training/augment.py
* Update spacy/tests/training/test_training.py
* Fix augment
* Fix test
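
As a rough sketch of how a corpus reader might accept an augmenter, assume an augmenter is a callable that takes one example and yields one or more examples. `ToyCorpus`, `dont_augment`, and `create_lower_casing_augmenter` are hypothetical names used only here, and examples are plain strings rather than spaCy `Example` objects.

```python
import random
from typing import Callable, Iterable, Iterator, List, Optional

Augmenter = Callable[[str], Iterator[str]]


def dont_augment(text: str) -> Iterator[str]:
    # Default augmenter: pass the example through unchanged.
    yield text


def create_lower_casing_augmenter(level: float, rng: random.Random) -> Augmenter:
    """Return an augmenter that adds a lowercased copy for a share of examples."""

    def augment(text: str) -> Iterator[str]:
        yield text
        # rng.random() is in [0, 1): level=0.0 never augments, level=1.0 always does.
        if rng.random() < level:
            yield text.lower()

    return augment


class ToyCorpus:
    """Hypothetical corpus that applies an optional augmenter while iterating."""

    def __init__(self, texts: Iterable[str], augmenter: Optional[Augmenter] = None) -> None:
        self.texts: List[str] = list(texts)
        self.augmenter: Augmenter = augmenter or dont_augment

    def __iter__(self) -> Iterator[str]:
        for text in self.texts:
            yield from self.augmenter(text)


always = create_lower_casing_augmenter(level=1.0, rng=random.Random(0))
never = create_lower_casing_augmenter(level=0.0, rng=random.Random(0))
augmented = list(ToyCorpus(["Hello World", "Foo"], augmenter=always))
plain = list(ToyCorpus(["Hello World", "Foo"], augmenter=never))
```

Passing the random source in from outside keeps the augmenter deterministic under test, which is one reason to keep randomization out of the transformation itself.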