* Minor updates to quickstart settings/instructions
* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required
* Switch to warning above quickstart
* Undo changes to textcat default in quickstart
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Set up CI for tests with GPU agent
* Update tests for enabled GPU
* Fix steps filename
* Add parallel build jobs as a setting
* Fix test requirements
* Fix install test requirements condition
* Fix pipeline models test
* Reset current ops in prefer/require testing
* Fix more tests
* Remove separate test_models test
* Fix regression 5551
* fix StaticVectors for GPU use
* fix vocab tests
* Fix regression test 5082
* Move azure steps to .github and reenable default pool jobs
* Consolidate/rename azure steps
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
* Add callback to copy vocab/tokenizer from model
Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer
settings and/or vocab (including vectors) from a base model.
* Move spacy.copy_from_base_model.v1 to spacy.training.callbacks
* Add documentation
* Modify to specify model as tokenizer and vocab params
* Update sent_starts in Example.from_dict
Update `sent_starts` for `Example.from_dict` so that `Optional[bool]`
values have the same meaning as for `Token.is_sent_start`.
Use `Optional[bool]` as the type for sent start values in the docs.
* Use helper function for conversion to ternary ints
* Fix tokenizer cache flushing
Fix/simplify tokenizer init detection in order to fix cache flushing
when properties are modified.
* Remove init reloading logic
* Remove logic disabling `_reload_special_cases` on init
* Setting `rules` last in `__init__` (as before) means that setting
other properties doesn't reload any special cases
* Reset `rules` first in `from_bytes` so that setting other properties
during deserialization doesn't reload any special cases
unnecessarily
* Reset all properties in `Tokenizer.from_bytes` to allow any settings
to be `None`
* Also reset special matcher when special cache is flushed
* Remove duplicate special case validation
* Add test for special cases flushing
* Extend test for tokenizer deserialization of None values
* Replace negative rows with 0 in StaticVectors
Replace negative row indices with 0-vectors in `StaticVectors`.
* Increase versions related to StaticVectors
* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations
Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5
* Update config defaults to new versions
* Update docs