Serialize `morph_rules` with the tagger alongside the `tag_map`.
Use `Morphology.load_tag_map` and `Morphology.load_morph_exceptions` to
load these settings rather than reinitializing the morphology each time
they are changed.
Update `Morphology` to load exceptions in `Morphology.__init__` and
`Morphology.load_morph_exceptions` from the format used in `MORPH_RULES`
rather than the internal format with tuple keys.
* Rename to `Morphology.exc` to `Morphology._exc` for internal use with
tuple keys
* Add `Morphology.exc` as a property that converts the internal `_exc`
back to `MORPH_RULES` format, primarily for serialization
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.
The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map/py`, UPOS -> UPOS.
* Add morph to morphology in Doc.from_array
Add morphological analyses to morphology table in `Doc.from_array`.
* Use separate vocab in DocBin roundtrip test
* adding debug-model to print the internals for debugging purposes
* expend debug-model script with 4 stages: before, init, train, predict
* avoid enforcing to have a seed in the train script
* small fixes
* Update project CLI hashes, directories, skipping
* Improve clone success message
* Remove unused context args
* Move project-specific utils to project utils
The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util
* Improve run help and add workflows
* Add note re: directory checksum speed
* Fix cloning from subdirectories and output messages
* Remove hard-coded dirs