* Improve tag map initialization and updating
Generalize tag map initialization and updating so that a provided tag
map can be loaded correctly in the CLI.
* normalize provided tag map as necessary
* use the same method for initializing and overwriting the tag map
* Reinitialize cache after loading new tag map
Reinitialize the cache with the right size after loading a new tag map.
* update `Morphologizer.begin_training` for use with `Example`
* make init and begin_training more consistent
* add `Morphology.normalize_features` to normalize outside of
`Morphology.add`
* make sure `get_loss` doesn't create unknown labels when the POS and
morph alignments differ
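As a rough illustration of normalizing features outside of `Morphology.add`, the sketch below sorts a UD-style FEATS string by feature name. The helper `normalize_feats` is hypothetical and is not spaCy's implementation; it only assumes the `Name=Value|Name=Value` string format.

```python
# Hypothetical helper, not spaCy's API: normalize a UD-style FEATS string.
def normalize_feats(feats):
    """Return the features deduplicated and sorted case-insensitively."""
    # Treat an empty string or the UD placeholder "_" as "no features".
    if not feats or feats == "_":
        return "_"
    # Split on "|" and drop empty fragments left by stray separators.
    pairs = [f for f in feats.split("|") if f]
    # Deduplicate, then sort so equivalent analyses compare equal.
    return "|".join(sorted(set(pairs), key=str.lower))


normalized = normalize_feats("Number=Sing|Case=Nom")
```

Sorting up front means two analyses that differ only in feature order end up with the same string, so they can share one entry in a lookup table.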
Serialize `morph_rules` with the tagger alongside the `tag_map`.
Use `Morphology.load_tag_map` and `Morphology.load_morph_exceptions` to
load these settings rather than reinitializing the morphology each time
they are changed.
Update `Morphology` to load exceptions in `Morphology.__init__` and
`Morphology.load_morph_exceptions` from the format used in `MORPH_RULES`
rather than the internal format with tuple keys.
* Rename `Morphology.exc` to `Morphology._exc` for internal use with
tuple keys
* Add `Morphology.exc` as a property that converts the internal `_exc`
back to `MORPH_RULES` format, primarily for serialization
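The two exception formats described above can be sketched as a round trip between a nested dict and a dict with tuple keys. The function names and the exact dict shapes here are assumptions for illustration only, not the actual `Morphology` code.

```python
# Hypothetical helpers; names and dict shapes are assumptions, not spaCy's API.
def to_internal(morph_rules):
    """Flatten {tag: {orth: attrs}} into {(tag, orth): attrs}."""
    return {
        (tag, orth): dict(attrs)
        for tag, exceptions in morph_rules.items()
        for orth, attrs in exceptions.items()
    }


def to_morph_rules(exc):
    """Rebuild the nested {tag: {orth: attrs}} form from tuple keys."""
    rules = {}
    for (tag, orth), attrs in exc.items():
        rules.setdefault(tag, {})[orth] = dict(attrs)
    return rules


# Example rule in the nested format (values are illustrative).
rules = {"PRP": {"I": {"LEMMA": "-PRON-", "PronType": "Prs"}}}
internal = to_internal(rules)
roundtrip = to_morph_rules(internal)
```

The nested form only needs string keys, which is likely why it suits serialization better than tuple keys.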
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.
The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map.py`, which maps UPOS -> UPOS.
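A UPOS -> UPOS tag map is an identity mapping: each fine-grained tag is itself a coarse-grained tag. The sketch below is a simplified assumption that uses plain string keys and the 17 Universal Dependencies tags; the real file may use symbol constants and include extra entries.

```python
# Simplified sketch of an identity tag map; not the contents of the real file.
UPOS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

# Each tag maps to a feature dict whose coarse-grained POS is the tag itself.
default_tag_map = {tag: {"pos": tag} for tag in UPOS_TAGS}
```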
* Add morph to morphology in Doc.from_array
Add morphological analyses to morphology table in `Doc.from_array`.
* Use separate vocab in DocBin roundtrip test
* add debug-model to print model internals for debugging purposes
* expand debug-model script with 4 stages: before, init, train, predict
* avoid requiring a seed in the train script
* small fixes