* Improve tag map initialization and updating
Generalize tag map initialization and updating so that the tag map can
be loaded correctly prior to loading a `Corpus` with `spacy debug-data`
and `spacy train`.
* normalize provided tag map as necessary
* use the same method for initializing and updating the tag map
* Replace rather than update tag map
Replace rather than update tag map when loading a custom tag map.
Updating the tag map is problematic due to the sorted list of tag names
and the fact that the tag map will contain lingering/unwanted tags from
the default tag map.
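The difference between updating and replacing can be sketched with
plain dicts. This is only an illustration: the tag names below are
made up, `update_tag_map` and `replace_tag_map` are hypothetical
helpers rather than spaCy functions, and the assumption is that tags
are addressed by their position in the sorted list of names.

```python
# Hypothetical tag maps for illustration; not spaCy's real defaults.
default_tag_map = {"NN": {"pos": "NOUN"}, "VB": {"pos": "VERB"}, "JJ": {"pos": "ADJ"}}
custom_tag_map = {"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}}


def update_tag_map(current, new):
    """Merge the new tags into a copy of the current map (old tags linger)."""
    merged = dict(current)
    merged.update(new)
    return merged


def replace_tag_map(current, new):
    """Discard the current map and keep only the new tags."""
    return dict(new)


updated = update_tag_map(default_tag_map, custom_tag_map)
replaced = replace_tag_map(default_tag_map, custom_tag_map)

# Sorted tag names, as they would be if tags were looked up by position.
updated_names = sorted(updated)
replaced_names = sorted(replaced)
```

After updating, the default tags are still present and shift the
positions of the custom tags in the sorted list. After replacing, only
the custom tags remain.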
* Update CLI scripts
* Reinitialize cache after loading new tag map
Reinitialize the cache with the right size after loading a new tag map.
* add debug-model command to print model internals for debugging
* extend debug-model script with 4 stages: before, init, train, predict
* don't require a seed in the train script
* small fixes
* Update project CLI hashes, directories, skipping
* Improve clone success message
* Remove unused context args
* Move project-specific utils to project utils
The hashing/checksum functions are designed for projects and may not
end up being general-purpose, so they shouldn't live in `spacy.util`.
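For context, a project-level checksum helper could look roughly like
the sketch below. The name `get_checksum`, the choice of SHA-256 and
the way directories are hashed are all assumptions made for this
example, not the actual implementation in the project utils.

```python
import hashlib
from pathlib import Path


def get_checksum(path) -> str:
    """Return a SHA-256 hex digest for a file, or for all files under a directory.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    if path.is_file():
        return hashlib.sha256(path.read_bytes()).hexdigest()
    if path.is_dir():
        dir_hash = hashlib.sha256()
        # Sort so the result does not depend on filesystem iteration order.
        for sub in sorted(p for p in path.rglob("*") if p.is_file()):
            # Include the relative path so renames change the checksum too.
            dir_hash.update(sub.relative_to(path).as_posix().encode("utf8"))
            dir_hash.update(sub.read_bytes())
        return dir_hash.hexdigest()
    raise ValueError(f"Can't get checksum for {path}: not a file or directory")
```

A helper like this reads every file in full, so it gets slow for large
directories.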
* Improve run help and add workflows
* Add note re: directory checksum speed
* Fix cloning from subdirectories and output messages
* Remove hard-coded dirs
* Make project command a submodule
* Update with WIP
* Add helper for joining commands
* Update docstrings, formatting and types
* Update assets and add support for copying local files
* Fix type
* Update success messages
* remove _convert_examples
* fix test_gold, raise TypeError if tuples are used instead of Example objects
* throw proper errors when objects of the wrong type are passed
* fix deprecated format in tests
* fix deprecated format in parser tests
* fix tests for NEL, morph, senter, tagger, textcat
* update regression tests with new Example format
* use make_doc
* more fixes to nlp.update calls
* few more small fixes for rehearse and evaluate
* only import ml_datasets if really necessary
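The TypeError behaviour described above can be sketched as follows.
The `Example` class here is a minimal stand-in and `check_examples` is
a hypothetical helper. Neither reflects the actual spaCy code, only
the idea of rejecting tuples early with a clear message.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class Example:
    """Hypothetical stand-in for spaCy's Example, for illustration only."""

    text: str
    annotations: Dict[str, Any]


def check_examples(examples: Iterable[Any], method: str) -> List[Example]:
    """Return the examples as a list, raising TypeError on any non-Example item."""
    checked = []
    for eg in examples:
        if not isinstance(eg, Example):
            raise TypeError(
                f"{method} expects Example objects, but got {type(eg).__name__}"
            )
        checked.append(eg)
    return checked
```

Passing old-style `(text, annotations)` tuples to such a check fails
immediately instead of being converted silently.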
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_unknown_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
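The kind of round trip that the serialization test checks can be
sketched without spaCy. `SimpleDoc`, `docs_to_bytes` and
`docs_from_bytes` are hypothetical names, and JSON is used here only
to keep the example short. It is not how `DocBin` stores its data.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Doc; not spaCy's real class."""

    words: List[str]
    spaces: List[bool]
    has_unknown_spaces: bool = False


def docs_to_bytes(docs: List[SimpleDoc]) -> bytes:
    """Serialize docs, storing one has_unknown_spaces flag per doc."""
    payload = {
        "words": [doc.words for doc in docs],
        "spaces": [doc.spaces for doc in docs],
        "has_unknown_spaces": [doc.has_unknown_spaces for doc in docs],
    }
    return json.dumps(payload).encode("utf8")


def docs_from_bytes(data: bytes) -> List[SimpleDoc]:
    """Restore docs; in this sketch a missing flag means the spaces are known."""
    payload = json.loads(data.decode("utf8"))
    flags = payload.get("has_unknown_spaces", [False] * len(payload["words"]))
    return [
        SimpleDoc(words, spaces, flag)
        for words, spaces, flag in zip(payload["words"], payload["spaces"], flags)
    ]
```

Keeping the flag per doc means a batch can mix docs with known and
unknown spaces without losing that information on reload.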