* Make serialization methods consistent
exclude keyword argument instead of random named keyword arguments and deprecation handling
* Update docs and add section on serialization fields
Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉
See here: https://github.com/explosion/srsly
Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.
At the same time, we noticed that having a lot of small dependencies was making maintainence harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel.
srsly currently includes forks of the following packages:
ujson
msgpack
msgpack-numpy
cloudpickle
* WIP: replace json/ujson with srsly
* Replace ujson in examples
Use regular json instead of srsly to make code easier to read and follow
* Update requirements
* Fix imports
* Fix typos
* Replace msgpack with srsly
* Fix warning
* Work on refactoring greedy parser
* Compile updated parser
* Fix refactored parser
* Update test
* Fix refactored parser
* Fix refactored parser
* Readd beam search after refactor
* Fix beam search after refactor
* Fix parser
* Fix beam parsing
* Support oracle segmentation in ud-train CLI command
* Avoid relying on final gold check in beam search
* Add a keyword argument sink to GoldParse
* Bug fixes to beam search after refactor
* Avoid importing fused token symbol in ud-run-test, untl that's added
* Avoid importing fused token symbol in ud-run-test, untl that's added
* Don't modify Token in global scope
* Fix error in beam gradient calculation
* Default to beam_update_prob 1
* Set a more aggressive threshold on the max violn update
* Disable some tests to figure out why CI fails
* Disable some tests to figure out why CI fails
* Add some diagnostics to travis.yml to try to figure out why build fails
* Tell Thinc to link against system blas on Travis
* Point thinc to libblas on Travis
* Try running sudo=true for travis
* Unhack travis.sh
* Restore beam_density argument for parser beam
* Require thinc 6.11.1.dev16
* Revert hacks to tests
* Revert hacks to travis.yml
* Update thinc requirement
* Fix parser model loading
* Fix size limits in training data
* Add missing name attribute for parser
* Fix appveyor for Windows
* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.