* add test for multi-label textcat reproducibility
* remove positive_label
* fix lengths dtype
* fix comments
* remove comment that we should not have forgotten :-)
* Remove blis version constraints
After updating the blis sdist in v0.7.4, remove python version
constraints for blis build and install dependencies.
* Install sdist with --prefer-binary for python 3.5
* Fix duplicate sdist install steps
* Fix sdist install step types
* Fix blis pins in requirements.txt
* Remove wheel hack for python 3.5 from CI
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2, so it's better to remove it.
* define new architectures for the pretraining objective
* add loss function as attr of the model (see the sketch below)
* cleanup
* cleanup
* shorten name
* fix typo
* remove unused error
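As a rough sketch of the loss-as-attribute idea (not the actual spaCy
pretraining architectures; the layers and the `l2_loss` helper are made up
for illustration), a Thinc model can carry its objective in its `attrs`
dict so the pretraining loop can look it up instead of hard-coding it:

```python
import numpy
from thinc.api import Linear, chain

def l2_loss(predicted, target):
    # Toy stand-in for the real pretraining objective: squared-error loss
    # and its gradient with respect to the predictions.
    diff = predicted - target
    return float((diff ** 2).sum()), 2 * diff

model = chain(Linear(nO=300), Linear(nO=300))
model.attrs["loss"] = l2_loss

# The training code retrieves the objective from the model itself:
loss_fn = model.attrs["loss"]
loss, grad = loss_fn(
    numpy.zeros((2, 300), dtype="f"), numpy.ones((2, 300), dtype="f")
)
```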
* Fix blis build dependencies
* Add blis with python_version constraints to pyproject.toml
* Add blis to setup_requires
* Remove --only-binary from CI
* Reduce number of builds to speed up CI
* Add hack to install wheel for python 3.5 in linux
* Remove os spec from CI
* Remove detailed numpy build constraints
* Remove detailed numpy build constraints from `pyproject.toml` because
they are too difficult to maintain for many architectures
* These constraints are more a reflection of which binary wheels are
available on pypi than of any real build requirements that users need to
follow when building from source
* Users building their own binary packages will need to enforce the
constraints that make sense in their environments, e.g. the
`conda`-compatible numpy pins
* Keep the build constraints in `build-constraints.txt` for use with our
builds (an illustrative example follows this list)
* Our builds with wheelwright are built against the earliest
compatible binary versions of numpy on pypi
* These constraints are documented within the distribution
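For illustration only, such a constraints file pairs numpy pins with
environment markers; the pins below are hypothetical, not the ones that
ship with the distribution. pip can be pointed at a file like this (for
example via the `PIP_CONSTRAINT` environment variable) so that a source
build picks up the pinned numpy:

```
# build-constraints.txt (hypothetical pins, for illustration only)
numpy==1.17.3; python_version == '3.8'
numpy==1.19.3; python_version == '3.9'
```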
* Revert "Remove os spec from CI"
This reverts commit 7489476688.
Preserve the `token.spacy` value corresponding to the span end token in the
original doc rather than adjusting it for the current offset.
* If not modifying in place, the value is read from the original document
(`doc.c` rather than `tokens`).
* If modifying in place, the document has not been modified past the
current span start position, so the value at the current span end
position is still valid.
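As an illustration of the invariant (using the public retokenizer API
rather than the internal code this change touches): after merging a span,
the merged token should keep the trailing-whitespace flag (`token.spacy`)
of the original span's end token, so the document text round-trips
unchanged:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("New York is big")
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])

# The merged token takes its trailing whitespace from the original end
# token ("York"), so the text is reconstructed exactly.
assert doc[0].text == "New York"
assert doc[0].whitespace_ == " "
assert doc.text == "New York is big"
```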
* When checking for token alignments, check not only that the tokens are
identical but that the character positions are both at the start of a
token.
It's possible for two tokens to be identical even though they aren't
aligned one-to-one, as in `["a'", "''"]` vs. `["a", "''", "'"]`: the middle
tokens are identical, but they should not be aligned at the token level
because character position 2 is the start of one token but the middle of
the other.
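A standalone sketch of that edge case (plain Python, not the alignment code
itself): character offset 2 starts a token in the first tokenization but
falls inside a token in the second, even though the middle tokens have
identical text:

```python
def token_start_offsets(tokens):
    # Character offsets at which each token begins in the concatenated text.
    starts, offset = set(), 0
    for tok in tokens:
        starts.add(offset)
        offset += len(tok)
    return starts

a = ["a'", "''"]       # "a'''", token starts at {0, 2}
b = ["a", "''", "'"]   # "a'''", token starts at {0, 1, 3}
assert "".join(a) == "".join(b)
assert 2 in token_start_offsets(a)      # start of a token
assert 2 not in token_start_offsets(b)  # middle of the "''" token
```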
* Use the lowercased version of the token texts to create the
character-to-token alignment because lowercasing can change the string
length (e.g., for `İ`, see the not-a-bug bug report:
https://bugs.python.org/issue34723)
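For reference, the length change is easy to see in plain Python: `İ`
(U+0130) lowercases to `i` plus a combining dot above, so the lowercased
string is one character longer than the original:

```python
s = "İstanbul"
assert len(s) == 8
# "İ".lower() == "i" + "\u0307" (COMBINING DOT ABOVE); see bpo-34723
assert len(s.lower()) == 9
```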