* Update `TextCatBOW` to use the fixed `SparseLinear` layer
A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754
This change updates `TextCatBOW` to `v3`, which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on the
text categorization task that we tested.
While at it, `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`, but the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.
* Replace TextCatBOW `length_exponent` parameter with `length`
We now round up the length to the next power of two if it isn't
a power of two.
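The rounding rule above can be sketched in a few lines of Python. This is a minimal illustration, not spaCy's actual code; the helper name `round_up_to_power_of_two` is hypothetical.

```python
def round_up_to_power_of_two(length: int) -> int:
    """Return `length` if it is a power of two, else the next power of two.

    Hypothetical helper illustrating the rounding rule; not spaCy's code.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the exponent of the smallest
    # power of two that is >= length (for length >= 1).
    return 1 << (length - 1).bit_length()


print(round_up_to_power_of_two(262144))  # 2**18 is kept: 262144
print(round_up_to_power_of_two(300000))  # rounded up to 2**19: 524288
```

A length that is already a power of two passes through unchanged, so existing configs that use such lengths keep their size.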
* Remove some tests for TextCatBOW.v2
* Fix missing import
* add language extensions for Norwegian Nynorsk and Faroese
* update docstring for nn/examples.py
* use relative imports
* add fo and nn tokenizers to pytest fixtures
* add unit tests for fo and nn and fix bug in nn
* remove module docstring from fo/__init__.py
* add comments about example sentences' origin
* add license information to faroese data credit
* format unit tests using black
* add __init__ files to tests/lang/nn and tests/lang/fo
* fix import order and use relative imports in fo/__init__.py and nn/__init__.py
* Make the tests a bit more compact
* Add fo and nn to website languages
* Add note about jul.
* Add "jul." as exception
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update the "Missing factory" error message
This accounts for model installations that took place during the current Python session.
* Add a note about Jupyter notebooks
* Move error to `spacy.cli.download`
Add extra message for Jupyter sessions
* Add additional note for interactive sessions
* Remove note about `spacy-transformers` from error message
* `isort`
* Improve checks for Colab (also helps displacy)
* Update warning messages
* Improve flow for multiple checks
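A rough sketch of the kind of environment checks described above, using only the standard library plus an optional `IPython` import. The function names and hint wording are hypothetical, and the checks are common heuristics rather than necessarily the ones spaCy uses.

```python
import sys


def in_interactive_session() -> bool:
    """True in the standard interactive interpreter (REPL or `python -i`)."""
    # The standard interpreter only defines sys.ps1/sys.ps2 in interactive mode.
    return hasattr(sys, "ps1") or hasattr(sys, "ps2")


def in_notebook() -> bool:
    """Heuristic check for a Jupyter or Colab notebook kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        shell = None
    else:
        shell = get_ipython()  # None when no IPython shell is running
    if shell is not None:
        cls = type(shell)
        # Jupyter kernels use ZMQInteractiveShell; Colab uses its own shell class.
        if cls.__name__ == "ZMQInteractiveShell" or cls.__module__.startswith("google.colab"):
            return True
    # Fallback: the google.colab package is only importable inside Colab.
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def restart_hint() -> str:
    """Hypothetical extra hint appended to an error message."""
    # Check notebooks first, since a notebook can also look interactive.
    if in_notebook():
        return "Restart the kernel so that newly installed packages are found."
    if in_interactive_session():
        return "Restart the Python session so that newly installed packages are found."
    return ""
```

When run as a plain script, both checks are false and `restart_hint()` returns an empty string, so the base error message is shown unchanged.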
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update Tokenizer.explain for special cases with whitespace
Update `Tokenizer.explain` to skip special case matches if the exact
text has not been matched due to intervening whitespace.
Enable fuzzy `Tokenizer.explain` tests with additional whitespace
normalization.
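The whitespace check can be illustrated with a small standalone sketch. It assumes tokens are given as hypothetical `(text, has_trailing_space)` pairs and that a special-case match covers a token range; it is not spaCy's `Tokenizer.explain` code. For example, if a special case splits `don't` into `do` and `n't`, a token-level match on the input `do n't` should be skipped because the matched text is not exactly `don't`.

```python
from typing import List, Tuple

# Hypothetical token representation: (token text, followed by a space?)
Token = Tuple[str, bool]


def span_text(tokens: List[Token], start: int, end: int) -> str:
    """Reconstruct the exact text of tokens[start:end], including inner spaces."""
    pieces = []
    for i in range(start, end):
        text, has_space = tokens[i]
        pieces.append(text)
        # Trailing whitespace of the last token is not part of the span.
        if has_space and i < end - 1:
            pieces.append(" ")
    return "".join(pieces)


def keep_special_case_match(
    tokens: List[Token], start: int, end: int, special_case: str
) -> bool:
    """Keep a match only if it covers exactly the special-case string."""
    return span_text(tokens, start, end) == special_case


glued = [("do", False), ("n't", False)]
spaced = [("do", True), ("n't", False)]
print(keep_special_case_match(glued, 0, 2, "don't"))   # True
print(keep_special_case_match(spaced, 0, 2, "don't"))  # False: intervening space
```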
* Add unit test for special cases with whitespace, xfail fuzzy tests again
* Fix displacy span stacking.
* Format. Remove counter.
* Remove test files.
* Add unit test. Refactor to allow for unit test.
* Fix off-by-one error in tests.
* Add note on score_weight if using a non-default span_key for SpanCat.
* Fix formatting.
* Fix formatting.
* Fix typo.
* Use warning infobox.
* Fix infobox formatting.