spaCy/spacy
Daniël de Kok da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TextCatBOW `length_exponent` parameter by `length`

We now round up the length to the next power of two if it isn't
a power of two.

* Remove some tests for TextCatBOW.v2

* Fix missing import
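
The rounding described for the `length` parameter above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names `round_up_to_power_of_two` and `buckets_reached` are hypothetical. The second helper also assumes, purely for illustration, that hashed feature indices are reduced with a bitmask (`hash & (length - 1)`); the commit message does not state the exact mechanism, only that lengths that are not a power of two lead to a non-uniform distribution of parameters.

```python
def round_up_to_power_of_two(length: int) -> int:
    """Return the smallest power of two that is >= length.

    Hypothetical helper; the actual implementation in spaCy may differ.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the number of bits needed for length - 1,
    # so shifting 1 by that amount gives the next power of two
    # (or length itself if it already is one).
    return 1 << (length - 1).bit_length()


def buckets_reached(length: int, n_hashes: int = 10_000) -> int:
    """Count distinct indices hit when hashes are masked with length - 1.

    ASSUMPTION: bitmask indexing is used here only to illustrate why a
    non-power-of-two length can leave parameters unused.
    """
    mask = length - 1
    return len({h & mask for h in range(n_hashes)})


if __name__ == "__main__":
    for requested in (1000, 1024, 262144, 262145):
        print(requested, "->", round_up_to_power_of_two(requested))
    # Under the masking assumption, a length of 1000 reaches far fewer
    # distinct indices than a length of 1024.
    print(buckets_reached(1000), buckets_reached(1024))
```

Under the masking assumption, a requested length of 1000 would reach only a fraction of the available indices, while 1024 reaches all of them, which is one way to see why rounding up to a power of two is a reasonable default.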
2023-11-29 09:11:54 +01:00
cli Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
displacy Fix displacy span stacking (#13068) 2023-11-02 12:02:18 +01:00
kb Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
lang Feature/nn and fo language extensions (#13116) 2023-11-20 07:49:59 +01:00
matcher Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
ml Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
pipeline Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
tests Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
tokens Update for numpy 2.0 deprecations (#13103) 2023-11-06 08:47:53 +01:00
training Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Revert "Load the cli module lazily for spacy.info (#12962)" 2023-10-04 12:33:33 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.7.2 (#13066) 2023-10-16 15:10:55 +02:00
attrs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
attrs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
compat.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
errors.py Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
glossary.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
language.py Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
lexeme.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
lookups.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
morphology.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
morphology.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
parts_of_speech.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
parts_of_speech.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
pipe_analysis.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
scorer.py Update for numpy 2.0 deprecations (#13103) 2023-11-06 08:47:53 +01:00
strings.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
structs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
tokenizer.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
tokenizer.pyx Update Tokenizer.explain for special cases with whitespace (#13086) 2023-11-06 17:29:59 +01:00
ty.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
util.py Warn about reloading dependencies after downloading models (#13081) 2023-11-10 08:05:07 +01:00
vectors.pyx Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00
vocab.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
vocab.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
vocab.pyx Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00