This was the only registry that expected the registered objects to be dictionaries instead of functions that return something. We can still support plain dicts but we should also support functions for consistency
The Language.use_params method was failing if you passed in None, which
meant we had to use awkward conditionals for the parameter averaging.
This solves the problem.
Follow-ups to the parser efficiency fix.
* Avoid introducing new counter for number of pushes
* Base cut on number of transitions, keeping it more even
* Reintroduce the randomization we had in v2.
The parser training makes use of a trick for long documents, where we
use the oracle to cut up the document into sections, so that we can have
batch items in the middle of a document. For instance, if we have one
document of 600 words, we might make 6 states, starting at words 0, 100,
200, 300, 400 and 500.
The problem is for v3, I screwed this up and didn't stop parsing! So
instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch
of [600, 500, 400, 300, 200, 100]. Oops.
The implementation here could probably be improved, it's annoying to
have this extra variable in the state. But this'll do.
This makes the v3 parser training 5-10 times faster, depending on document
lengths. This problem wasn't in v2.
A long time ago we went to some trouble to try to clean up "unused"
strings, to avoid the `StringStore` growing in long-running processes.
This never really worked reliably, and I think it was a really wrong
approach. It's much better to let the user reload the `nlp` object as
necessary, now that the string encoding is stable (in v1, the string IDs
were sequential integers, making reloading the NLP object really
annoying.)
The extra book-keeping does make some performance difference, and the
feature is unsed, so it's past time we killed it.
* Prevent Tagger model init with 0 labels
Raise an error before trying to initialize a tagger model with 0 labels.
* Add dummy tagger label for test
* Remove tagless tagger model initializiation
* Fix error number after merge
* Add dummy tagger label to test
* Fix formatting
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* rename to spacy-transformers.TransformerListener
* add some more tok2vec tests
* use select_pipes
* fix docs - annotation setter was not changed in the end