* Use cosine loss in Cloze multitask
* Fix char_embed for gpu
* Call resume_training for base model in train CLI
* Fix bilstm_depth default in pretrain command
* Implement character-based pretraining objective
* Use chars loss in ClozeMultitask
* Add method to decode predicted characters
* Fix number characters
* Rescale gradients for mlm
* Fix char embed+vectors in ml
* Fix pipes
* Fix pretrain args
* Move get_characters_loss
* Fix import
* Fix import
* Mention characters loss option in pretrain
* Remove broken 'self attention' option in pretrain
* Revert "Remove broken 'self attention' option in pretrain"
This reverts commit 56b820f6af.
* Document 'characters' objective of pretrain
The model registry refactor of the Tok2Vec function broke loading models
trained with the previous function, because the model tree was slightly
different. Specifically, the new function wrote:
concatenate(norm, prefix, suffix, shape)
To build the embedding layer. In the previous implementation, I had used
the operator overloading shortcut:
( norm | prefix | suffix | shape )
This actually gets mapped to a binary association, giving something
like:
concatenate(norm, concatenate(prefix, concatenate(suffix, shape)))
This is a different tree, so the layers iterate differently and we
loaded the weights wrongly.
* Add arch for MishWindowEncoder
* Support mish in tok2vec and conv window >=2
* Pass new tok2vec settings from parser
* Syntax error
* Fix tok2vec setting
* Fix registration of MishWindowEncoder
* Fix receptive field setting
* Fix mish arch
* Pass more options from parser
* Support more tok2vec options in pretrain
* Require thinc 7.3
* Add docs [ci skip]
* Require thinc 7.3.0.dev0 to run CI
* Run black
* Fix typo
* Update Thinc version
Co-authored-by: Ines Montani <ines@ines.io>