* Update for python 3.10
* Update mac image
* Update build constraints for python 3.10
* Add extras for cupy cuda 11.3-11.5
* Remove cupy-cuda115 extra
* Require thinc>=8.0.12
* Switch CI to windows-2019
* Skip mypy for python 3.10
* 🚨 Ignore all existing Mypy errors
* 🏗 Add Mypy check to CI
* Add types-mock and types-requests as dev requirements
* Add additional type ignore directives
* Add types packages to dev-only list in reqs test
* Add types-dataclasses for python 3.6
* Add ignore to pretrain
* 🏷 Improve type annotation on `run_command` helper
The `run_command` helper previously declared that it returned an
`Optional[subprocess.CompletedProcess]`, but it isn't actually possible
for the function to return `None`. These changes modify the type
annotation of the `run_command` helper and remove all now-unnecessary
`# type: ignore` directives.
* 🔧 Allow variable type redefinition in limited contexts
These changes modify how Mypy is configured to allow variables to have
their type automatically redefined under certain conditions. The Mypy
documentation contains the following example:
```python
def process(items: List[str]) -> None:
# 'items' has type List[str]
items = [item.split() for item in items]
# 'items' now has type List[List[str]]
...
```
This configuration change is especially helpful in reducing the number
of `# type: ignore` directives needed to handle the common pattern of:
* Accepting a filepath as a string
* Overwriting the variable using `filepath = ensure_path(filepath)`
These changes enable redefinition and remove all `# type: ignore`
directives rendered redundant by this change.
* 🏷 Add type annotation to converters mapping
* 🚨 Fix Mypy error in convert CLI argument verification
* 🏷 Improve type annotation on `resolve_dot_names` helper
* 🏷 Add type annotations for `Vocab` attributes `strings` and `vectors`
* 🏷 Add type annotations for more `Vocab` attributes
* 🏷 Add loose type annotation for gold data compilation
* 🏷 Improve `_format_labels` type annotation
* 🏷 Fix `get_lang_class` type annotation
* 🏷 Loosen return type of `Language.evaluate`
* 🏷 Don't accept `Scorer` in `handle_scores_per_type`
* 🏷 Add `string_to_list` overloads
* 🏷 Fix non-Optional command-line options
* 🙈 Ignore redefinition of `wandb_logger` in `loggers.py`
* ➕ Install `typing_extensions` in Python 3.8+
The `typing_extensions` package states that it should be used when
"writing code that must be compatible with multiple Python versions".
Since SpaCy needs to support multiple Python versions, it should be used
when newer `typing` module members are required. One example of this is
`Literal`, which is available starting with Python 3.8.
Previously SpaCy tried to import `Literal` from `typing`, falling back
to `typing_extensions` if the import failed. However, Mypy doesn't seem
to be able to understand what `Literal` means when the initial import
means. Therefore, these changes modify how `compat` imports `Literal` by
always importing it from `typing_extensions`.
These changes also modify how `typing_extensions` is installed, so that
it is a requirement for all Python versions, including those greater
than or equal to 3.8.
* 🏷 Improve type annotation for `Language.pipe`
These changes add a missing overload variant to the type signature of
`Language.pipe`. Additionally, the type signature is enhanced to allow
type checkers to differentiate between the two overload variants based
on the `as_tuple` parameter.
Fixes#8772
* ➖ Don't install `typing-extensions` in Python 3.8+
After more detailed analysis of how to implement Python version-specific
type annotations using SpaCy, it has been determined that by branching
on a comparison against `sys.version_info` can be statically analyzed by
Mypy well enough to enable us to conditionally use
`typing_extensions.Literal`. This means that we no longer need to
install `typing_extensions` for Python versions greater than or equal to
3.8! 🎉
These changes revert previous changes installing `typing-extensions`
regardless of Python version and modify how we import the `Literal` type
to ensure that Mypy treats it properly.
* resolve mypy errors for Strict pydantic types
* refactor code to avoid missing return statement
* fix types of convert CLI command
* avoid list-set confustion in debug_data
* fix typo and formatting
* small fixes to avoid type ignores
* fix types in profile CLI command and make it more efficient
* type fixes in projects CLI
* put one ignore back
* type fixes for render
* fix render types - the sequel
* fix BaseDefault in language definitions
* fix type of noun_chunks iterator - yields tuple instead of span
* fix types in language-specific modules
* 🏷 Expand accepted inputs of `get_string_id`
`get_string_id` accepts either a string (in which case it returns its
ID) or an ID (in which case it immediately returns the ID). These
changes extend the type annotation of `get_string_id` to indicate that
it can accept either strings or IDs.
* 🏷 Handle override types in `combine_score_weights`
The `combine_score_weights` function allows users to pass an `overrides`
mapping to override data extracted from the `weights` argument. Since it
allows `Optional` dictionary values, the return value may also include
`Optional` dictionary values.
These changes update the type annotations for `combine_score_weights` to
reflect this fact.
* 🏷 Fix tokenizer serialization method signatures in `DummyTokenizer`
* 🏷 Fix redefinition of `wandb_logger`
These changes fix the redefinition of `wandb_logger` by giving a
separate name to each `WandbLogger` version. For
backwards-compatibility, `spacy.train` still exports `wandb_logger_v3`
as `wandb_logger` for now.
* more fixes for typing in language
* type fixes in model definitions
* 🏷 Annotate `_RandomWords.probs` as `NDArray`
* 🏷 Annotate `tok2vec` layers to help Mypy
* 🐛 Fix `_RandomWords.probs` type annotations for Python 3.6
Also remove an import that I forgot to move to the top of the module 😅
* more fixes for matchers and other pipeline components
* quick fix for entity linker
* fixing types for spancat, textcat, etc
* bugfix for tok2vec
* type annotations for scorer
* add runtime_checkable for Protocol
* type and import fixes in tests
* mypy fixes for training utilities
* few fixes in util
* fix import
* 🐵 Remove unused `# type: ignore` directives
* 🏷 Annotate `Language._components`
* 🏷 Annotate `spacy.pipeline.Pipe`
* add doc as property to span.pyi
* small fixes and cleanup
* explicit type annotations instead of via comment
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
* use language-matching to allow language code aliases
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* link to "IETF language tags" in docs
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* Make requirements consistent
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* change "two-letter language ID" to "IETF language tag" in language docs
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* use langcodes 3.2 and handle language-tag errors better
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* all unknown language codes are ImportErrors
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
* factor out the WandB logger into spacy-loggers
Signed-off-by: Elia Robyn Speer <gh@arborelia.net>
* depend on spacy-loggers so they are available
Signed-off-by: Elia Robyn Speer <gh@arborelia.net>
* remove docs of spacy.WandbLogger.v2 (moved to spacy-loggers)
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* Version number suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* update references to WandbLogger
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
* make order of deps more consistent
Signed-off-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Draft spancat model
* Add spancat model
* Add test for extract_spans
* Add extract_spans layer
* Upd extract_spans
* Add spancat model
* Add test for spancat model
* Upd spancat model
* Update spancat component
* Upd spancat
* Update spancat model
* Add quick spancat test
* Import SpanCategorizer
* Fix SpanCategorizer component
* Import SpanGroup
* Fix span extraction
* Fix import
* Fix import
* Upd model
* Update spancat models
* Add scoring, update defaults
* Update and add docs
* Fix type
* Update spacy/ml/extract_spans.py
* Auto-format and fix import
* Fix comment
* Fix type
* Fix type
* Update website/docs/api/spancategorizer.md
* Fix comment
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Better defense
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix labels list
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/extract_spans.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/pipeline/spancat.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Set annotations during update
* Set annotations in spancat
* fix imports in test
* Update spacy/pipeline/spancat.py
* replace MaxoutLogistic with LinearLogistic
* fix config
* various small fixes
* remove set_annotations parameter in update
* use our beloved tupley format with recent support for doc.spans
* bugfix to allow renaming the default span_key (scores weren't showing up)
* use different key in docs example
* change defaults to better-working parameters from project (WIP)
* register spacy.extract_spans.v1 for legacy purposes
* Upd dev version so can build wheel
* layers instead of architectures for smaller building blocks
* Update website/docs/api/spancategorizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Include additional scores from overrides in combined score weights
* Parameterize spans key in scoring
Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.
* Use the (intentionally very short) default spans key `sc` in the
`SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
the key in the `getter` instead.
Note that for `spancat` components with a custom `span_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.
* Update website/docs/api/scorer.md
* Fix scorer for spans key containing underscore
* Increment version
* Add Spans to Evaluate CLI (#8439)
* Add Spans to Evaluate CLI
* Change to spans_key
* Add spans per_type output
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Fix spancat GPU issues (#8455)
* Fix GPU issues
* Require thinc >=8.0.6
* Switch to glorot_uniform_init
* Fix and test ngram suggester
* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
* implement textcat resizing for TextCatCNN
* resizing textcat in-place
* simplify code
* ensure predictions for old textcat labels remain the same after resizing (WIP)
* fix for softmax
* store softmax as attr
* fix ensemble weight copy and cleanup
* restructure slightly
* adjust documentation, update tests and quickstart templates to use latest versions
* extend unit test slightly
* revert unnecessary edits
* fix typo
* ensemble architecture won't be resizable for now
* use resizable layer (WIP)
* revert using resizable layer
* resizable container while avoid shape inference trouble
* cleanup
* ensure model continues training after resizing
* use fill_b parameter
* use fill_defaults
* resize_layer callback
* format
* bump thinc to 8.0.4
* bump spacy-legacy to 3.0.6
* Replace negative rows with 0 in StaticVectors
Replace negative row indices with 0-vectors in `StaticVectors`.
* Increase versions related to StaticVectors
* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations
Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5
* Update config defaults to new versions
* Update docs
* Set catalogue lower pin to v2.0.2
* Update importlib-metadata pins to match
* Require catalogue v2.0.3
Switch to vendored `importlib-metadata` v3.2.0 provided by `catalogue`.
* Allow output_path to be None during training
* Fix cat scoring (?)
* Improve error message for weighted None score
* Improve messages
So we can call this in other places etc.
* FIx output path check
* Use latest wasabi
* Revert "Improve error message for weighted None score"
This reverts commit 7059926763.
* Exclude None scores from final score by default
It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.
* Update warnings and use logger.warning
* Get basic beam tests working
* Get basic beam tests working
* Compile _beam_utils
* Remove prints
* Test beam density
* Beam parser seems to train
* Draft beam NER
* Upd beam
* Add hypothesis as dev dependency
* Implement missing is-gold-parse method
* Implement early update
* Fix state hashing
* Fix test
* Fix test
* Default to non-beam in parser constructor
* Improve oracle for beam
* Start refactoring beam
* Update test
* Refactor beam
* Update nn
* Refactor beam and weight by cost
* Update ner beam settings
* Update test
* Add __init__.pxd
* Upd test
* Fix test
* Upd test
* Fix test
* Remove ring buffer history from StateC
* WIP change arc-eager transitions
* Add state tests
* Support ternary sent start values
* Fix arc eager
* Fix NER
* Pass oracle cut size for beam
* Fix ner test
* Fix beam
* Improve StateC.clone
* Improve StateClass.borrow
* Work directly with StateC, not StateClass
* Remove print statements
* Fix state copy
* Improve state class
* Refactor parser oracles
* Fix arc eager oracle
* Fix arc eager oracle
* Use a vector to implement the stack
* Refactor state data structure
* Fix alignment of sent start
* Add get_aligned_sent_starts method
* Add test for ae oracle when bad sentence starts
* Fix sentence segment handling
* Avoid Reduce that inserts illegal sentence
* Update preset SBD test
* Fix test
* Remove prints
* Fix sent starts in Example
* Improve python API of StateClass
* Tweak comments and debug output of arc eager
* Upd test
* Fix state test
* Fix state test
* Replace pytokenizations with internal alignment
Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.
* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`
* Refactor trailing whitespace handling
* Remove unnecessary exception for empty docs
Allow a non-empty whitespace-only doc to be aligned with an empty doc
* Remove empty docs exceptions completely
* ensure Language passes on valid examples for initialization
* fix tagger model initialization
* check for valid get_examples across components
* assume labels were added before begin_training
* fix senter initialization
* fix morphologizer initialization
* use methods to check arguments
* test textcat init, requires thinc>=8.0.0a31
* fix tok2vec init
* fix entity linker init
* use islice
* fix simple NER
* cleanup debug model
* fix assert statements
* fix tests
* throw error when adding a label if the output layer can't be resized anymore
* fix test
* add failing test for simple_ner
* UX improvements
* morphologizer UX
* assume begin_training gets a representative set and processes the labels
* remove assumptions for output of untrained NER model
* restore test for original purpose
* Update with WIP
* Update with WIP
* Update with pipeline serialization
* Update types and pipe factories
* Add deep merge, tidy up and add tests
* Fix pipe creation from config
* Don't validate default configs on load
* Update spacy/language.py
Co-authored-by: Ines Montani <ines@ines.io>
* Adjust factory/component meta error
* Clean up factory args and remove defaults
* Add test for failing empty dict defaults
* Update pipeline handling and methods
* provide KB as registry function instead of as object
* small change in test to make functionality more clear
* update example script for EL configuration
* Fix typo
* Simplify test
* Simplify test
* splitting pipes.pyx into separate files
* moving default configs to each component file
* fix batch_size type
* removing default values from component constructors where possible (TODO: test 4725)
* skip instead of xfail
* Add test for config -> nlp with multiple instances
* pipeline.pipes -> pipeline.pipe
* Tidy up, document, remove kwargs
* small cleanup/generalization for Tok2VecListener
* use DEFAULT_UPSTREAM field
* revert to avoid circular imports
* Fix tests
* Replace deprecated arg
* Make model dirs require config
* fix pickling of keyword-only arguments in constructor
* WIP: clean up and integrate full config
* Add helper to handle function args more reliably
Now also includes keyword-only args
* Fix config composition and serialization
* Improve config debugging and add visual diff
* Remove unused defaults and fix type
* Remove pipeline and factories from meta
* Update spacy/default_config.cfg
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/default_config.cfg
* small UX edits
* avoid printing stack trace for debug CLI commands
* Add support for language-specific factories
* specify the section of the config which holds the model to debug
* WIP: add Language.from_config
* Update with language data refactor WIP
* Auto-format
* Add backwards-compat handling for Language.factories
* Update morphologizer.pyx
* Fix morphologizer
* Update and simplify lemmatizers
* Fix Japanese tests
* Port over tagger changes
* Fix Chinese and tests
* Update to latest Thinc
* WIP: xfail first Russian lemmatizer test
* Fix component-specific overrides
* fix nO for output layers in debug_model
* Fix default value
* Fix tests and don't pass objects in config
* Fix deep merging
* Fix lemma lookup data registry
Only load the lookups if an entry is available in the registry (and if spacy-lookups-data is installed)
* Add types
* Add Vocab.from_config
* Fix typo
* Fix tests
* Make config copying more elegant
* Fix pipe analysis
* Fix lemmatizers and is_base_form
* WIP: move language defaults to config
* Fix morphology type
* Fix vocab
* Remove comment
* Update to latest Thinc
* Add morph rules to config
* Tidy up
* Remove set_morphology option from tagger factory
* Hack use_gpu
* Move [pipeline] to top-level block and make [nlp.pipeline] list
Allows separating component blocks from component order – otherwise, ordering the config would mean a changed component order, which is bad. Also allows initial config to define more components and not use all of them
* Fix use_gpu and resume in CLI
* Auto-format
* Remove resume from config
* Fix formatting and error
* [pipeline] -> [components]
* Fix types
* Fix tagger test: requires set_morphology?
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* step_through tests: skip instead of xfail
* test_empty_doc should be fixed with new Thinc version
* remove outdated test (there are other misaligned tests now)
* xfail reason
* fix test according to french exceptions
* clarified some skipped tests
* skip ukranian test instead of xfail
* skip instead of xfail
* skip + reason instead of xfail
* removed obsolete tests referring to removed "set_frozen" functionality
* fix test 999
* remove unused AlignmentError
* remove xfail where possible, skip otherwise
* increment thinc release for empty_doc test
* fix grad_clip naming
* cleaning up pretrained_vectors out of cfg
* further refactoring Model init's
* move Model building out of pipes
* further refactor to require a model config when creating a pipe
* small fixes
* making cfg in nn_parser more consistent
* fixing nr_class for parser
* fixing nn_parser's nO
* fix printing of loss
* architectures in own file per type, consistent naming
* convenience methods default_tagger_config and default_tok2vec_config
* let create_pipe access default config if available for that component
* default_parser_config
* move defaults to separate folder
* allow reading nlp from package or dir with argument 'name'
* architecture spacy.VocabVectors.v1 to read static vectors from file
* cleanup
* default configs for nel, textcat, morphologizer, tensorizer
* fix imports
* fixing unit tests
* fixes and clean up
* fixing defaults, nO, fix unit tests
* restore parser IO
* fix IO
* 'fix' serialization test
* add *.cfg to manifest
* fix example configs with additional arguments
* replace Morpohologizer with Tagger
* add IO bit when testing overfitting of tagger (currently failing)
* fix IO - don't initialize when reading from disk
* expand overfitting tests to also check IO goes OK
* remove dropout from HashEmbed to fix Tagger performance
* add defaults for sentrec
* update thinc
* always pass a Model instance to a Pipe
* fix piped_added statement
* remove obsolete W029
* remove obsolete errors
* restore byte checking tests (work again)
* clean up test
* further test cleanup
* convert from config to Model in create_pipe
* bring back error when component is not initialized
* cleanup
* remove calls for nlp2.begin_training
* use thinc.api in imports
* allow setting charembed's nM and nC
* fix for hardcoded nM/nC + unit test
* formatting fixes
* trigger build
* Add load_from_config function
* Add train_from_config script
* Merge configs and expose via spacy.config
* Fix script
* Suggest create_evaluation_callback
* Hard-code for NER
* Fix errors
* Register command
* Add TODO
* Update train-from-config todos
* Fix imports
* Allow delayed setting of parser model nr_class
* Get train-from-config working
* Tidy up and fix scores and printing
* Hide traceback if cancelled
* Fix weighted score formatting
* Fix score formatting
* Make output_path optional
* Add Tok2Vec component
* Tidy up and add tok2vec_tensors
* Add option to copy docs in nlp.update
* Copy docs in nlp.update
* Adjust nlp.update() for set_annotations
* Don't shuffle pipes in nlp.update, decruft
* Support set_annotations arg in component update
* Support set_annotations in parser update
* Add get_gradients method
* Add get_gradients to parser
* Update errors.py
* Fix problems caused by merge
* Add _link_components method in nlp
* Add concept of 'listeners' and ControlledModel
* Support optional attributes arg in ControlledModel
* Try having tok2vec component in pipeline
* Fix tok2vec component
* Fix config
* Fix tok2vec
* Update for Example
* Update for Example
* Update config
* Add eg2doc util
* Update and add schemas/types
* Update schemas
* Fix nlp.update
* Fix tagger
* Remove hacks from train-from-config
* Remove hard-coded config str
* Calculate loss in tok2vec component
* Tidy up and use function signatures instead of models
* Support union types for registry models
* Minor cleaning in Language.update
* Make ControlledModel specifically Tok2VecListener
* Fix train_from_config
* Fix tok2vec
* Tidy up
* Add function for bilstm tok2vec
* Fix type
* Fix syntax
* Fix pytorch optimizer
* Add example configs
* Update for thinc describe changes
* Update for Thinc changes
* Update for dropout/sgd changes
* Update for dropout/sgd changes
* Unhack gradient update
* Work on refactoring _ml
* Remove _ml.py module
* WIP upgrade cli scripts for thinc
* Move some _ml stuff to util
* Import link_vectors from util
* Update train_from_config
* Import from util
* Import from util
* Temporarily add ml.component_models module
* Move ml methods
* Move typedefs
* Update load vectors
* Update gitignore
* Move imports
* Add PrecomputableAffine
* Fix imports
* Fix imports
* Fix imports
* Fix missing imports
* Update CLI scripts
* Update spacy.language
* Add stubs for building the models
* Update model definition
* Update create_default_optimizer
* Fix import
* Fix comment
* Update imports in tests
* Update imports in spacy.cli
* Fix import
* fix obsolete thinc imports
* update srsly pin
* from thinc to ml_datasets for example data such as imdb
* update ml_datasets pin
* using STATE.vectors
* small fix
* fix Sentencizer.pipe
* black formatting
* rename Affine to Linear as in thinc
* set validate explicitely to True
* rename with_square_sequences to with_list2padded
* rename with_flatten to with_list2array
* chaining layernorm
* small fixes
* revert Optimizer import
* build_nel_encoder with new thinc style
* fixes using model's get and set methods
* Tok2Vec in component models, various fixes
* fix up legacy tok2vec code
* add model initialize calls
* add in build_tagger_model
* small fixes
* setting model dims
* fixes for ParserModel
* various small fixes
* initialize thinc Models
* fixes
* consistent naming of window_size
* fixes, removing set_dropout
* work around Iterable issue
* remove legacy tok2vec
* util fix
* fix forward function of tok2vec listener
* more fixes
* trying to fix PrecomputableAffine (not succesful yet)
* alloc instead of allocate
* add morphologizer
* rename residual
* rename fixes
* Fix predict function
* Update parser and parser model
* fixing few more tests
* Fix precomputable affine
* Update component model
* Update parser model
* Move backprop padding to own function, for test
* Update test
* Fix p. affine
* Update NEL
* build_bow_text_classifier and extract_ngrams
* Fix parser init
* Fix test add label
* add build_simple_cnn_text_classifier
* Fix parser init
* Set gpu off by default in example
* Fix tok2vec listener
* Fix parser model
* Small fixes
* small fix for PyTorchLSTM parameters
* revert my_compounding hack (iterable fixed now)
* fix biLSTM
* Fix uniqued
* PyTorchRNNWrapper fix
* small fixes
* use helper function to calculate cosine loss
* small fixes for build_simple_cnn_text_classifier
* putting dropout default at 0.0 to ensure the layer gets built
* using thinc util's set_dropout_rate
* moving layer normalization inside of maxout definition to optimize dropout
* temp debugging in NEL
* fixed NEL model by using init defaults !
* fixing after set_dropout_rate refactor
* proper fix
* fix test_update_doc after refactoring optimizers in thinc
* Add CharacterEmbed layer
* Construct tagger Model
* Add missing import
* Remove unused stuff
* Work on textcat
* fix test (again :)) after optimizer refactor
* fixes to allow reading Tagger from_disk without overwriting dimensions
* don't build the tok2vec prematuraly
* fix CharachterEmbed init
* CharacterEmbed fixes
* Fix CharacterEmbed architecture
* fix imports
* renames from latest thinc update
* one more rename
* add initialize calls where appropriate
* fix parser initialization
* Update Thinc version
* Fix errors, auto-format and tidy up imports
* Fix validation
* fix if bias is cupy array
* revert for now
* ensure it's a numpy array before running bp in ParserStepModel
* no reason to call require_gpu twice
* use CupyOps.to_numpy instead of cupy directly
* fix initialize of ParserModel
* remove unnecessary import
* fixes for CosineDistance
* fix device renaming
* use refactored loss functions (Thinc PR 251)
* overfitting test for tagger
* experimental settings for the tagger: avoid zero-init and subword normalization
* clean up tagger overfitting test
* use previous default value for nP
* remove toy config
* bringing layernorm back (had a bug - fixed in thinc)
* revert setting nP explicitly
* remove setting default in constructor
* restore values as they used to be
* add overfitting test for NER
* add overfitting test for dep parser
* add overfitting test for textcat
* fixing init for linear (previously affine)
* larger eps window for textcat
* ensure doc is not None
* Require newer thinc
* Make float check vaguer
* Slop the textcat overfit test more
* Fix textcat test
* Fix exclusive classes for textcat
* fix after renaming of alloc methods
* fixing renames and mandatory arguments (staticvectors WIP)
* upgrade to thinc==8.0.0.dev3
* refer to vocab.vectors directly instead of its name
* rename alpha to learn_rate
* adding hashembed and staticvectors dropout
* upgrade to thinc 8.0.0.dev4
* add name back to avoid warning W020
* thinc dev4
* update srsly
* using thinc 8.0.0a0 !
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>