* Fix scoring normalization (#7629)
* fix scoring normalization
* score weights by total sum instead of per component
* cleanup
* more cleanup
* Use a context manager when reading model (fix#7036) (#8244)
* Fix other open calls without context managers (#8245)
* Don't add duplicate patterns all the time in EntityRuler (fix#8216) (#8246)
* Don't add duplicate patterns (fix#8216)
* Refactor EntityRuler init
This simplifies the EntityRuler init code. This is helpful as prep for
allowing the EntityRuler to reset itself.
* Make EntityRuler.clear reset matchers
Includes a new test for this.
* Tidy PhraseMatcher instantiation
Since the attr can be None safely now, the guard if is no longer
required here.
Also renamed the `_validate` attr. Maybe it's not needed?
* Fix NER test
* Add test to make sure patterns aren't increasing
* Move test to regression tests
* Exclude generated .cpp files from package (#8271)
* Fix non-deterministic deduplication in Greek lemmatizer (#8421)
* Fix setting empty entities in Example.from_dict (#8426)
* Filter W036 for entity ruler, etc. (#8424)
* Preserve paths.vectors/initialize.vectors setting in quickstart template
* Various fixes for spans in Docs.from_docs (#8487)
* Fix spans offsets if a doc ends in a single space and no space is
inserted
* Also include spans key in merged doc for empty spans lists
* Fix duplicate spacy package CLI opts (#8551)
Use `-c` for `--code` and not additionally for `--create-meta`, in line
with the docs.
* Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels
Raise an error if initializing a `textcat` component without at least
two labels.
* Add similar note to docs
* Update positive_label description in API docs
* Add Macedonian models to website (#8637)
* Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests
* Fix az init
* Fix ru/uk lemmatizer mp with spawn (#8657)
Use an instance variable instead a class variable for the morphological
analzyer so that multiprocessing with spawn is possible.
* Use 0-vector for OOV lexemes (#8639)
* Set version to v3.0.7
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* add capture argument to project_run and run_commands
* git bump to 3.0.1
* Set version to 3.0.1.dev0
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
* rename Pipe to TrainablePipe
* split functionality between Pipe and TrainablePipe
* remove unnecessary methods from certain components
* cleanup
* hasattr(component, "pipe") should be sufficient again
* remove serialization and vocab/cfg from Pipe
* unify _ensure_examples and validate_examples
* small fixes
* hasattr checks for self.cfg and self.vocab
* make is_resizable and is_trainable properties
* serialize strings.json instead of vocab
* fix KB IO + tests
* fix typos
* more typos
* _added_strings as a set
* few more tests specifically for _added_strings field
* bump to 3.0.0a36