Commit Graph

8 Commits

Author SHA1 Message Date
Adriane Boyd
f99d6d5e39
Refactor scoring methods to use registered functions (#8766)
* Add scorer option to components

Add an optional `scorer` parameter to all pipeline components. If a
scoring function is provided, it overrides the default scoring method
for that component.

* Add registered scorers for all components

* Add `scorers` registry
* Move all scoring methods outside of components as independent
  functions and register
* Use the registered scoring methods as defaults in configs and inits

Additional:

* The scoring methods no longer have access to the full component, so
  use settings from `cfg` as default scorer options to handle settings
  such as `labels`, `threshold`, and `positive_label`
* The `attribute_ruler` scoring method no longer has access to the
  patterns, so all scoring methods are called
* Bug fix: `spancat` scoring method is updated to set `allow_overlap` to
  score overlapping spans correctly

* Update Russian lemmatizer to use direct score method

* Check type of cfg in Pipe.score

* Fix check

* Update spacy/pipeline/sentencizer.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remove validate_examples from scoring functions

* Use Pipe.labels instead of Pipe.cfg["labels"]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:39 +02:00
Sofie Van Landeghem
e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoid shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
Adriane Boyd
d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Sofie Van Landeghem
932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Ines Montani
d0c3775712 Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
Ines Montani
b0b743597c Tidy up and auto-format 2021-01-15 11:57:36 +11:00
Sofie Van Landeghem
afc5714d32
multi-label textcat component (#6474)
* multi-label textcat component

* formatting

* fix comment

* cleanup

* fix from #6481

* random edit to push the tests

* add explicit error when textcat is called with multi-label gold data

* fix error nr

* small fix
2021-01-06 13:07:14 +11:00