Commit Graph

33 Commits

Author SHA1 Message Date
Daniël de Kok
efdbb722c5
Store activations in Docs when save_activations is enabled (#11002)
* Store activations in Doc when `store_activations` is enabled

This change adds the new `activations` attribute to `Doc`. This
attribute can be used by trainable pipes to store their activations,
probabilities, and guesses for downstream users.

As an example, this change modifies the `tagger` and `senter` pipes to
add an `store_activations` option. When this option is enabled, the
probabilities and guesses are stored in `set_annotations`.

* Change type of `store_activations` to `Union[bool, List[str]]`

When the value is:

- A bool: all activations are stored when set to `True`.
- A List[str]: the activations named in the list are stored

* Formatting fixes in Tagger

* Support store_activations in spancat and morphologizer

* Make Doc.activations type visible to MyPy

* textcat/textcat_multilabel: add store_activations option

* trainable_lemmatizer/entity_linker: add store_activations option

* parser/ner: do not currently support returning activations

* Extend tagger and senter tests

So that they, like the other tests, also check that we get no
activations if no activations were requested.

* Document `Doc.activations` and `store_activations` in the relevant pipes

* Start errors/warnings at higher numbers to avoid merge conflicts

Between the master and v4 branches.

* Add `store_activations` to docstrings.

* Replace store_activations setter by set_store_activations method

Setters that take a different type than what the getter returns are still
problematic for MyPy. Replace the setter by a method, so that type inference
works everywhere.

* Use dict comprehension suggested by @svlandeg

* Revert "Use dict comprehension suggested by @svlandeg"

This reverts commit 6e7b958f70.

* EntityLinker: add type annotations to _add_activations

* _store_activations: make kwarg-only, remove doc_scores_lens arg

* set_annotations: add type annotations

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* TextCat.predict: return dict

* Make the `TrainablePipe.store_activations` property a bool

This means that we can also bring back `store_activations` setter.

* Remove `TrainablePipe.activations`

We do not need to enumerate the activations anymore since `store_activations` is
`bool`.

* Add type annotations for activations in predict/set_annotations

* Rename `TrainablePipe.store_activations` to `save_activations`

* Error E1400 is not used anymore

This error was used when activations were still `Union[bool, List[str]]`.

* Change wording in API docs after store -> save change

* docs: tag (save_)activations as new in spaCy 4.0

* Fix copied line in morphologizer activations test

* Don't train in any test_save_activations test

* Rename activations

- "probs" -> "probabilities"
- "guesses" -> "label_ids", except in the edit tree lemmatizer, where
  "guesses" -> "tree_ids".

* Remove unused W400 warning.

This warning was used when we still allowed the user to specify
which activations to save.

* Formatting fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Replace "kb_ids" by a constant

* spancat: replace a cast by an assertion

* Fix EOF spacing

* Fix comments in test_save_activations tests

* Do not set RNG seed in activation saving tests

* Revert "spancat: replace a cast by an assertion"

This reverts commit 0bd5730d16.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-13 09:51:12 +02:00
Adriane Boyd
c44d243f25 Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master 2022-08-24 07:15:41 +02:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Madeesh Kannan
ba18d2913d
Morphology/Morphologizer optimizations and refactoring (#11024)
* `Morphology`: Refactor to use C types, reduce allocations, remove unused code

* `Morphologzier`: Avoid unnecessary sorting of morpho features

* `Morphologizer`: Remove execessive reallocations of labels, improve hash lookups of labels, coerce `numpy` numeric types to native ints
Update docs

* Remove unused method

* Replace `unique_ptr` usage with `shared_ptr`

* Add type annotations to internal Python methods, rename `hash` variable, fix typos

* Add comment to clarify implementation detail

* Fix return type

* `Morphology`: Stop early when splitting fields and values
2022-07-15 11:14:08 +02:00
Adriane Boyd
ae1b3e960b
Update overwrite and scorer in API docs (#9384)
* Update overwrite and scorer in API docs

* Rephrase morphologizer extend + example
2021-10-11 10:35:07 +02:00
Adriane Boyd
03f234b739 Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
Paul O'Leary McCann
ba6a37d358
Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:09:39 +02:00
Adriane Boyd
b278f31ee6
Document scorers in registry and components from #8766 (#8929)
* Document scorers in registry and components from #8766

* Update spacy/pipeline/lemmatizer.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/dependencyparser.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-12 12:50:03 +02:00
Adriane Boyd
4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Matthew Honnibal
f049df1715
Revert "Set annotations in update" (#6810)
* Revert "Set annotations in update (#6767)"

This reverts commit e680efc7cc.

* Fix version

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/multilabel_textcategorizer.md

* Update website/docs/api/tok2vec.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/api/transformer.md

* Update website/docs/api/textcategorizer.md

* Update website/docs/api/tagger.md

* Update spacy/pipeline/entity_linker.py

* Update website/docs/api/sentencerecognizer.md

* Update website/docs/api/pipe.md

* Update website/docs/api/morphologizer.md

* Update website/docs/api/entityrecognizer.md

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/multitask.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/entitylinker.md

* Update website/docs/api/dependencyparser.md

* Update spacy/pipeline/trainable_pipe.pyx
2021-01-25 22:18:45 +08:00
Sofie Van Landeghem
e680efc7cc
Set annotations in update (#6767)
* bump to 3.0.0rc4

* do set_annotations in component update calls

* update docs and remove set_annotations flag

* fix EL test
2021-01-20 11:49:25 +11:00
svlandeg
73fc1ed963 remove labels from morphologizer constructor 2020-11-11 21:48:50 +01:00
Ines Montani
35d695a031 Update docs 2020-10-03 16:08:24 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani
d7469283c5 Update docs [ci skip] 2020-09-29 16:59:21 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
svlandeg
c4f324d5f1 doc fixes 2020-09-12 17:38:54 +02:00
Ines Montani
8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
svlandeg
9073d99fc9 fix link to shape inference section 2020-09-10 10:22:59 +02:00
Ines Montani
2e567a47c2 Update docs and formatting 2020-09-09 21:26:10 +02:00
svlandeg
c89e07927e document individual component API pages 2020-09-09 16:18:38 +02:00
Ines Montani
3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani
b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani
64f2f84098 Update docstrings and docs [ci skip] 2020-08-10 13:45:22 +02:00
Ines Montani
d5c78c7a34 Update docs and fix consistency 2020-08-09 22:31:52 +02:00
Ines Montani
e9e8fa2466 Update docs and types 2020-07-31 17:02:54 +02:00
Ines Montani
b0f57a0cac Update docs and consistency 2020-07-29 15:14:07 +02:00
Ines Montani
e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Ines Montani
ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani
d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
Ines Montani
7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Adriane Boyd
986f7e4d69 Initial draft of Morphologizer API docs 2020-07-20 12:53:02 +02:00
Ines Montani
9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00