spaCy/spacy/tokens/doc.pxd

72 lines
1.6 KiB
Cython
Raw Normal View History

from cymem.cymem cimport Pool
cimport numpy as np
from ..vocab cimport Vocab
from ..structs cimport TokenC, LexemeC, SpanC
from ..typedefs cimport attr_t
from ..attrs cimport attr_id_t
cdef attr_t get_token_attr(const TokenC* token, attr_id_t feat_name) nogil
cdef attr_t get_token_attr_for_matcher(const TokenC* token, attr_id_t feat_name) nogil
ctypedef const LexemeC* const_Lexeme_ptr
ctypedef const TokenC* const_TokenC_ptr
ctypedef fused LexemeOrToken:
const_Lexeme_ptr
const_TokenC_ptr
cdef int set_children_from_heads(TokenC* tokens, int start, int end) except -1
cdef int _set_lr_kids_and_edges(TokenC* tokens, int start, int end, int loop_count) except -1
cdef int token_by_start(const TokenC* tokens, int length, int start_char) except -2
cdef int token_by_end(const TokenC* tokens, int length, int end_char) except -2
cdef int [:,:] _get_lca_matrix(Doc, int start, int end)
cdef class Doc:
cdef readonly Pool mem
cdef readonly Vocab vocab
cdef public object _vector
cdef public object _vector_norm
2017-05-07 19:04:24 +03:00
cdef public object tensor
2017-07-22 01:34:15 +03:00
cdef public object cats
2016-10-17 12:43:22 +03:00
cdef public object user_data
cdef readonly object spans
2015-11-03 16:15:14 +03:00
cdef TokenC* c
Store activations in `Doc`s when `save_activations` is enabled (#11002) * Store activations in Doc when `store_activations` is enabled This change adds the new `activations` attribute to `Doc`. This attribute can be used by trainable pipes to store their activations, probabilities, and guesses for downstream users. As an example, this change modifies the `tagger` and `senter` pipes to add an `store_activations` option. When this option is enabled, the probabilities and guesses are stored in `set_annotations`. * Change type of `store_activations` to `Union[bool, List[str]]` When the value is: - A bool: all activations are stored when set to `True`. - A List[str]: the activations named in the list are stored * Formatting fixes in Tagger * Support store_activations in spancat and morphologizer * Make Doc.activations type visible to MyPy * textcat/textcat_multilabel: add store_activations option * trainable_lemmatizer/entity_linker: add store_activations option * parser/ner: do not currently support returning activations * Extend tagger and senter tests So that they, like the other tests, also check that we get no activations if no activations were requested. * Document `Doc.activations` and `store_activations` in the relevant pipes * Start errors/warnings at higher numbers to avoid merge conflicts Between the master and v4 branches. * Add `store_activations` to docstrings. * Replace store_activations setter by set_store_activations method Setters that take a different type than what the getter returns are still problematic for MyPy. Replace the setter by a method, so that type inference works everywhere. * Use dict comprehension suggested by @svlandeg * Revert "Use dict comprehension suggested by @svlandeg" This reverts commit 6e7b958f7060397965176c69649e5414f1f24988. * EntityLinker: add type annotations to _add_activations * _store_activations: make kwarg-only, remove doc_scores_lens arg * set_annotations: add type annotations * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * TextCat.predict: return dict * Make the `TrainablePipe.store_activations` property a bool This means that we can also bring back `store_activations` setter. * Remove `TrainablePipe.activations` We do not need to enumerate the activations anymore since `store_activations` is `bool`. * Add type annotations for activations in predict/set_annotations * Rename `TrainablePipe.store_activations` to `save_activations` * Error E1400 is not used anymore This error was used when activations were still `Union[bool, List[str]]`. * Change wording in API docs after store -> save change * docs: tag (save_)activations as new in spaCy 4.0 * Fix copied line in morphologizer activations test * Don't train in any test_save_activations test * Rename activations - "probs" -> "probabilities" - "guesses" -> "label_ids", except in the edit tree lemmatizer, where "guesses" -> "tree_ids". * Remove unused W400 warning. This warning was used when we still allowed the user to specify which activations to save. * Formatting fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Replace "kb_ids" by a constant * spancat: replace a cast by an assertion * Fix EOF spacing * Fix comments in test_save_activations tests * Do not set RNG seed in activation saving tests * Revert "spancat: replace a cast by an assertion" This reverts commit 0bd5730d16432443a2b247316928d4f789ad8741. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-13 10:51:12 +03:00
cdef public dict activations
cdef public dict user_hooks
cdef public dict user_token_hooks
cdef public dict user_span_hooks
cdef public bint has_unknown_spaces
cdef public object _context
cdef int length
cdef int max_length
cdef public object noun_chunks_iterator
2017-10-16 20:22:11 +03:00
cdef object __weakref__
2017-09-26 15:28:50 +03:00
cdef int push_back(self, LexemeOrToken lex_or_tok, bint has_space) except -1
cpdef np.ndarray to_array(self, object features)