spaCy/spacy/tokens/doc.pxd

71 lines
1.6 KiB
Cython
Raw Permalink Normal View History

cimport numpy as np
2023-06-26 12:41:03 +03:00
from cymem.cymem cimport Pool
from ..attrs cimport attr_id_t
2023-06-26 12:41:03 +03:00
from ..structs cimport LexemeC, SpanC, TokenC
from ..typedefs cimport attr_t
from ..vocab cimport Vocab
cdef attr_t get_token_attr(const TokenC* token, attr_id_t feat_name) nogil
cdef attr_t get_token_attr_for_matcher(const TokenC* token, attr_id_t feat_name) nogil
ctypedef const LexemeC* const_Lexeme_ptr
ctypedef const TokenC* const_TokenC_ptr
ctypedef fused LexemeOrToken:
const_Lexeme_ptr
const_TokenC_ptr
cdef int set_children_from_heads(TokenC* tokens, int start, int end) except -1
cdef int _set_lr_kids_and_edges(TokenC* tokens, int start, int end, int loop_count) except -1
cdef int token_by_start(const TokenC* tokens, int length, int start_char) except -2
cdef int token_by_end(const TokenC* tokens, int length, int end_char) except -2
cdef int [:, :] _get_lca_matrix(Doc, int start, int end)
cdef class Doc:
cdef readonly Pool mem
cdef readonly Vocab vocab
cdef public object _vector
cdef public object _vector_norm
2017-05-07 19:04:24 +03:00
cdef public object tensor
2017-07-22 01:34:15 +03:00
cdef public object cats
2016-10-17 12:43:22 +03:00
cdef public object user_data
cdef readonly object spans
2015-11-03 16:15:14 +03:00
cdef TokenC* c
Store activations in `Doc`s when `save_activations` is enabled (#11002) * Store activations in Doc when `store_activations` is enabled This change adds the new `activations` attribute to `Doc`. This attribute can be used by trainable pipes to store their activations, probabilities, and guesses for downstream users. As an example, this change modifies the `tagger` and `senter` pipes to add an `store_activations` option. When this option is enabled, the probabilities and guesses are stored in `set_annotations`. * Change type of `store_activations` to `Union[bool, List[str]]` When the value is: - A bool: all activations are stored when set to `True`. - A List[str]: the activations named in the list are stored * Formatting fixes in Tagger * Support store_activations in spancat and morphologizer * Make Doc.activations type visible to MyPy * textcat/textcat_multilabel: add store_activations option * trainable_lemmatizer/entity_linker: add store_activations option * parser/ner: do not currently support returning activations * Extend tagger and senter tests So that they, like the other tests, also check that we get no activations if no activations were requested. * Document `Doc.activations` and `store_activations` in the relevant pipes * Start errors/warnings at higher numbers to avoid merge conflicts Between the master and v4 branches. * Add `store_activations` to docstrings. * Replace store_activations setter by set_store_activations method Setters that take a different type than what the getter returns are still problematic for MyPy. Replace the setter by a method, so that type inference works everywhere. * Use dict comprehension suggested by @svlandeg * Revert "Use dict comprehension suggested by @svlandeg" This reverts commit 6e7b958f7060397965176c69649e5414f1f24988. * EntityLinker: add type annotations to _add_activations * _store_activations: make kwarg-only, remove doc_scores_lens arg * set_annotations: add type annotations * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * TextCat.predict: return dict * Make the `TrainablePipe.store_activations` property a bool This means that we can also bring back `store_activations` setter. * Remove `TrainablePipe.activations` We do not need to enumerate the activations anymore since `store_activations` is `bool`. * Add type annotations for activations in predict/set_annotations * Rename `TrainablePipe.store_activations` to `save_activations` * Error E1400 is not used anymore This error was used when activations were still `Union[bool, List[str]]`. * Change wording in API docs after store -> save change * docs: tag (save_)activations as new in spaCy 4.0 * Fix copied line in morphologizer activations test * Don't train in any test_save_activations test * Rename activations - "probs" -> "probabilities" - "guesses" -> "label_ids", except in the edit tree lemmatizer, where "guesses" -> "tree_ids". * Remove unused W400 warning. This warning was used when we still allowed the user to specify which activations to save. * Formatting fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Replace "kb_ids" by a constant * spancat: replace a cast by an assertion * Fix EOF spacing * Fix comments in test_save_activations tests * Do not set RNG seed in activation saving tests * Revert "spancat: replace a cast by an assertion" This reverts commit 0bd5730d16432443a2b247316928d4f789ad8741. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-13 10:51:12 +03:00
cdef public dict activations
cdef public dict user_hooks
cdef public dict user_token_hooks
cdef public dict user_span_hooks
cdef public bint has_unknown_spaces
cdef public object _context
cdef int length
cdef int max_length
cdef public object noun_chunks_iterator
2017-10-16 20:22:11 +03:00
cdef object __weakref__
2017-09-26 15:28:50 +03:00
cdef int push_back(self, LexemeOrToken lex_or_tok, bint has_space) except -1
cpdef np.ndarray to_array(self, object features)