spaCy/spacy/tokens/doc.pxd
adrianeboyd b71a11ff6d
Update morphologizer (#5108)
* Add pos and morph scoring to Scorer

Add pos, morph, and morph_per_type to `Scorer`. Report pos and morph
accuracy in `spacy evaluate`.

* Update morphologizer for v3

* switch to tagger-based morphologizer
* use `spacy.HashCharEmbedCNN` for morphologizer defaults
* add `Doc.is_morphed` flag

* Add morphologizer to train CLI

* Add basic morphologizer pipeline tests

* Add simple morphologizer training example

* Remove subword_features from CharEmbed models

Remove `subword_features` argument from `spacy.HashCharEmbedCNN.v1` and
`spacy.HashCharEmbedBiLSTM.v1` since in these cases `subword_features`
is always `False`.

* Rename setting in morphologizer example

Use `with_pos_tags` instead of `without_pos_tags`.

* Fix kwargs for spacy.HashCharEmbedBiLSTM.v1

* Remove defaults for spacy.HashCharEmbedBiLSTM.v1

Remove default `nM/nC` for `spacy.HashCharEmbedBiLSTM.v1`.

* Set random seed for textcat overfitting test
2020-04-02 14:46:32 +02:00

75 lines
1.7 KiB
Cython

from cymem.cymem cimport Pool
cimport numpy as np
from ..vocab cimport Vocab
from ..structs cimport TokenC, LexemeC
from ..typedefs cimport attr_t
from ..attrs cimport attr_id_t
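
# Return the value of the attribute identified by `feat_name` for a token (callable without the GIL).
cdef attr_t get_token_attr(const TokenC* token, attr_id_t feat_name) nogil
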
ctypedef const LexemeC* const_Lexeme_ptr
ctypedef const TokenC* const_TokenC_ptr

# Fused type: lets `Doc.push_back` accept either a lexeme pointer or a token pointer.
ctypedef fused LexemeOrToken:
    const_Lexeme_ptr
    const_TokenC_ptr

# Recompute each token's children and left/right edges from its head offset.
cdef int set_children_from_heads(TokenC* tokens, int length) except -1

cdef int _set_lr_kids_and_edges(TokenC* tokens, int length, int loop_count) except -1

# Index of the token starting/ending at a character offset, or -1 if none (-2 signals an exception).
cdef int token_by_start(const TokenC* tokens, int length, int start_char) except -2
cdef int token_by_end(const TokenC* tokens, int length, int end_char) except -2

# Lowest-common-ancestor matrix for token pairs in [start, end); -1 where no common ancestor exists.
cdef int [:,:] _get_lca_matrix(Doc, int start, int end)

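cdef class Doc:
    # Memory pool that owns the token array.
    cdef readonly Pool mem
    cdef readonly Vocab vocab

    cdef public object _vector
    cdef public object _vector_norm

    cdef public object tensor
    cdef public object cats
    cdef public object user_data

    # C array of tokens.
    cdef TokenC* c

    # Flags recording which annotation layers have been set.
    cdef public bint is_tagged
    cdef public bint is_parsed
    cdef public bint is_morphed

    cdef public float sentiment

    # User-supplied hooks overriding Doc, Token and Span methods.
    cdef public dict user_hooks
    cdef public dict user_token_hooks
    cdef public dict user_span_hooks

    cdef public list _py_tokens

    # Number of tokens in use and allocated capacity of `c`.
    cdef int length
    cdef int max_length

    cdef public object noun_chunks_iterator

    cdef object __weakref__

    # Append a lexeme or token to the end of the document.
    cdef int push_back(self, LexemeOrToken lex_or_tok, bint has_space) except -1

    # Export the requested token attributes as a numpy array.
    cpdef np.ndarray to_array(self, object features)

    # Copy parser output (an array of TokenC) onto the document's tokens.
    cdef void set_parse(self, const TokenC* parsed) nogil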