--- title: Cython Structs teaser: C-language objects that let you group variables together next: /api/cython-classes menu: - ['TokenC', 'tokenc'] - ['LexemeC', 'lexemec'] --- ## TokenC {#tokenc tag="C struct" source="spacy/structs.pxd"} Cython data container for the `Token` object. > #### Example > > ```python > token = &doc.c[3] > token_ptr = &doc.c[3] > ``` | Name | Description | | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `lex` | A pointer to the lexeme for the token. ~~const LexemeC\*~~ | | `morph` | An ID allowing lookup of morphological attributes. ~~uint64_t~~ | | `pos` | Coarse-grained part-of-speech tag. ~~univ_pos_t~~ | | `spacy` | A binary value indicating whether the token has trailing whitespace. ~~bint~~ | | `tag` | Fine-grained part-of-speech tag. ~~attr_t (uint64_t)~~ | | `idx` | The character offset of the token within the parent document. ~~int~~ | | `lemma` | Base form of the token, with no inflectional suffixes. ~~attr_t (uint64_t)~~ | | `sense` | Space for storing a word sense ID, currently unused. ~~attr_t (uint64_t)~~ | | `head` | Offset of the syntactic parent relative to the token. ~~int~~ | | `dep` | Syntactic dependency relation. ~~attr_t (uint64_t)~~ | | `l_kids` | Number of left children. ~~uint32_t~~ | | `r_kids` | Number of right children. ~~uint32_t~~ | | `l_edge` | Offset of the leftmost token of this token's syntactic descendants. ~~uint32_t~~ | | `r_edge` | Offset of the rightmost token of this token's syntactic descendants. ~~uint32_t~~ | | `sent_start` | Ternary value indicating whether the token is the first word of a sentence. `0` indicates a missing value, `-1` indicates `False` and `1` indicates `True`. The default value, 0, is interpreted as no sentence break. Sentence boundary detectors will usually set 0 for all tokens except tokens that follow a sentence boundary. ~~int~~ | | `ent_iob` | IOB code of named entity tag. `0` indicates a missing value, `1` indicates `I`, `2` indicates `0` and `3` indicates `B`. ~~int~~ | | `ent_type` | Named entity type. ~~attr_t (uint64_t)~~ | | `ent_id` | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. ~~attr_t (uint64_t)~~ | ### Token.get_struct_attr {#token_get_struct_attr tag="staticmethod, nogil" source="spacy/tokens/token.pxd"} Get the value of an attribute from the `TokenC` struct by attribute ID. > #### Example > > ```python > from spacy.attrs cimport IS_ALPHA > from spacy.tokens cimport Token > > is_alpha = Token.get_struct_attr(&doc.c[3], IS_ALPHA) > ``` | Name | Description | | ----------- | ---------------------------------------------------------------------------------------------------- | | `token` | A pointer to a `TokenC` struct. ~~const TokenC\*~~ | | `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | **RETURNS** | The value of the attribute. ~~attr_t (uint64_t)~~ | ### Token.set_struct_attr {#token_set_struct_attr tag="staticmethod, nogil" source="spacy/tokens/token.pxd"} Set the value of an attribute of the `TokenC` struct by attribute ID. > #### Example > > ```python > from spacy.attrs cimport TAG > from spacy.tokens cimport Token > > token = &doc.c[3] > Token.set_struct_attr(token, TAG, 0) > ``` | Name | Description | | ----------- | ---------------------------------------------------------------------------------------------------- | | `token` | A pointer to a `TokenC` struct. ~~const TokenC\*~~ | | `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | `value` | The value to set. ~~attr_t (uint64_t)~~ | ### token_by_start {#token_by_start tag="function" source="spacy/tokens/doc.pxd"} Find a token in a `TokenC*` array by the offset of its first character. > #### Example > > ```python > from spacy.tokens.doc cimport Doc, token_by_start > from spacy.vocab cimport Vocab > > doc = Doc(Vocab(), words=["hello", "world"]) > assert token_by_start(doc.c, doc.length, 6) == 1 > assert token_by_start(doc.c, doc.length, 4) == -1 > ``` | Name | Description | | ------------ | ----------------------------------------------------------------- | | `tokens` | A `TokenC*` array. ~~const TokenC\*~~ | | `length` | The number of tokens in the array. ~~int~~ | | `start_char` | The start index to search for. ~~int~~ | | **RETURNS** | The index of the token in the array or `-1` if not found. ~~int~~ | ### token_by_end {#token_by_end tag="function" source="spacy/tokens/doc.pxd"} Find a token in a `TokenC*` array by the offset of its final character. > #### Example > > ```python > from spacy.tokens.doc cimport Doc, token_by_end > from spacy.vocab cimport Vocab > > doc = Doc(Vocab(), words=["hello", "world"]) > assert token_by_end(doc.c, doc.length, 5) == 0 > assert token_by_end(doc.c, doc.length, 1) == -1 > ``` | Name | Description | | ----------- | ----------------------------------------------------------------- | | `tokens` | A `TokenC*` array. ~~const TokenC\*~~ | | `length` | The number of tokens in the array. ~~int~~ | | `end_char` | The end index to search for. ~~int~~ | | **RETURNS** | The index of the token in the array or `-1` if not found. ~~int~~ | ### set_children_from_heads {#set_children_from_heads tag="function" source="spacy/tokens/doc.pxd"} Set attributes that allow lookup of syntactic children on a `TokenC*` array. This function must be called after making changes to the `TokenC.head` attribute, in order to make the parse tree navigation consistent. > #### Example > > ```python > from spacy.tokens.doc cimport Doc, set_children_from_heads > from spacy.vocab cimport Vocab > > doc = Doc(Vocab(), words=["Baileys", "from", "a", "shoe"]) > doc.c[0].head = 0 > doc.c[1].head = 0 > doc.c[2].head = 3 > doc.c[3].head = 1 > set_children_from_heads(doc.c, doc.length) > assert doc.c[3].l_kids == 1 > ``` | Name | Description | | -------- | ------------------------------------------ | | `tokens` | A `TokenC*` array. ~~const TokenC\*~~ | | `length` | The number of tokens in the array. ~~int~~ | ## LexemeC {#lexemec tag="C struct" source="spacy/structs.pxd"} Struct holding information about a lexical type. `LexemeC` structs are usually owned by the `Vocab`, and accessed through a read-only pointer on the `TokenC` struct. > #### Example > > ```python > lex = doc.c[3].lex > ``` | Name | Description | | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | | `flags` | Bit-field for binary lexical flag values. ~~flags_t (uint64_t)~~ | | `id` | Usually used to map lexemes to rows in a matrix, e.g. for word vectors. Does not need to be unique, so currently misnamed. ~~attr_t (uint64_t)~~ | | `length` | Number of unicode characters in the lexeme. ~~attr_t (uint64_t)~~ | | `orth` | ID of the verbatim text content. ~~attr_t (uint64_t)~~ | | `lower` | ID of the lowercase form of the lexeme. ~~attr_t (uint64_t)~~ | | `norm` | ID of the lexeme's norm, i.e. a normalized form of the text. ~~attr_t (uint64_t)~~ | | `shape` | Transform of the lexeme's string, to show orthographic features. ~~attr_t (uint64_t)~~ | | `prefix` | Length-N substring from the start of the lexeme. Defaults to `N=1`. ~~attr_t (uint64_t)~~ | | `suffix` | Length-N substring from the end of the lexeme. Defaults to `N=3`. ~~attr_t (uint64_t)~~ | ### Lexeme.get_struct_attr {#lexeme_get_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"} Get the value of an attribute from the `LexemeC` struct by attribute ID. > #### Example > > ```python > from spacy.attrs cimport IS_ALPHA > from spacy.lexeme cimport Lexeme > > lexeme = doc.c[3].lex > is_alpha = Lexeme.get_struct_attr(lexeme, IS_ALPHA) > ``` | Name | Description | | ----------- | ---------------------------------------------------------------------------------------------------- | | `lex` | A pointer to a `LexemeC` struct. ~~const LexemeC\*~~ | | `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | **RETURNS** | The value of the attribute. ~~attr_t (uint64_t)~~ | ### Lexeme.set_struct_attr {#lexeme_set_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"} Set the value of an attribute of the `LexemeC` struct by attribute ID. > #### Example > > ```python > from spacy.attrs cimport NORM > from spacy.lexeme cimport Lexeme > > lexeme = doc.c[3].lex > Lexeme.set_struct_attr(lexeme, NORM, lexeme.lower) > ``` | Name | Description | | ----------- | ---------------------------------------------------------------------------------------------------- | | `lex` | A pointer to a `LexemeC` struct. ~~const LexemeC\*~~ | | `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | `value` | The value to set. ~~attr_t (uint64_t)~~ | ### Lexeme.c_check_flag {#lexeme_c_check_flag tag="staticmethod, nogil" source="spacy/lexeme.pxd"} Check the value of a binary flag attribute. > #### Example > > ```python > from spacy.attrs cimport IS_STOP > from spacy.lexeme cimport Lexeme > > lexeme = doc.c[3].lex > is_stop = Lexeme.c_check_flag(lexeme, IS_STOP) > ``` | Name | Description | | ----------- | --------------------------------------------------------------------------------------------- | | `lexeme` | A pointer to a `LexemeC` struct. ~~const LexemeC\*~~ | | `flag_id` | The ID of the flag to look up. The flag IDs are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | **RETURNS** | The boolean value of the flag. ~~bint~~ | ### Lexeme.c_set_flag {#lexeme_c_set_flag tag="staticmethod, nogil" source="spacy/lexeme.pxd"} Set the value of a binary flag attribute. > #### Example > > ```python > from spacy.attrs cimport IS_STOP > from spacy.lexeme cimport Lexeme > > lexeme = doc.c[3].lex > Lexeme.c_set_flag(lexeme, IS_STOP, 0) > ``` | Name | Description | | --------- | --------------------------------------------------------------------------------------------- | | `lexeme` | A pointer to a `LexemeC` struct. ~~const LexemeC\*~~ | | `flag_id` | The ID of the flag to look up. The flag IDs are enumerated in `spacy.typedefs`. ~~attr_id_t~~ | | `value` | The value to set. ~~bint~~ |