spaCy/cython-classes.md at 5f680042647ef7d0c71a5041f33558bf81e656d8

explosion/spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-30 23:47:31 +03:00

Ines Montani 82c16b7943 Remove u-strings and fix formatting [ci skip]

2019-09-12 16:11:15 +02:00

Cython Classes

Doc

The Doc object holds an array of TokenC structs.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Doc.

Attributes

Name	Type	Description
`mem`	`cymem.Pool`	A memory pool. Allocated memory will be freed once the `Doc` object is garbage collected.
`vocab`	`Vocab`	A reference to the shared `Vocab` object.
`c`	`TokenC*`	A pointer to a `TokenC` struct.
`length`	`int`	The number of tokens in the document.
`max_length`	`int`	The underlying size of the `Doc.c` array.

Doc.push_back

Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.

Example

from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab

doc = Doc(Vocab())
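# Vocab.get takes the memory pool as its first argument
lexeme = doc.vocab.get(doc.vocab.mem, "hello")
# True marks the token as being followed by whitespace
doc.push_back(lexeme, True)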
assert doc.text == "hello "

Name	Type	Description
`lex_or_tok`	`LexemeOrToken`	The word to append to the `Doc`.
`has_space`	`bint`	Whether the word has trailing whitespace.
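
The relationship between `length`, `max_length` and `has_space` can be sketched in plain Python. This is a toy model only, not spaCy's implementation: `ToyDoc` is a hypothetical class, a Python list stands in for the `TokenC*` array, and the doubling growth policy is an assumption made for illustration.

```python
# Toy model only: a Python list stands in for the C-level TokenC array.
class ToyDoc:
    def __init__(self, initial_capacity=2):
        self.max_length = initial_capacity   # size of the underlying array
        self.length = 0                      # number of tokens actually stored
        self.c = [None] * self.max_length    # stands in for the TokenC* array

    def push_back(self, word, has_space):
        # Grow the backing array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Each token contributes its text, plus one space if has_space is set.
        return "".join(w + (" " if sp else "") for w, sp in self.c[:self.length])


doc = ToyDoc()
for word, has_space in [("hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)

print(doc.text)                    # hello world!
print(doc.length, doc.max_length)  # 3 4
```

The third `push_back` finds the array full, so the capacity grows from 2 to 4 while `length` only advances to 3.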

Token

A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Token.

Attributes

Name	Type	Description
`vocab`	`Vocab`	A reference to the shared `Vocab` object.
`c`	`TokenC*`	A pointer to a `TokenC` struct.
`i`	`int`	The offset of the token within the document.
`doc`	`Doc`	The parent document.

Token.cinit

Create a Token object from a TokenC* pointer.

Example

# Arguments follow the order of the table below: vocab, c, offset, doc
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)

Name	Type	Description
`vocab`	`Vocab`	A reference to the shared `Vocab`.
`c`	`TokenC*`	A pointer to a `TokenC` struct.
`offset`	`int`	The offset of the token within the document.
`doc`	`Doc`	The parent document.
RETURNS	`Token`	The newly constructed object.
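
The point that a `Token` does not own its struct can be illustrated with a small pure-Python analogy. Everything here is hypothetical (`ToyTokenC`, `ToyDoc`, `ToyToken` and its `view` method are not spaCy names), and object references stand in for C pointers.

```python
# Toy analogy only: object references stand in for TokenC* pointers.
class ToyTokenC:
    """Stands in for the TokenC struct, with two illustrative fields."""
    def __init__(self, text, pos=""):
        self.text = text
        self.pos = pos


class ToyDoc:
    def __init__(self, words):
        # The document owns the structs.
        self.c = [ToyTokenC(w) for w in words]


class ToyToken:
    """A non-owning view: it keeps a reference to a struct and to its parent."""
    def __init__(self, c, i, doc):
        self.c = c      # reference to a struct owned by the doc
        self.i = i      # offset of the token within the document
        self.doc = doc  # parent document

    @classmethod
    def view(cls, c, offset, doc):
        # Hypothetical constructor that wraps an existing struct without copying it.
        return cls(c, offset, doc)


doc = ToyDoc(["Give", "it", "back"])
first = ToyToken.view(doc.c[2], 2, doc)
second = ToyToken.view(doc.c[2], 2, doc)

# Writing through one view is visible through the other and in the doc itself,
# because all three refer to the same underlying struct.
first.c.pos = "ADV"
print(second.c.pos, doc.c[2].pos)  # ADV ADV
```

Because no data is copied, creating a view is cheap, but a view is only meaningful for as long as its parent document is alive, which is why it also holds a reference to `doc`.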

Span

A Cython class providing access and methods for a slice of a Doc object.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Span.

Attributes

Name	Type	Description
`doc`	`Doc`	The parent document.
`start`	`int`	The index of the first token of the span.
`end`	`int`	The index of the first token after the span.
`start_char`	`int`	The index of the first character of the span.
`end_char`	`int`	The character offset of the end of the span, i.e. the index just past its last character.
`label`	`attr_t`	A label to attach to the span, e.g. for named entities.
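
How the token-level and character-level boundaries relate can be sketched in plain Python. This is an illustration only: `span_char_bounds` is a hypothetical helper, and it assumes that `end` and `end_char` are both exclusive offsets, as in Python slicing.

```python
# Illustration only: derive character offsets for a token slice.
words = ["Give", "it", "back", "!"]
spaces = [True, True, False, False]  # whether each word has trailing whitespace

# Character offset at which each token starts.
offsets = []
pos = 0
for word, has_space in zip(words, spaces):
    offsets.append(pos)
    pos += len(word) + (1 if has_space else 0)

text = "".join(w + (" " if sp else "") for w, sp in zip(words, spaces))


def span_char_bounds(start, end):
    """Return (start_char, end_char) for the tokens in [start, end)."""
    start_char = offsets[start]
    # End of the last token in the span, excluding its trailing whitespace.
    end_char = offsets[end - 1] + len(words[end - 1])
    return start_char, end_char


start, end = 1, 3
start_char, end_char = span_char_bounds(start, end)
print(start_char, end_char)       # 5 12
print(text[start_char:end_char])  # it back
```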

Lexeme

A Cython class providing access and methods for an entry in the vocabulary.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Lexeme.

Attributes

Name	Type	Description
`c`	`LexemeC*`	A pointer to a `LexemeC` struct.
`vocab`	`Vocab`	A reference to the shared `Vocab` object.
`orth`	`attr_t`	ID of the verbatim text content.

Vocab

A Cython class providing access and methods for a vocabulary and other data shared across a language.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Vocab.

Attributes

Name	Type	Description
`mem`	`cymem.Pool`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.
`strings`	`StringStore`	A `StringStore` that maps strings to hash values and vice versa.
`length`	`int`	The number of entries in the vocabulary.

Vocab.get

Retrieve a LexemeC* pointer from the vocabulary.

Example

lexeme = vocab.get(vocab.mem, "hello")

Name	Type	Description
`mem`	`cymem.Pool`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.
`string`	`unicode`	The string of the word to look up.
RETURNS	`const LexemeC*`	The lexeme in the vocabulary.

Vocab.get_by_orth

Retrieve a LexemeC* pointer from the vocabulary, looked up by the ID of its verbatim text rather than by string.

Example

# Pass the memory pool first, then the orth ID read from the token's lexeme
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.orth)

Name	Type	Description
`mem`	`cymem.Pool`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.
`orth`	`attr_t`	ID of the verbatim text content.
RETURNS	`const LexemeC*`	The lexeme in the vocabulary.
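
The two lookups can be pictured as two routes to the same entry. The sketch below is a pure-Python toy, not spaCy's implementation: `ToyVocab` and `toy_hash` are hypothetical, the memory pool argument is left out, `blake2b` is only a stand-in for a 64-bit string hash, and creating a missing entry on lookup is an assumption of the toy.

```python
import hashlib


def toy_hash(string):
    # Stand-in 64-bit hash; the real hash function is not modelled here.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    def __init__(self):
        self.strings = {}   # orth ID -> string
        self._by_orth = {}  # orth ID -> entry (a dict stands in for LexemeC)

    @property
    def length(self):
        return len(self._by_orth)

    def get(self, string):
        # Look up by string: hash it, remember the string, then go by ID.
        orth = toy_hash(string)
        self.strings[orth] = string
        return self.get_by_orth(orth)

    def get_by_orth(self, orth):
        # Look up by ID, creating the entry on first use (toy assumption).
        entry = self._by_orth.get(orth)
        if entry is None:
            entry = {"orth": orth}
            self._by_orth[orth] = entry
        return entry


vocab = ToyVocab()
by_string = vocab.get("hello")
by_id = vocab.get_by_orth(by_string["orth"])

print(by_string is by_id)  # True
print(vocab.length)        # 1
```

Both calls return the same entry, so repeated lookups of a word never add a second record to the vocabulary.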

StringStore

A lookup table to retrieve strings by 64-bit hashes.

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see StringStore.

Attributes

Name	Type	Description
`mem`	`cymem.Pool`	A memory pool. Allocated memory will be freed once the `StringStore` object is garbage collected.
`keys`	`vector[hash_t]`	A list of hash values in the `StringStore`.
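
A lookup table keyed by 64-bit hashes can be sketched as follows. This is a toy model under stated assumptions: `ToyStringStore` and `toy_hash` are hypothetical names, `blake2b` with an 8-byte digest is a stand-in for the real hash function, and a Python list stands in for the `vector[hash_t]` of keys.

```python
import hashlib


def toy_hash(string):
    # Stand-in 64-bit hash; the real hash function is not modelled here.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    def __init__(self):
        self.keys = []   # hashes in insertion order
        self._map = {}   # hash -> string

    def add(self, string):
        key = toy_hash(string)
        if key not in self._map:
            self._map[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, key_or_string):
        # Strings map to hashes, hashes map back to strings.
        if isinstance(key_or_string, str):
            return toy_hash(key_or_string)
        return self._map[key_or_string]


store = ToyStringStore()
coffee_hash = store.add("coffee")
store.add("coffee")  # adding the same string again does not create a new key

print(store[coffee_hash])               # coffee
print(store["coffee"] == coffee_hash)   # True
print(len(store.keys))                  # 1
```

Hashing a string needs no stored state, so the string-to-hash direction always works, while the hash-to-string direction only works for strings that were added to the store.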
