Cython Classes
Doc
The Doc object holds an array of TokenC structs.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Doc.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. |
vocab | Vocab | A reference to the shared Vocab object. |
c | TokenC* | A pointer to a TokenC struct. |
length | int | The number of tokens in the document. |
max_length | int | The underlying size of the Doc.c array. |
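The difference between length and max_length can be pictured with a small pure-Python model. This is only a sketch: the ToyDoc class and its doubling growth policy are assumptions made for illustration, not spaCy's implementation.

```python
class ToyDoc:
    """Toy model of a token array with a separate length and capacity."""

    def __init__(self, max_length=2):
        self.c = [None] * max_length   # stands in for the TokenC array
        self.length = 0                # number of tokens actually stored
        self.max_length = max_length   # number of allocated slots

    def append(self, word):
        if self.length == self.max_length:
            # Assumed growth policy: double the number of allocated slots.
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = word
        self.length += 1


doc = ToyDoc()
for word in ["a", "b", "c"]:
    doc.append(word)
print(doc.length, doc.max_length)
```

In this model, length counts the stored tokens while max_length counts the slots that exist in the array, so length never exceeds max_length.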
Doc.push_back
Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.
Example
from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab

doc = Doc(Vocab())
lexeme = doc.vocab.get(doc.vocab.mem, u'hello')
doc.push_back(lexeme, True)
assert doc.text == u'hello '
Name | Type | Description |
---|---|---|
lex_or_tok | LexemeOrToken | The word to append to the Doc. |
has_space | bint | Whether the word has trailing whitespace. |
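The effect of has_space on the document text can be modelled in plain Python. The sketch below is a hypothetical stand-in (join_tokens is not a spaCy function): each appended word carries a flag, and the text is rebuilt by adding a single space after every word whose flag is set.

```python
def join_tokens(tokens):
    """Rebuild text from (word, has_space) pairs."""
    return "".join(word + (" " if has_space else "") for word, has_space in tokens)


tokens = []
tokens.append(("hello", True))    # trailing whitespace after "hello"
tokens.append(("world", False))   # no trailing whitespace after "world"
text = join_tokens(tokens)
print(repr(text))
```

This mirrors the assertion in the example above, where appending u'hello' with has_space set to True yields the text u'hello '.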
Token
A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Token.
Attributes
Name | Type | Description |
---|---|---|
vocab | Vocab | A reference to the shared Vocab object. |
c | TokenC* | A pointer to a TokenC struct. |
i | int | The offset of the token within the document. |
doc | Doc | The parent document. |
Token.cinit
Create a Token object from a TokenC* pointer.
Example
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
Name | Type | Description |
---|---|---|
vocab | Vocab | A reference to the shared Vocab. |
c | TokenC* | A pointer to a TokenC struct. |
offset | int | The offset of the token within the document. |
doc | Doc | The parent document. |
RETURNS | Token | The newly constructed object. |
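Because a Token only receives a pointer and does not own the struct, it behaves like a view onto the parent document's storage. A rough pure-Python analogy follows; ToyToken and the dictionary-based "structs" are hypothetical stand-ins, not spaCy types.

```python
class ToyToken:
    """A view onto one slot of a parent's array; it stores no token data itself."""

    def __init__(self, doc_array, offset):
        self.doc_array = doc_array  # reference to the parent's storage, not a copy
        self.i = offset             # offset of the token within the document

    @property
    def text(self):
        # Always read through to the parent's storage.
        return self.doc_array[self.i]["text"]


array = [{"text": "hello"}, {"text": "world"}]
token = ToyToken(array, 1)
array[1]["text"] = "there"   # modify the underlying storage
print(token.text)
```

Since the view reads through to the parent's storage, a change made there is visible through the token afterwards.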
Span
A Cython class providing access and methods for a slice of a Doc object.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Span.
Attributes
Name | Type | Description |
---|---|---|
doc | Doc | The parent document. |
start | int | The index of the first token of the span. |
end | int | The index of the first token after the span. |
start_char | int | The index of the first character of the span. |
end_char | int | The index of the last character of the span. |
label | attr_t | A label to attach to the span, e.g. for named entities. |
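The start and end attributes describe a half-open range of token indices: start is included and end is not. A minimal sketch with a plain Python list standing in for the document (the word list is made up for illustration):

```python
words = ["I", "like", "New", "York", "City", "."]

start = 2   # index of the first token of the span
end = 5     # index of the first token after the span

span_words = words[start:end]
print(span_words, end - start)
```

With this convention, the number of tokens in the span is simply end - start.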
Lexeme
A Cython class providing access and methods for an entry in the vocabulary.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Lexeme.
Attributes
Name | Type | Description |
---|---|---|
c | LexemeC* | A pointer to a LexemeC struct. |
vocab | Vocab | A reference to the shared Vocab object. |
orth | attr_t | ID of the verbatim text content. |
Vocab
A Cython class providing access and methods for a vocabulary and other data shared across a language.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Vocab.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
strings | StringStore | A StringStore that maps strings to hash values and vice versa. |
length | int | The number of entries in the vocabulary. |
Vocab.get
Retrieve a LexemeC* pointer from the vocabulary.
Example
lexeme = vocab.get(vocab.mem, u'hello')
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
string | unicode | The string of the word to look up. |
RETURNS | const LexemeC* | The lexeme in the vocabulary. |
Vocab.get_by_orth
Retrieve a LexemeC* pointer from the vocabulary, looked up by the ID of its verbatim text content.
Example
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.norm)
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
orth | attr_t | ID of the verbatim text content. |
RETURNS | const LexemeC* | The lexeme in the vocabulary. |
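The two lookup paths, by string and by ID, can be compared in a small pure-Python model. Everything here is hypothetical: ToyVocab, toy_orth and the BLAKE2-based ID are stand-ins, and creating an entry for an unseen word is an assumption rather than documented behaviour.

```python
import hashlib


def toy_orth(string):
    """Stand-in 64-bit ID for a string (not the hash function spaCy uses)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    def __init__(self):
        self._by_orth = {}  # orth ID -> entry

    @property
    def length(self):
        return len(self._by_orth)

    def get(self, string):
        """Look up an entry by its string."""
        orth = toy_orth(string)
        if orth not in self._by_orth:
            # Assumed behaviour: an unseen word gets a new entry.
            self._by_orth[orth] = {"orth": orth, "text": string}
        return self._by_orth[orth]

    def get_by_orth(self, orth):
        """Look up an existing entry by its ID."""
        return self._by_orth[orth]


vocab = ToyVocab()
entry = vocab.get("hello")
same_entry = vocab.get_by_orth(entry["orth"])
```

Both calls return the same entry object, which is the point of the analogy: the string and its ID are two keys for one vocabulary entry.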
StringStore
A lookup table to retrieve strings by 64-bit hashes.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see StringStore.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. |
keys | vector[hash_t] | A list of hash values in the StringStore. |
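The idea of a lookup table keyed by 64-bit hashes can be sketched in plain Python. ToyStringStore is a hypothetical model, and the BLAKE2 digest is used only as a convenient 64-bit stand-in, not as the hash function spaCy uses.

```python
import hashlib


class ToyStringStore:
    """Toy lookup table: 64-bit hash -> string, plus a list of known hashes."""

    def __init__(self):
        self.keys = []        # hash values, in insertion order
        self._strings = {}    # hash value -> string

    def add(self, string):
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")  # 8 bytes, so 0 <= key < 2**64
        if key not in self._strings:
            self._strings[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, key):
        return self._strings[key]


store = ToyStringStore()
key = store.add("hello")
print(store[key])
```

Adding the same string twice returns the same key and leaves keys unchanged, so each stored string appears in the table once.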