Cython Classes
Doc
The Doc object holds an array of TokenC
structs.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Doc.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. |
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| length | int | The number of tokens in the document. |
| max_length | int | The underlying size of the Doc.c array. |
Doc.push_back
Append a token to the Doc. The token can be provided as a
LexemeC or
TokenC pointer, using Cython's
fused types.
Example
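    from spacy.tokens cimport Doc
    from spacy.vocab cimport Vocab

    doc = Doc(Vocab())
    # Vocab.get takes a memory pool as its first argument.
    lexeme = doc.vocab.get(doc.mem, "hello")
    # True marks the token as followed by whitespace.
    doc.push_back(lexeme, True)
    assert doc.text == "hello "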
| Name | Type | Description |
|---|---|---|
| lex_or_tok | LexemeOrToken | The word to append to the Doc. |
| has_space | bint | Whether the word has trailing whitespace. |
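The interplay between length, max_length and has_space can be pictured with a small pure-Python model. This is only a sketch and not spaCy code: ToyDoc, its slots list and the doubling growth policy are assumptions made for illustration.

```python
class ToyDoc:
    """Pure-Python stand-in for a token array (hypothetical, not spaCy code)."""

    def __init__(self, initial_capacity=2):
        # Pre-allocated slots play the role of the Doc.c array.
        self.slots = [None] * initial_capacity
        self.length = 0                     # number of tokens in use
        self.max_length = initial_capacity  # size of the underlying array

    def push_back(self, word, has_space):
        # Grow the backing array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.slots.extend([None] * self.max_length)
            self.max_length *= 2
        self.slots[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Each token contributes its text plus one space if has_space is set.
        used = self.slots[: self.length]
        return "".join(word + (" " if space else "") for word, space in used)


doc = ToyDoc()
doc.push_back("hello", True)
doc.push_back("world", False)
doc.push_back("!", False)
```

After the third append the model holds three tokens in an array of four slots, which is the distinction the length and max_length attributes draw.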
Token
A Cython class providing access and methods for a
TokenC struct. Note that the Token object does
not own the struct. It only receives a pointer to it.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Token.
Attributes
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| i | int | The offset of the token within the document. |
| doc | Doc | The parent document. |
Token.cinit
Create a Token object from a TokenC* pointer.
Example
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | A reference to the shared Vocab. |
| c | TokenC* | A pointer to a TokenC struct. |
| offset | int | The offset of the token within the document. |
| doc | Doc | The parent document. |
| RETURNS | Token | The newly constructed object. |
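Because a Token only points at a struct that lives in the parent document, it behaves like a view over that storage. The following pure-Python analogy sketches the idea. ToyDoc and ToyToken are hypothetical names, and a list index stands in for the C pointer.

```python
class ToyDoc:
    """Owns the storage (stand-in for the array behind Doc.c)."""

    def __init__(self, words):
        self.c = list(words)


class ToyToken:
    """Holds a reference to the parent and an offset, never a copy of the data."""

    def __init__(self, doc, i):
        self.doc = doc
        self.i = i

    @property
    def text(self):
        # Every read goes through the parent's storage.
        return self.doc.c[self.i]


doc = ToyDoc(["hello", "world"])
token = ToyToken(doc, 1)
before = token.text
doc.c[1] = "there"   # modify the parent's storage directly
after = token.text   # the view reflects the change
```

The same reasoning suggests why the parent document is passed along with the pointer: the view is only meaningful while the owner of the storage is alive.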
Span
A Cython class providing access and methods for a slice of a Doc object.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Span.
Attributes
| Name | Type | Description |
|---|---|---|
| doc | Doc | The parent document. |
| start | int | The index of the first token of the span. |
| end | int | The index of the first token after the span. |
| start_char | int | The index of the first character of the span. |
| end_char | int | The index of the first character after the span. |
| label | attr_t | A label to attach to the span, e.g. for named entities. |
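The token and character offsets can be related with a short pure-Python sketch. It assumes that both end and end_char are exclusive, so that slicing the document text with the two character offsets yields the span text. The helper names and the sample sentence are made up for illustration.

```python
words = ["Give", "it", "back", "!"]
spaces = [True, True, False, False]  # trailing whitespace per token


def token_offsets(words, spaces):
    """Return the character offset at which each token starts."""
    offsets = []
    position = 0
    for word, has_space in zip(words, spaces):
        offsets.append(position)
        position += len(word) + (1 if has_space else 0)
    return offsets


def span_bounds(words, spaces, start, end):
    """Map half-open token indices [start, end) to character offsets."""
    offsets = token_offsets(words, spaces)
    last = end - 1
    start_char = offsets[start]
    # Trailing whitespace of the last token is not part of the span.
    end_char = offsets[last] + len(words[last])
    return start_char, end_char


text = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
start_char, end_char = span_bounds(words, spaces, 1, 3)
span_text = text[start_char:end_char]
```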
Lexeme
A Cython class providing access and methods for an entry in the vocabulary.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Lexeme.
Attributes
| Name | Type | Description |
|---|---|---|
| c | LexemeC* | A pointer to a LexemeC struct. |
| vocab | Vocab | A reference to the shared Vocab object. |
| orth | attr_t | ID of the verbatim text content. |
Vocab
A Cython class providing access and methods for a vocabulary and other data shared across a language.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Vocab.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| strings | StringStore | A StringStore that maps strings to hash values and vice versa. |
| length | int | The number of entries in the vocabulary. |
Vocab.get
Retrieve a LexemeC* pointer from the
vocabulary.
Example
lexeme = vocab.get(vocab.mem, "hello")
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| string | unicode | The string of the word to look up. |
| RETURNS | const LexemeC* | The lexeme in the vocabulary. |
Vocab.get_by_orth
Retrieve a LexemeC* pointer from the
vocabulary by the ID of its verbatim text content.
Example
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.norm)
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| orth | attr_t | ID of the verbatim text content. |
| RETURNS | const LexemeC* | The lexeme in the vocabulary. |
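The relationship between the two lookups can be sketched in pure Python: looking up by string amounts to hashing the string and then looking up by ID. This is a toy model, not spaCy code. ToyVocab, the blake2b stand-in hash and the create-on-first-lookup behaviour are all assumptions made for illustration.

```python
import hashlib


def hash64(string):
    # Stand-in 64-bit hash; spaCy's own hash function is not reproduced here.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    """Pure-Python stand-in for a lexeme table (hypothetical)."""

    def __init__(self):
        self.lexemes = {}  # orth ID -> lexeme record

    def get_by_orth(self, orth):
        # Assumption: a missing entry is created on first lookup.
        if orth not in self.lexemes:
            self.lexemes[orth] = {"orth": orth}
        return self.lexemes[orth]

    def get(self, string):
        # Looking up by string reduces to hashing, then looking up by ID.
        return self.get_by_orth(hash64(string))

    @property
    def length(self):
        return len(self.lexemes)


vocab = ToyVocab()
by_string = vocab.get("hello")
by_orth = vocab.get_by_orth(by_string["orth"])
```

Both calls return the same record, and the table holds a single entry afterwards.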
StringStore
A lookup table to retrieve strings by 64-bit hashes.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see
StringStore.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. |
| keys | vector[hash_t] | A list of hash values in the StringStore. |
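A two-way table keyed by 64-bit hashes can be modelled in a few lines of pure Python. The sketch below is an illustration under stated assumptions, not the StringStore implementation: ToyStringStore is a hypothetical name, and blake2b truncated to 8 bytes stands in for whatever hash function the library actually uses.

```python
import hashlib


def hash64(string):
    # Stand-in hash that always fits in 64 bits (assumption, see lead-in).
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Maps strings to hash values and hash values back to strings."""

    def __init__(self):
        self.keys = []       # hash values, in insertion order
        self._strings = {}   # hash value -> string

    def add(self, string):
        key = hash64(string)
        # Adding the same string twice leaves the store unchanged.
        if key not in self._strings:
            self._strings[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # A string maps to its hash; a hash maps back to its string.
        if isinstance(item, str):
            return hash64(item)
        return self._strings[item]


store = ToyStringStore()
key = store.add("coffee")
store.add("coffee")
```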