Cython Classes
Doc
The Doc object holds an array of TokenC structs.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Doc.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. |
vocab | Vocab | A reference to the shared Vocab object. |
c | TokenC* | A pointer to a TokenC struct. |
length | int | The number of tokens in the document. |
max_length | int | The underlying size of the Doc.c array. |
Doc.push_back
Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.
Example
from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab

doc = Doc(Vocab())
# Vocab.get takes the memory pool as its first argument
lexeme = doc.vocab.get(doc.vocab.mem, u'hello')
doc.push_back(lexeme, True)
assert doc.text == u'hello '
Name | Type | Description |
---|---|---|
lex_or_tok | LexemeOrToken | The word to append to the Doc. |
has_space | bint | Whether the word has trailing whitespace. |
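The interplay between length, max_length and push_back can be pictured with a small pure-Python model. This is a toy sketch, not spaCy's implementation: ToyDoc, its doubling growth policy and the (word, has_space) tuples are assumptions made for illustration, whereas the real Doc stores TokenC structs in memory allocated from its cymem.Pool.

```python
class ToyDoc:
    """Toy stand-in for Doc: a preallocated buffer plus a token count."""

    def __init__(self, max_length=2):
        # Slots are reserved up front, mirroring the idea of max_length.
        self.max_length = max_length
        self.c = [None] * max_length
        self.length = 0

    def push_back(self, word, has_space):
        # Grow the buffer when it is full (doubling is an assumed policy).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Rebuild the text from each word and its trailing-whitespace flag.
        return "".join(
            word + (" " if has_space else "")
            for word, has_space in self.c[: self.length]
        )


doc = ToyDoc()
doc.push_back("hello", True)
doc.push_back("world", False)
doc.push_back("!", False)
```

In this model, length counts the tokens actually stored, while max_length is the capacity of the underlying buffer and is never smaller than length.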
Token
A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Token.
Attributes
Name | Type | Description |
---|---|---|
vocab | Vocab | A reference to the shared Vocab object. |
c | TokenC* | A pointer to a TokenC struct. |
i | int | The offset of the token within the document. |
doc | Doc | The parent document. |
Token.cinit
Create a Token object from a TokenC* pointer.
Example
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
Name | Type | Description |
---|---|---|
vocab | Vocab | A reference to the shared Vocab. |
c | TokenC* | A pointer to a TokenC struct. |
offset | int | The offset of the token within the document. |
doc | Doc | The parent document. |
RETURNS | Token | The newly constructed object. |
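Because a Token only receives a pointer and does not own the struct, it behaves like a view onto the parent document's array. The following pure-Python sketch illustrates that idea under stated assumptions: ToyToken is a hypothetical class, and a dict stands in for the TokenC struct.

```python
class ToyToken:
    """Toy stand-in for Token: a view onto one slot of the parent's array."""

    def __init__(self, doc_array, offset):
        self.doc_array = doc_array  # shared with the parent, not copied
        self.i = offset

    @property
    def text(self):
        # Always read through to the parent's data.
        return self.doc_array[self.i]["text"]


tokens = [{"text": "hello"}, {"text": "world"}]
first = ToyToken(tokens, 0)
second = ToyToken(tokens, 0)  # a second view of the same slot

# A change made in the parent array is visible through every view.
tokens[0]["text"] = "goodbye"
```

Both views report the updated text, since neither holds its own copy of the data.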
Span
A Cython class providing access and methods for a slice of a Doc object.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Span.
Attributes
Name | Type | Description |
---|---|---|
doc | Doc | The parent document. |
start | int | The index of the first token of the span. |
end | int | The index of the first token after the span. |
start_char | int | The index of the first character of the span. |
end_char | int | The index of the first character after the span. |
label | attr_t | A label to attach to the span, e.g. for named entities. |
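The token and character offsets can be related with a short pure-Python sketch. It assumes the character offsets follow the same half-open convention as start and end (as in Python slicing); the word list and variable names are made up for illustration.

```python
words = ["Give", "it", "back", "!"]
spaces = [True, True, False, False]

# Character offset at which each token starts in the text.
starts = []
position = 0
for word, space in zip(words, spaces):
    starts.append(position)
    position += len(word) + (1 if space else 0)

text = "".join(w + (" " if s else "") for w, s in zip(words, spaces))

# A span over tokens 1 and 2: start is inclusive, end is exclusive.
start, end = 1, 3
start_char = starts[start]
# The character range ends right after the last token in the span.
end_char = starts[end - 1] + len(words[end - 1])
```

With these values, text[start_char:end_char] yields the span's text and words[start:end] yields its tokens.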
Lexeme
A Cython class providing access and methods for an entry in the vocabulary.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Lexeme.
Attributes
Name | Type | Description |
---|---|---|
c | LexemeC* | A pointer to a LexemeC struct. |
vocab | Vocab | A reference to the shared Vocab object. |
orth | attr_t | ID of the verbatim text content. |
Vocab
A Cython class providing access and methods for a vocabulary and other data shared across a language.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Vocab.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
strings | StringStore | A StringStore that maps strings to hash values and vice versa. |
length | int | The number of entries in the vocabulary. |
Vocab.get
Retrieve a LexemeC* pointer from the vocabulary.
Example
lexeme = vocab.get(vocab.mem, u'hello')
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
string | unicode | The string of the word to look up. |
RETURNS | const LexemeC* | The lexeme in the vocabulary. |
Vocab.get_by_orth
Retrieve a LexemeC* pointer from the vocabulary, looked up by the ID of its verbatim text.
Example
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.norm)
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
orth | attr_t | ID of the verbatim text content. |
RETURNS | const LexemeC* | The lexeme in the vocabulary. |
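The two lookup paths, by string and by orth ID, can be sketched in pure Python. This is a toy model under several assumptions: ToyVocab and hash_string are hypothetical names, BLAKE2b is only a stand-in for whatever 64-bit hash the library uses, the mem argument is omitted, and the create-on-first-lookup behavior is assumed rather than taken from the text above.

```python
import hashlib


def hash_string(string):
    # Stand-in 64-bit hash; an assumption for this sketch.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    """Toy stand-in for Vocab: lexeme entries keyed by orth ID."""

    def __init__(self):
        self.strings = {}  # orth ID -> string
        self.lexemes = {}  # orth ID -> lexeme entry

    @property
    def length(self):
        return len(self.lexemes)

    def get(self, string):
        # Look up by string: hash it, then defer to get_by_orth.
        orth = hash_string(string)
        self.strings[orth] = string
        return self.get_by_orth(orth)

    def get_by_orth(self, orth):
        # Return the existing entry, or create one on first sight (assumed).
        if orth not in self.lexemes:
            self.lexemes[orth] = {"orth": orth}
        return self.lexemes[orth]


vocab = ToyVocab()
by_string = vocab.get("hello")
by_orth = vocab.get_by_orth(by_string["orth"])
again = vocab.get("hello")
```

In this model both paths return the same shared entry, so repeated lookups of the same word do not grow the vocabulary.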
StringStore
A lookup table to retrieve strings by 64-bit hashes.
This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see StringStore.
Attributes
Name | Type | Description |
---|---|---|
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. |
keys | vector[hash_t] | A list of hash values in the StringStore. |
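A lookup table that maps 64-bit hashes to strings, with a list of keys alongside, can be modeled in a few lines of Python. This is a rough sketch, not the library's code: ToyStringStore, its add method and hash64 are hypothetical names, and BLAKE2b truncated to 8 bytes is an assumed stand-in for the real hash function.

```python
import hashlib


def hash64(string):
    # Stand-in 64-bit hash; an assumption for this sketch.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Toy stand-in for StringStore: a two-way string/hash lookup."""

    def __init__(self):
        self.keys = []  # hash values, in insertion order
        self._by_hash = {}  # hash value -> string

    def add(self, string):
        # Store the string once and return its hash.
        key = hash64(string)
        if key not in self._by_hash:
            self._by_hash[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to hashes; hashes map back to stored strings.
        if isinstance(item, str):
            return hash64(item)
        return self._by_hash[item]


store = ToyStringStore()
key = store.add("coffee")
```

Looking up the hash returns the original string, and looking up the string returns the same hash, which is the round trip the table above relies on.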