* Update docs for v0.80

Matthew Honnibal 2015-04-13 05:40:51 +02:00
parent 3faaad0271
commit 5ce51ce8d6
2 changed files with 41 additions and 20 deletions


@ -5,29 +5,32 @@ API
.. autoclass:: spacy.en.English .. autoclass:: spacy.en.English
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| Attribute | Type | Attr API | NoteS | | Attribute | Type | Attr API | Notes |
+===========+========================================+=============+==========================+ +============+========================================+=============+==========================+
| strings | :py:class:`strings.StringStore` | __getitem__ | string <-> int mapping | | strings | :py:class:`strings.StringStore` | __getitem__ | string <-> int mapping |
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| vocab | :py:class:`vocab.Vocab` | __getitem__ | Look up Lexeme object | | vocab | :py:class:`vocab.Vocab` | __getitem__ | Look up Lexeme object |
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| tokenizer | :py:class:`tokenizer.Tokenizer` | __call__ | Get Tokens given unicode | | tokenizer | :py:class:`tokenizer.Tokenizer` | __call__ | Get Tokens given unicode |
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| tagger | :py:class:`en.pos.EnPosTagger` | __call__ | Set POS tags on Tokens | | tagger | :py:class:`en.pos.EnPosTagger` | __call__ | Set POS tags on Tokens |
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| parser | :py:class:`syntax.parser.GreedyParser` | __call__ | Set parse on Tokens | | parser | :py:class:`syntax.parser.GreedyParser` | __call__ | Set parse on Tokens |
+-----------+----------------------------------------+-------------+--------------------------+ +------------+----------------------------------------+-------------+--------------------------+
| entity | :py:class:`syntax.parser.GreedyParser` | __call__ | Set entities on Tokens |
+------------+----------------------------------------+-------------+--------------------------+
| mwe_merger | :py:class:`multi_words.RegexMerger` | __call__ | Apply regex for units |
+------------+----------------------------------------+-------------+--------------------------+
.. automethod:: spacy.en.English.__call__ .. automethod:: spacy.en.English.__call__
.. autoclass:: spacy.tokens.Tokens .. autoclass:: spacy.tokens.Tokens
:members:
+---------------+-------------+-------------+ +---------------+-------------+-------------+
| Attribute | Type | Useful | | Attribute | Type | Attr API |
+===============+=============+=============+ +===============+=============+=============+
| vocab | Vocab | __getitem__ | | vocab | Vocab | __getitem__ |
+---------------+-------------+-------------+ +---------------+-------------+-------------+
@ -47,8 +50,6 @@ API
access is required, and you need slightly better performance. However, this access is required, and you need slightly better performance. However, this
is both slower and has a worse API than Cython access. is both slower and has a worse API than Cython access.
.. Once a Token object has been created, it is persisted internally in Tokens._py_tokens.
.. autoclass:: spacy.tokens.Token .. autoclass:: spacy.tokens.Token
@ -192,6 +193,15 @@ API
An iterator for the part of the sentence syntactically governed by the An iterator for the part of the sentence syntactically governed by the
word, including the word itself. word, including the word itself.
**Named Entities**
ent_type
If the token is part of an entity, its entity type
ent_iob
The IOB (inside, outside, begin) entity recognition tag for the token
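The relationship between per-token IOB tags and entity spans can be sketched in plain Python. This is an illustrative sketch only, not spaCy's implementation: the function name ``iob_to_spans`` is hypothetical, and the tags are written as the strings "B", "I" and "O" for readability, which is an assumption about presentation rather than a statement of how ``ent_iob`` is stored.

```python
def iob_to_spans(iob_tags, ent_types):
    """Group per-token IOB tags into (start, end, label) spans.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    spans = []
    start = None
    label = None
    for i, (iob, ent_type) in enumerate(zip(iob_tags, ent_types)):
        # Close the open span on "O", on a fresh "B", or on a type change.
        if start is not None and (iob != "I" or ent_type != label):
            spans.append((start, i, label))
            start = None
            label = None
        # Open a span on "B", or on a stray "I" with no span open.
        if iob == "B" or (iob == "I" and start is None):
            start = i
            label = ent_type
    # Flush a span that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(iob_tags), label))
    return spans


words = ["London", "Bridge", "is", "in", "London"]
iob = ["B", "I", "O", "O", "B"]
types = ["FAC", "FAC", "", "", "GPE"]  # labels are made up for the example
spans = iob_to_spans(iob, types)
```

Each resulting tuple gives the token offsets of one entity, so ``words[start:end]`` recovers its text.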
.. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None) .. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)
.. py:method:: __len__(self) --> int .. py:method:: __len__(self) --> int


@ -68,7 +68,7 @@ a convenient API:
spaCy maps all strings to sequential integer IDs --- a common trick in NLP. spaCy maps all strings to sequential integer IDs --- a common trick in NLP.
If an attribute `Token.foo` is an integer ID, then `Token.foo_` is the string, If an attribute `Token.foo` is an integer ID, then `Token.foo_` is the string,
e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of e.g. `pizza.orth` and `pizza.orth_` provide the integer ID and the string of
the original orthographic form of the word. the original orthographic form of the word.
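The string-to-ID trick described above can be sketched with a small two-way table. This is a toy stand-in written for illustration, not spaCy's ``StringStore``: the class name ``ToyStringStore`` and its internals are assumptions, and only the idea of a ``__getitem__`` lookup in both directions is taken from the docs.

```python
class ToyStringStore:
    """Toy two-way string <-> int table (hypothetical, for illustration)."""

    def __init__(self):
        self._ids = {}       # string -> integer ID
        self._strings = []   # integer ID -> string

    def __getitem__(self, key):
        # An integer key is decoded back to its string.
        if isinstance(key, int):
            return self._strings[key]
        # A new string is assigned the next sequential ID.
        if key not in self._ids:
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]


store = ToyStringStore()
pizza_id = store["pizza"]       # analogous to the integer `pizza.orth`
pizza_text = store[pizza_id]    # analogous to the string `pizza.orth_`
```

Comparing integer IDs is cheaper than comparing strings, which is one common reason for this design.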
.. note:: en.English.__call__ is stateful --- it has an important **side-effect**. .. note:: en.English.__call__ is stateful --- it has an important **side-effect**.
@ -88,7 +88,7 @@ the original orthographic form of the word.
.. py:class:: spacy.en.English(self, data_dir=join(dirname(__file__), 'data')) .. py:class:: spacy.en.English(self, data_dir=join(dirname(__file__), 'data'))
.. py:method:: __call__(self, text: unicode, tag=True, parse=False) --> Tokens .. py:method:: __call__(self, text: unicode, tag=True, parse=True, entity=True, merge_mwes=False) --> Tokens
+-----------------+--------------+--------------+ +-----------------+--------------+--------------+
| Attribute | Type | Its API | | Attribute | Type | Its API |
@ -103,6 +103,8 @@ the original orthographic form of the word.
+-----------------+--------------+--------------+ +-----------------+--------------+--------------+
| parser | GreedyParser | __call__ | | parser | GreedyParser | __call__ |
+-----------------+--------------+--------------+ +-----------------+--------------+--------------+
| entity | GreedyParser | __call__ |
+-----------------+--------------+--------------+
**Get dict or numpy array:** **Get dict or numpy array:**
@ -116,6 +118,16 @@ the original orthographic form of the word.
.. py:method:: tokens.Tokens.__iter__(self) --> Iterator[Token] .. py:method:: tokens.Tokens.__iter__(self) --> Iterator[Token]
**Get sentence or named entity spans**
.. py:attribute:: tokens.Tokens.sents --> Iterator[Span]
.. py:attribute:: tokens.Tokens.ents --> Iterator[Span]
You can iterate over a Span to access individual Tokens, or access its
start, end or label.
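The Span behaviour described here (iterable, with a start, end and label) can be sketched as a thin view over a token sequence. This is a minimal sketch under stated assumptions: ``ToySpan`` is a hypothetical class, plain strings stand in for Token objects, and ``end`` is assumed to be exclusive.

```python
class ToySpan:
    """Toy view over a slice of tokens (hypothetical, for illustration)."""

    def __init__(self, tokens, start, end, label=""):
        self._tokens = tokens
        self.start = start   # index of the first token in the span
        self.end = end       # index one past the last token (assumed exclusive)
        self.label = label

    def __iter__(self):
        # Iterating yields the individual tokens covered by the span.
        return iter(self._tokens[self.start:self.end])

    def __len__(self):
        return self.end - self.start


tokens = ["The", "Eiffel", "Tower", "is", "tall"]
span = ToySpan(tokens, 1, 3, label="FAC")  # label is made up for the example
span_words = list(span)
```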
**Embedded word representenations** **Embedded word representenations**
.. py:attribute:: tokens.Token.repvec .. py:attribute:: tokens.Token.repvec
@ -150,7 +162,6 @@ the original orthographic form of the word.
Starting offset of word in the original string. Starting offset of word in the original string.
Features Features
-------- --------