* Update docs for v0.80

Matthew Honnibal 2015-04-13 05:40:51 +02:00
parent 3faaad0271
commit 5ce51ce8d6
2 changed files with 41 additions and 20 deletions

@ -5,29 +5,32 @@ API
.. autoclass:: spacy.en.English
+-----------+----------------------------------------+-------------+--------------------------+
| Attribute | Type | Attr API | NoteS |
+===========+========================================+=============+==========================+
| strings | :py:class:`strings.StringStore` | __getitem__ | string <-> int mapping |
+-----------+----------------------------------------+-------------+--------------------------+
| vocab | :py:class:`vocab.Vocab` | __getitem__ | Look up Lexeme object |
+-----------+----------------------------------------+-------------+--------------------------+
| tokenizer | :py:class:`tokenizer.Tokenizer` | __call__ | Get Tokens given unicode |
+-----------+----------------------------------------+-------------+--------------------------+
| tagger | :py:class:`en.pos.EnPosTagger` | __call__ | Set POS tags on Tokens |
+-----------+----------------------------------------+-------------+--------------------------+
| parser | :py:class:`syntax.parser.GreedyParser` | __call__ | Set parse on Tokens |
+-----------+----------------------------------------+-------------+--------------------------+
+------------+----------------------------------------+-------------+--------------------------+
| Attribute | Type | Attr API | Notes |
+============+========================================+=============+==========================+
| strings | :py:class:`strings.StringStore` | __getitem__ | string <-> int mapping |
+------------+----------------------------------------+-------------+--------------------------+
| vocab | :py:class:`vocab.Vocab` | __getitem__ | Look up Lexeme object |
+------------+----------------------------------------+-------------+--------------------------+
| tokenizer | :py:class:`tokenizer.Tokenizer` | __call__ | Get Tokens given unicode |
+------------+----------------------------------------+-------------+--------------------------+
| tagger | :py:class:`en.pos.EnPosTagger` | __call__ | Set POS tags on Tokens |
+------------+----------------------------------------+-------------+--------------------------+
| parser | :py:class:`syntax.parser.GreedyParser` | __call__ | Set parse on Tokens |
+------------+----------------------------------------+-------------+--------------------------+
| entity | :py:class:`syntax.parser.GreedyParser` | __call__ | Set entities on Tokens |
+------------+----------------------------------------+-------------+--------------------------+
| mwe_merger | :py:class:`multi_words.RegexMerger` | __call__ | Apply regex for units |
+------------+----------------------------------------+-------------+--------------------------+
.. automethod:: spacy.en.English.__call__
.. autoclass:: spacy.tokens.Tokens
:members:
+---------------+-------------+-------------+
| Attribute | Type | Useful |
| Attribute | Type | Attr API |
+===============+=============+=============+
| vocab | Vocab | __getitem__ |
+---------------+-------------+-------------+
@ -47,8 +50,6 @@ API
access is required, and you need slightly better performance. However, this
is both slower and has a worse API than Cython access.
.. Once a Token object has been created, it is persisted internally in Tokens._py_tokens.
.. autoclass:: spacy.tokens.Token
@ -192,6 +193,15 @@ API
An iterator for the part of the sentence syntactically governed by the
word, including the word itself.
**Named Entities**
ent_type
If the token is part of an entity, its entity type
ent_iob
The IOB (inside, outside, begin) entity recognition tag for the token
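As a rough illustration of how per-token IOB tags relate to entity spans, the sketch below groups tags into ``(start, end, label)`` triples. It is plain Python and does not call spaCy. The function name ``iob_to_spans`` is hypothetical, and it assumes the tags are the strings ``"I"``, ``"O"`` and ``"B"``; the library may represent them differently.

```python
def iob_to_spans(iob_tags, ent_types):
    """Group IOB tags into (start, end, label) spans; end is exclusive.

    Hypothetical helper for illustration only, not part of spaCy.
    Assumes tags are the strings "I", "O" and "B".
    """
    spans = []
    start = None
    label = None
    for i, (iob, ent_type) in enumerate(zip(iob_tags, ent_types)):
        # "B", "O", or a stray "I" with no open span all end any open span.
        if iob in ("B", "O") or start is None:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            # "B" (or a stray "I") opens a new span at this token.
            if iob in ("B", "I"):
                start, label = i, ent_type
        # Otherwise the tag is "I" inside an open span: keep extending it.
    if start is not None:
        spans.append((start, len(iob_tags), label))
    return spans


words = ["Apple", "is", "based", "in", "New", "York"]
iob_tags = ["B", "O", "O", "O", "B", "I"]
ent_types = ["ORG", "", "", "", "GPE", "GPE"]
spans = iob_to_spans(iob_tags, ent_types)
```

Here ``spans`` holds one triple per entity, so ``words[start:end]`` recovers the tokens of each entity.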
.. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)
.. py:method:: __len__(self) --> int

@ -68,7 +68,7 @@ a convenient API:
spaCy maps all strings to sequential integer IDs --- a common trick in NLP.
If an attribute `Token.foo` is an integer ID, then `Token.foo_` is the string,
e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of
e.g. `pizza.orth` and `pizza.orth_` provide the integer ID and the string of
the original orthographic form of the word.
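The string-to-ID trick can be sketched in a few lines of plain Python. The class below is a toy stand-in, not spaCy's ``StringStore``: the name ``ToyStringStore`` and its internals are assumptions made for illustration. It only mirrors the idea that ``__getitem__`` maps in both directions.

```python
class ToyStringStore:
    """Hypothetical two-way string <-> int mapping (not spaCy's StringStore)."""

    def __init__(self):
        self._ids = {}       # string -> integer ID
        self._strings = []   # integer ID -> string

    def __getitem__(self, key):
        # An integer key looks up the string it was assigned to.
        if isinstance(key, int):
            return self._strings[key]
        # A string key returns its ID, assigning the next sequential one if new.
        if key not in self._ids:
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]


store = ToyStringStore()
pizza_id = store["pizza"]   # first string seen gets the first ID
pasta_id = store["pasta"]   # next string gets the next ID
```

In this sketch ``store[pizza_id]`` plays the role of ``pizza.orth_`` and ``store["pizza"]`` the role of ``pizza.orth``.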
.. note:: en.English.__call__ is stateful --- it has an important **side-effect**.
@ -88,7 +88,7 @@ the original orthographic form of the word.
.. py:class:: spacy.en.English(self, data_dir=join(dirname(__file__), 'data'))
.. py:method:: __call__(self, text: unicode, tag=True, parse=False) --> Tokens
.. py:method:: __call__(self, text: unicode, tag=True, parse=True, entity=True, merge_mwes=False) --> Tokens
+-----------------+--------------+--------------+
| Attribute | Type | Its API |
@ -103,6 +103,8 @@ the original orthographic form of the word.
+-----------------+--------------+--------------+
| parser | GreedyParser | __call__ |
+-----------------+--------------+--------------+
| entity | GreedyParser | __call__ |
+-----------------+--------------+--------------+
**Get dict or numpy array:**
@ -116,6 +118,16 @@ the original orthographic form of the word.
.. py:method:: tokens.Tokens.__iter__(self) --> Iterator[Token]
**Get sentence or named entity spans**
.. py:attribute:: tokens.Tokens.sents --> Iterator[Span]
.. py:attribute:: tokens.Tokens.ents --> Iterator[Span]
You can iterate over a Span to access individual Tokens, or access its
start, end or label.
**Embedded word representations**
.. py:attribute:: tokens.Token.repvec
@ -150,7 +162,6 @@ the original orthographic form of the word.
Starting offset of word in the original string.
Features
--------