* Update docs for v0.80

2015-04-13 05:40:51 +02:00
commit 5ce51ce8d6
parent 3faaad0271
2 changed files with 41 additions and 20 deletions
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -5,29 +5,32 @@ API

 .. autoclass:: spacy.en.English

-  +-----------+----------------------------------------+-------------+--------------------------+
-  | Attribute | Type                                   | Attr API    | NoteS                    |
-  +===========+========================================+=============+==========================+
-  | strings   | :py:class:`strings.StringStore`        | __getitem__ | string <-> int  mapping  |
-  +-----------+----------------------------------------+-------------+--------------------------+
-  | vocab     | :py:class:`vocab.Vocab`                | __getitem__ | Look up Lexeme object    |
-  +-----------+----------------------------------------+-------------+--------------------------+
-  | tokenizer | :py:class:`tokenizer.Tokenizer`        | __call__    | Get Tokens given unicode |
-  +-----------+----------------------------------------+-------------+--------------------------+
-  | tagger    | :py:class:`en.pos.EnPosTagger`         | __call__    | Set POS tags on Tokens   |
-  +-----------+----------------------------------------+-------------+--------------------------+
-  | parser    | :py:class:`syntax.parser.GreedyParser` | __call__    | Set parse on Tokens      |
-  +-----------+----------------------------------------+-------------+--------------------------+
+  +------------+----------------------------------------+-------------+--------------------------+
+  | Attribute  | Type                                   | Attr API    | Notes                    |
+  +============+========================================+=============+==========================+
+  | strings    | :py:class:`strings.StringStore`        | __getitem__ | string <-> int mapping   |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | vocab      | :py:class:`vocab.Vocab`                | __getitem__ | Look up Lexeme object    |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | tokenizer  | :py:class:`tokenizer.Tokenizer`        | __call__    | Get Tokens given unicode |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | tagger     | :py:class:`en.pos.EnPosTagger`         | __call__    | Set POS tags on Tokens   |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | parser     | :py:class:`syntax.parser.GreedyParser` | __call__    | Set parse on Tokens      |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | entity     | :py:class:`syntax.parser.GreedyParser` | __call__    | Set entities on Tokens   |
+  +------------+----------------------------------------+-------------+--------------------------+
+  | mwe_merger | :py:class:`multi_words.RegexMerger`    | __call__    | Merge regex-matched MWEs |
+  +------------+----------------------------------------+-------------+--------------------------+


  .. automethod:: spacy.en.English.__call__


 .. autoclass:: spacy.tokens.Tokens
-  :members: 
  
  +---------------+-------------+-------------+
-  | Attribute     | Type        | Useful      |
+  | Attribute     | Type        | Attr API    |
  +===============+=============+=============+
  | vocab         | Vocab       | __getitem__ |
  +---------------+-------------+-------------+
@@ -47,8 +50,6 @@ API
    access is required, and you need slightly better performance.  However, this
    is both slower and has a worse API than Cython access.  

-.. Once a Token object has been created, it is persisted internally in Tokens._py_tokens.
-

 .. autoclass:: spacy.tokens.Token

@@ -192,6 +193,15 @@ API
    An iterator for the part of the sentence syntactically governed by the
    word, including the word itself.

+
+  **Named Entities**
+
+  ent_type
+    The entity type of the token, if the token is part of an entity.
+
+  ent_iob
+    The IOB (inside, outside, begin) entity recognition tag for the token.
+
 .. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)

  .. py:method:: __len__(self) --> int
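The `strings` row in the table above describes `StringStore` as a string <-> int mapping accessed through `__getitem__`. The following is a minimal pure-Python sketch of such a two-way mapping, for illustration only. `ToyStringStore` is a hypothetical name; spaCy's real `StringStore` is implemented in Cython, and its ID numbering may differ from the sequential-from-zero scheme assumed here.

```python
from typing import Dict, List, Union


class ToyStringStore:
    """Hypothetical two-way string <-> integer ID map (not spaCy's class)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}  # string -> ID
        self._strings: List[str] = []   # ID -> string (list index is the ID)

    def __getitem__(self, key: Union[str, int]) -> Union[int, str]:
        # An int key returns the stored string for that ID.
        if isinstance(key, int):
            return self._strings[key]
        # A str key returns its ID, interning the string with the
        # next sequential ID the first time it is seen.
        if key not in self._ids:
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]

    def __len__(self) -> int:
        return len(self._strings)


strings = ToyStringStore()
orth = strings["pizza"]   # integer ID, analogous to pizza.orth
orth_ = strings[orth]     # the string again, analogous to pizza.orth_
print(orth, orth_)                         # 0 pizza
print(strings["pasta"], strings["pizza"])  # 1 0
```

The same idea underlies attribute pairs such as `Token.orth` (the integer ID) and `Token.orth_` (the string): storing integers keeps tokens compact, and the store converts back to text on demand.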
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -68,7 +68,7 @@ a convenient API:

 spaCy maps all strings to sequential integer IDs --- a common trick in NLP.
 If an attribute `Token.foo` is an integer ID, then `Token.foo_` is the string,
-e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of
+e.g. `pizza.orth` and `pizza.orth_` provide the integer ID and the string of
 the original orthographic form of the word.

  .. note::  en.English.__call__ is stateful --- it has an important **side-effect**.
@@ -88,7 +88,7 @@ the original orthographic form of the word.

  .. py:class:: spacy.en.English(self, data_dir=join(dirname(__file__), 'data'))

-    .. py:method:: __call__(self, text: unicode, tag=True, parse=False) --> Tokens 
+    .. py:method:: __call__(self, text: unicode, tag=True, parse=True, entity=True, merge_mwes=False) --> Tokens 

    +-----------------+--------------+--------------+
    | Attribute       | Type         | Its API      |
@@ -103,6 +103,8 @@ the original orthographic form of the word.
    +-----------------+--------------+--------------+
    | parser          | GreedyParser | __call__     |
    +-----------------+--------------+--------------+
+    | entity          | GreedyParser | __call__     |
+    +-----------------+--------------+--------------+

 **Get dict or numpy array:**

@@ -116,6 +118,16 @@ the original orthographic form of the word.

  .. py:method:: tokens.Tokens.__iter__(self) --> Iterator[Token]

+**Get sentence or named entity spans**
+
+  .. py:attribute:: tokens.Tokens.sents --> Iterator[Span]
+  
+  .. py:attribute:: tokens.Tokens.ents --> Iterator[Span]
+
+    Each Span can be iterated over to access its individual Tokens, and
+    also exposes its start, end and label.
+
+
 **Embedded word representations**

  .. py:attribute:: tokens.Token.repvec
@@ -150,7 +162,6 @@ the original orthographic form of the word.
    Starting offset of word in the original string.


-
 Features
 --------
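The `ent_iob` and `ent_type` token attributes and the `Tokens.ents` spans added in this commit carry the same information at two granularities: per-token tags versus (start, end, label) spans. As a rough illustration, the sketch below groups per-token IOB tags into spans with an exclusive end. It assumes tags are the plain strings "B", "I" and "O" with one label per token; `iob_to_spans` is a hypothetical helper, not spaCy's API, and spaCy's own encoding of `ent_iob` may differ (for example, integer codes).

```python
from typing import List, Optional, Tuple

# (start, end, label) with an exclusive end, mirroring a Span's start/end/label.
EntitySpan = Tuple[int, int, Optional[str]]


def iob_to_spans(iob: List[str], labels: List[Optional[str]]) -> List[EntitySpan]:
    """Hypothetical helper (not part of spaCy): group IOB tags into spans.

    Assumes one tag per token, each "B" (begin), "I" (inside) or "O" (outside),
    and one entity label per token (None for tokens outside any entity).
    """
    spans: List[EntitySpan] = []
    start: Optional[int] = None    # index where the open entity began
    current: Optional[str] = None  # label of the open entity
    for i, (tag, label) in enumerate(zip(iob, labels)):
        if tag == "I" and start is not None and label == current:
            continue  # the open entity continues through this token
        if start is not None:
            # "O", "B", or a label change closes the open entity.
            spans.append((start, i, current))
            start, current = None, None
        if tag in ("B", "I"):
            # A stray "I" with no matching open entity is treated as a begin.
            start, current = i, label
    if start is not None:
        spans.append((start, len(iob), current))
    return spans


words = ["Mr.", "Best", "flew", "to", "New", "York"]
iob = ["B", "I", "O", "O", "B", "I"]
labels = ["PERSON", "PERSON", None, None, "GPE", "GPE"]
for start, end, label in iob_to_spans(iob, labels):
    print(label, " ".join(words[start:end]))
# PERSON Mr. Best
# GPE New York
```

Treating a stray "I" as the beginning of a new entity is one lenient design choice; a stricter decoder could reject such tag sequences instead.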