diff --git a/docs/source/api.rst b/docs/source/api.rst
index c1132baff..e8638ed55 100644
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -5,29 +5,32 @@ API
 
 .. autoclass:: spacy.en.English
 
-    +-----------+----------------------------------------+-------------+--------------------------+
-    | Attribute | Type                                   | Attr API    | NoteS                    |
-    +===========+========================================+=============+==========================+
-    | strings   | :py:class:`strings.StringStore`        | __getitem__ | string <-> int mapping   |
-    +-----------+----------------------------------------+-------------+--------------------------+
-    | vocab     | :py:class:`vocab.Vocab`                | __getitem__ | Look up Lexeme object    |
-    +-----------+----------------------------------------+-------------+--------------------------+
-    | tokenizer | :py:class:`tokenizer.Tokenizer`        | __call__    | Get Tokens given unicode |
-    +-----------+----------------------------------------+-------------+--------------------------+
-    | tagger    | :py:class:`en.pos.EnPosTagger`         | __call__    | Set POS tags on Tokens   |
-    +-----------+----------------------------------------+-------------+--------------------------+
-    | parser    | :py:class:`syntax.parser.GreedyParser` | __call__    | Set parse on Tokens      |
-    +-----------+----------------------------------------+-------------+--------------------------+
+    +------------+----------------------------------------+-------------+--------------------------+
+    | Attribute  | Type                                   | Attr API    | Notes                    |
+    +============+========================================+=============+==========================+
+    | strings    | :py:class:`strings.StringStore`        | __getitem__ | string <-> int mapping   |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | vocab      | :py:class:`vocab.Vocab`                | __getitem__ | Look up Lexeme object    |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | tokenizer  | :py:class:`tokenizer.Tokenizer`        | __call__    | Get Tokens given unicode |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | tagger     | :py:class:`en.pos.EnPosTagger`         | __call__    | Set POS tags on Tokens   |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | parser     | :py:class:`syntax.parser.GreedyParser` | __call__    | Set parse on Tokens      |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | entity     | :py:class:`syntax.parser.GreedyParser` | __call__    | Set entities on Tokens   |
+    +------------+----------------------------------------+-------------+--------------------------+
+    | mwe_merger | :py:class:`multi_words.RegexMerger`    | __call__    | Apply regex for units    |
+    +------------+----------------------------------------+-------------+--------------------------+
 
 
 .. automethod:: spacy.en.English.__call__
 
 
 .. autoclass:: spacy.tokens.Tokens
-    :members:
 
     +---------------+-------------+-------------+
-    | Attribute     | Type        | Useful      |
+    | Attribute     | Type        | Attr API    |
     +===============+=============+=============+
     | vocab         | Vocab       | __getitem__ |
     +---------------+-------------+-------------+
@@ -47,8 +50,6 @@ API
     access is required, and you need slightly better performance.
     However, this is both slower and has a worse API than Cython access.
 
-.. Once a Token object has been created, it is persisted internally in Tokens._py_tokens.
-
 
 .. autoclass:: spacy.tokens.Token
 
@@ -192,6 +193,15 @@ API
         An iterator for the part of the sentence syntactically governed by the
         word, including the word itself.
 
+
+    **Named Entities**
+
+    ent_type
+        If the token is part of an entity, its entity type
+
+    ent_iob
+        The IOB (inside, outside, begin) entity recognition tag for the token
+
 .. py:class:: vocab.Vocab(self, data_dir=None, lex_props_getter=None)
 
     .. py:method:: __len__(self) --> int
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index 579348655..470df42d7 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -68,7 +68,7 @@ a convenient API:
 
 spaCy maps all strings to sequential integer IDs --- a common trick in NLP.
 If an attribute `Token.foo` is an integer ID, then `Token.foo_` is the string,
-e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of
+e.g. `pizza.orth` and `pizza.orth_` provide the integer ID and the string of
 the original orthographic form of the word.
 
 .. note:: en.English.__call__ is stateful --- it has an important **side-effect**.
@@ -88,7 +88,7 @@ the original orthographic form of the word.
 
 .. py:class:: spacy.en.English(self, data_dir=join(dirname(__file__), 'data'))
 
-    .. py:method:: __call__(self, text: unicode, tag=True, parse=False) --> Tokens
+    .. py:method:: __call__(self, text: unicode, tag=True, parse=True, entity=True, merge_mwes=False) --> Tokens
 
         +-----------------+--------------+--------------+
         | Attribute       | Type         | Its API      |
@@ -103,6 +103,8 @@ the original orthographic form of the word.
         +-----------------+--------------+--------------+
         | parser          | GreedyParser | __call__     |
         +-----------------+--------------+--------------+
+        | entity          | GreedyParser | __call__     |
+        +-----------------+--------------+--------------+
 
 
 **Get dict or numpy array:**
@@ -116,6 +118,16 @@ the original orthographic form of the word.
 
     .. py:method:: tokens.Tokens.__iter__(self) --> Iterator[Token]
 
+**Get sentence or named entity spans**
+
+    .. py:attribute:: tokens.Tokens.sents --> Iterator[Span]
+
+    .. py:attribute:: tokens.Tokens.ents --> Iterator[Span]
+
+        You can iterate over a Span to access individual Tokens, or access its
+        start, end or label.
+
+
 **Embedded word representenations**
 
     .. py:attribute:: tokens.Token.repvec
@@ -150,7 +162,6 @@ the original orthographic form of the word.
 
         Starting offset of word in the original string.
 
-
 
 Features
 --------
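The quickstart hunk describes spaCy's convention of mapping every string to a sequential integer ID, with `Token.foo` holding the ID and `Token.foo_` the string (e.g. `pizza.orth` vs. `pizza.orth_`). The sketch below illustrates that convention in plain Python under stated assumptions: `ToyStringStore` and `ToyToken` are hypothetical stand-ins, not spaCy's `strings.StringStore` or `tokens.Token`, and this toy numbers strings from 0, which the real store may not do.

```python
from typing import Dict, List, Union


class ToyStringStore:
    """Hypothetical string <-> int mapping (NOT spaCy's StringStore).

    Each previously unseen string is assigned the next sequential integer ID.
    """

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def __getitem__(self, key: Union[str, int]) -> Union[int, str]:
        if isinstance(key, int):
            return self._strings[key]  # ID -> string
        if key not in self._ids:  # unseen string: assign the next ID
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]  # string -> ID

    def __len__(self) -> int:
        return len(self._strings)


class ToyToken:
    """Hypothetical token showing the `orth` (ID) / `orth_` (string) naming."""

    def __init__(self, strings: ToyStringStore, text: str) -> None:
        self._strings = strings
        self.orth = strings[text]  # integer ID stored on the token

    @property
    def orth_(self) -> str:
        return self._strings[self.orth]  # string form, looked up on demand


strings = ToyStringStore()
tokens = [ToyToken(strings, w) for w in "pizza with more pizza".split()]
print([t.orth for t in tokens])   # repeated words share one ID
print([t.orth_ for t in tokens])
```

Storing only the integer on each token keeps per-token data small and makes comparisons cheap; the string is recovered through the shared store only when asked for.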
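The new `ent_iob` and `ent_type` token attributes and the `Tokens.ents` spans describe the same entities at two granularities: one IOB tag per token, versus one (start, end, label) span per entity. A minimal sketch of grouping per-token tags into spans, assuming the tags are the strings "B", "I" and "O" and each token's entity type is a string that is empty outside entities. `iob_to_spans` is a hypothetical helper, not part of spaCy, and the sample labels are made up for illustration.

```python
from typing import List, Optional, Tuple

# (start, end, label); in this sketch `end` is exclusive, like a Python slice.
Span = Tuple[int, int, str]


def iob_to_spans(iob_tags: List[str], ent_types: List[str]) -> List[Span]:
    """Group per-token IOB tags into entity spans (hypothetical helper).

    Assumes tags are "B", "I" or "O", and ent_types[i] is the entity label
    of token i ("" for tokens outside any entity).
    """
    if len(iob_tags) != len(ent_types):
        raise ValueError("need exactly one entity type per IOB tag")
    spans: List[Span] = []
    start: Optional[int] = None
    label = ""
    for i, (tag, ent_type) in enumerate(zip(iob_tags, ent_types)):
        # An open entity continues only on an "I" tag with the same label.
        continues = start is not None and tag == "I" and ent_type == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open entity
            start = None
        if tag in ("B", "I") and not continues:
            # "B" always opens an entity; a stray "I" is treated like "B".
            start, label = i, ent_type
    if start is not None:
        spans.append((start, len(iob_tags), label))
    return spans


words = ["Mr", "Best", "flew", "to", "New", "York", "on", "Saturday"]
iob = ["B", "I", "O", "O", "B", "I", "O", "B"]
types = ["PERSON", "PERSON", "", "", "GPE", "GPE", "", "DATE"]
for start, end, label in iob_to_spans(iob, types):
    print(label, " ".join(words[start:end]))
```

Treating a stray `I` as the start of a new entity is a lenient choice: it accepts both IOB1-style input (where `B` only separates adjacent entities of the same type) and IOB2-style input (where every entity begins with `B`).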