Merge branch 'develop' into spacy.io

Ines Montani 2019-02-24 22:22:30 +01:00
commit e983eefee7
5 changed files with 62 additions and 55 deletions


@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## DependencyParser.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both
 ## DependencyParser.pipe {#pipe tag="method"}
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/dependencyparser#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
 [`set_annotations`](/api/dependencyparser#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > parser = DependencyParser(nlp.vocab)
-> for doc in parser.pipe(texts, batch_size=50):
+> for doc in parser.pipe(docs, batch_size=50):
 >     pass
 > ```
@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = parser.predict([doc1, doc2])
 > ```
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 ## DependencyParser.set_annotations {#set_annotations tag="method"}
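The updated text says that `__call__` and `pipe` both delegate to `predict` and `set_annotations`, and that `predict` returns a `(scores, tensors)` tuple without modifying the docs. The following is a minimal pure-Python sketch of that delegation pattern, not spaCy's implementation: `ToyComponent`, the `batches` helper and the dummy scoring are hypothetical, and documents are plain dicts of words rather than `Doc` objects.

```python
from itertools import islice


def batches(stream, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class ToyComponent:
    """Hypothetical component whose __call__ and pipe both delegate to
    predict() and set_annotations(). Not spaCy code."""

    def __call__(self, doc):
        # A single document is handled as a batch of one.
        scores, tensors = self.predict([doc])
        self.set_annotations([doc], scores, tensors)
        return doc

    def pipe(self, stream, batch_size=128):
        # Annotate the stream batch by batch, yielding docs in input order.
        for batch in batches(stream, batch_size):
            scores, tensors = self.predict(batch)
            self.set_annotations(batch, scores, tensors)
            yield from batch

    def predict(self, docs):
        # Dummy "model" returning (scores, tensors) without touching the
        # docs: one tensor per doc, with one row per token.
        tensors = [[[float(len(word))] for word in doc["words"]] for doc in docs]
        scores = [sum(row[0] for row in tensor) for tensor in tensors]
        return scores, tensors

    def set_annotations(self, docs, scores, tensors):
        # Only this step modifies the docs (here: plain dicts) in place.
        for doc, score, tensor in zip(docs, scores, tensors):
            doc["score"] = score
            doc["tensor"] = tensor


docs = [{"words": ["One", "doc"]}, {"words": ["Lots", "of", "docs"]}]
for doc in ToyComponent().pipe(docs, batch_size=50):
    print(doc["score"], len(doc["tensor"]))
# 6.0 2
# 10.0 3
```

Keeping the model call (`predict`) separate from the side effects (`set_annotations`) is what lets one batched code path serve both single documents and streams.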


@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both
 ## EntityRecognizer.pipe {#pipe tag="method"}
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/entityrecognizer#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
 [`set_annotations`](/api/entityrecognizer#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > ner = EntityRecognizer(nlp.vocab)
-> for doc in ner.pipe(texts, batch_size=50):
+> for doc in ner.pipe(docs, batch_size=50):
 >     pass
 > ```
@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = ner.predict([doc1, doc2])
 > ```
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 ## EntityRecognizer.set_annotations {#set_annotations tag="method"}


@@ -7,17 +7,23 @@ source: spacy/gold.pyx
 ## GoldParse.\_\_init\_\_ {#init tag="method"}
-Create a `GoldParse`.
+Create a `GoldParse`. Unlike annotations in `entities`, label annotations in
+`cats` can overlap, i.e. a single word can be covered by multiple labelled
+spans. The [`TextCategorizer`](/api/textcategorizer) component expects true
+examples of a label to have the value `1.0`, and negative examples of a label to
+have the value `0.0`. Labels not in the dictionary are treated as missing: the
+gradient for those labels will be zero.
 | Name | Type | Description |
-| ----------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `doc` | `Doc` | The document the annotations refer to. |
 | `words` | iterable | A sequence of unicode word strings. |
 | `tags` | iterable | A sequence of strings, representing tag annotations. |
 | `heads` | iterable | A sequence of integers, representing syntactic head offsets. |
 | `deps` | iterable | A sequence of strings, representing the syntactic relation types. |
 | `entities` | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. |
+| `cats` | dict | Labels for text classification. Each key in the dictionary may be a string or an int, or a `(start_char, end_char, label)` tuple, indicating that the label is applied to only part of the document (usually a sentence). |
 | **RETURNS** | `GoldParse` | The newly constructed object. |
 ## GoldParse.\_\_len\_\_ {#len tag="method"}
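The new `cats` description says true examples of a label take the value `1.0`, negatives `0.0`, and labels absent from the dictionary contribute zero gradient. A minimal sketch of that masking follows. It assumes a simple "score minus target" gradient purely for illustration; `cats_gradient` is a hypothetical helper, not the `TextCategorizer` loss code.

```python
def cats_gradient(scores, cats, labels):
    """Hypothetical per-label gradient for one document.

    scores: dict mapping label -> predicted probability
    cats:   gold annotations, e.g. {"POSITIVE": 1.0, "NEGATIVE": 0.0}
    labels: every label the model predicts
    """
    gradient = {}
    for label in labels:
        if label in cats:
            # Annotated label: push the score towards 1.0 or 0.0.
            gradient[label] = scores[label] - cats[label]
        else:
            # Label missing from the dict: unknown, so no gradient.
            gradient[label] = 0.0
    return gradient


labels = ["POSITIVE", "NEGATIVE", "SPORTS"]
scores = {"POSITIVE": 0.75, "NEGATIVE": 0.5, "SPORTS": 0.25}
gold = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPORTS" is not annotated
print(cats_gradient(scores, gold, labels))
# {'POSITIVE': -0.25, 'NEGATIVE': 0.5, 'SPORTS': 0.0}
```

Treating an absent label as "missing" rather than as `0.0` means partially annotated examples never teach the model that an unannotated label is false.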
@@ -52,11 +58,10 @@ Whether the provided syntactic annotations form a projective dependency tree.
 ### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}
 Encode labelled spans into per-token tags, using the
-[BILUO scheme](/api/annotation#biluo) (Begin/In/Last/Unit/Out).
-Returns a list of unicode strings, describing the tags. Each tag string will be
-of the form of either `""`, `"O"` or `"{action}-{label}"`, where action is one
-of `"B"`, `"I"`, `"L"`, `"U"`. The string `"-"` is used where the entity offsets
+[BILUO scheme](/api/annotation#biluo) (Begin, In, Last, Unit, Out). Returns a
+list of unicode strings, describing the tags. Each tag string will be of the
+form of either `""`, `"O"` or `"{action}-{label}"`, where action is one of
+`"B"`, `"I"`, `"L"`, `"U"`. The string `"-"` is used where the entity offsets
 don't align with the tokenization in the `Doc` object. The training algorithm
 will view these as missing values. `O` denotes a non-entity token. `B` denotes
 the beginning of a multi-token entity, `I` the inside of an entity of three or
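The encoding that `biluo_tags_from_offsets` describes can be sketched in pure Python. This is not spaCy's implementation: it assumes tokens are given as `(start_char, end_char)` offsets instead of a `Doc`, the name `biluo_from_offsets` is hypothetical, and it never emits the empty-string tag the description also mentions. Entities whose boundaries match token boundaries get `B`/`I`/`L`/`U` tags, tokens touching no entity get `"O"`, and tokens touched by a misaligned entity keep `"-"`.

```python
import re


def biluo_from_offsets(token_spans, entities):
    """Hypothetical sketch of BILUO encoding from character offsets.

    token_spans: list of (start_char, end_char) offsets, one per token
    entities:    list of (start_char, end_char, label) tuples
    """
    starts = {start: i for i, (start, _) in enumerate(token_spans)}
    ends = {end: i for i, (_, end) in enumerate(token_spans)}
    tags = ["-"] * len(token_spans)
    for start_char, end_char, label in entities:
        first, last = starts.get(start_char), ends.get(end_char)
        if first is None or last is None:
            continue  # misaligned with the tokenization: leave as "-"
        if first == last:
            tags[first] = f"U-{label}"
        else:
            tags[first] = f"B-{label}"
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"
            tags[last] = f"L-{label}"
    # Tokens that touch no entity character at all are plain "O".
    entity_chars = {c for s, e, _ in entities for c in range(s, e)}
    for i, (start, end) in enumerate(token_spans):
        if not any(c in entity_chars for c in range(start, end)):
            tags[i] = "O"
    return tags


text = "I like London and New York City"
token_spans = [m.span() for m in re.finditer(r"\S+", text)]  # whitespace tokens
print(biluo_from_offsets(token_spans, [(7, 13, "LOC"), (18, 31, "LOC")]))
# ['O', 'O', 'U-LOC', 'O', 'B-LOC', 'I-LOC', 'L-LOC']
print(biluo_from_offsets(token_spans, [(7, 10, "LOC")]))  # "Lon" is misaligned
# ['O', 'O', '-', 'O', 'O', 'O', 'O']
```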


@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## Tagger.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/tagger#call) and [`pipe`](/api/tagger#pipe) delegate to the
 [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.
@@ -69,16 +69,17 @@ all pipeline components are applied to the `Doc` in order. Both
 ## Tagger.pipe {#pipe tag="method"}
-Apply the pipe to a stream of documents. Both [`__call__`](/api/tagger#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
 [`pipe`](/api/tagger#pipe) delegate to the [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > tagger = Tagger(nlp.vocab)
-> for doc in tagger.pipe(texts, batch_size=50):
+> for doc in tagger.pipe(docs, batch_size=50):
 >     pass
 > ```
@@ -99,10 +100,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = tagger.predict([doc1, doc2])
 > ```
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 ## Tagger.set_annotations {#set_annotations tag="method"}
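The new `predict` return value pairs per-document predictions with per-token rows. For a tagger, one plausible shape is a row of tag scores per token, from which `set_annotations` keeps the best tag. The sketch below illustrates that step with plain lists. The tag set, the scores and the `decode_tags` helper are hypothetical, and the real `Tagger` may hand `set_annotations` something else (for example, already-decoded tag IDs).

```python
TAGS = ["NOUN", "VERB", "DET"]  # hypothetical tag set


def argmax(row):
    """Index of the highest score in a row."""
    return max(range(len(row)), key=row.__getitem__)


def decode_tags(docs, scores):
    """Hypothetical: pick the best-scoring tag for every token.

    docs:   list of documents, each a list of words
    scores: one array per document, with one row of tag scores per token
    """
    tagged = []
    for words, rows in zip(docs, scores):
        if len(rows) != len(words):
            raise ValueError("expected one score row per token")
        tagged.append([(word, TAGS[argmax(row)]) for word, row in zip(words, rows)])
    return tagged


docs = [["the", "cat", "sat"]]
scores = [[[0.1, 0.0, 0.9], [0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]]
print(decode_tags(docs, scores))
# [[('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB')]]
```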


@@ -64,8 +64,8 @@ argument.
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
 delegate to the [`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.
@@ -86,17 +86,18 @@ delegate to the [`predict`](/api/textcategorizer#predict) and
 ## TextCategorizer.pipe {#pipe tag="method"}
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
-delegate to the [`predict`](/api/textcategorizer#predict) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
+[`pipe`](/api/textcategorizer#pipe) delegate to the
+[`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > textcat = TextCategorizer(nlp.vocab)
-> for doc in textcat.pipe(texts, batch_size=50):
+> for doc in textcat.pipe(docs, batch_size=50):
 >     pass
 > ```
@@ -117,10 +118,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = textcat.predict([doc1, doc2])
 > ```
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 ## TextCategorizer.set_annotations {#set_annotations tag="method"}
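For a text classifier, the `scores` part of the `predict` output is naturally one row per document with one score per label, which `set_annotations` can turn into a `{label: score}` mapping such as `doc.cats`. A minimal sketch of that conversion, assuming the row layout above; `scores_to_cats` and the labels are hypothetical. Because each label gets its own independent score, several labels can be high for the same document.

```python
def scores_to_cats(labels, scores):
    """Hypothetical: map one row of label scores per document to a
    {label: score} dict, similar in spirit to doc.cats."""
    cats_per_doc = []
    for row in scores:
        if len(row) != len(labels):
            raise ValueError("each row needs one score per label")
        cats_per_doc.append({label: float(s) for label, s in zip(labels, row)})
    return cats_per_doc


labels = ["POSITIVE", "SPORTS"]
scores = [[0.9, 0.2], [0.1, 0.7]]
print(scores_to_cats(labels, scores))
# [{'POSITIVE': 0.9, 'SPORTS': 0.2}, {'POSITIVE': 0.1, 'SPORTS': 0.7}]
```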