mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-11 04:08:09 +03:00
Merge branch 'develop' into spacy.io
commit
e983eefee7
@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## DependencyParser.\_\_call\_\_ {#call tag="method"}
 
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both
 
 ## DependencyParser.pipe {#pipe tag="method"}
 
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/dependencyparser#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
 [`set_annotations`](/api/dependencyparser#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > parser = DependencyParser(nlp.vocab)
-> for doc in parser.pipe(texts, batch_size=50):
+> for doc in parser.pipe(docs, batch_size=50):
 >     pass
 > ```
 
@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = parser.predict([doc1, doc2])
 > ```
 
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 
 ## DependencyParser.set_annotations {#set_annotations tag="method"}
 
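The call structure described in the hunks above, where both `__call__` and `pipe` delegate to `predict` and `set_annotations`, and `predict` returns a `(scores, tensors)` tuple, can be sketched without spaCy. This is only an illustration under stated assumptions: `ToyDoc` and `ToyPipe` are hypothetical stand-ins, the "model" just measures word length, and nothing here is the library's actual implementation.

```python
from itertools import islice


class ToyDoc:
    """Hypothetical stand-in for a Doc: words plus slots for annotations."""

    def __init__(self, words):
        self.words = list(words)
        self.tags = None
        self.tensor = None


class ToyPipe:
    """Hypothetical component; only the delegation pattern mirrors the docs."""

    def predict(self, docs):
        # Compute predictions without modifying the docs.
        # Each tensor has one row per token (here a one-element list).
        tensors = [[[float(len(word))] for word in doc.words] for doc in docs]
        scores = [[row[0] for row in tensor] for tensor in tensors]
        return scores, tensors

    def set_annotations(self, docs, scores, tensors=None):
        # Write the precomputed predictions onto the docs, in place.
        for i, doc in enumerate(docs):
            doc.tags = ["LONG" if s > 3 else "SHORT" for s in scores[i]]
            if tensors is not None:
                doc.tensor = tensors[i]

    def __call__(self, doc):
        # One document: predict on a batch of one, then annotate and return it.
        scores, tensors = self.predict([doc])
        self.set_annotations([doc], scores, tensors=tensors)
        return doc

    def pipe(self, stream, batch_size=128):
        # A stream of documents: same two steps, applied batch by batch.
        stream = iter(stream)
        while True:
            batch = list(islice(stream, batch_size))
            if not batch:
                break
            scores, tensors = self.predict(batch)
            self.set_annotations(batch, scores, tensors=tensors)
            yield from batch


docs = [ToyDoc(["One", "doc"]), ToyDoc(["Lots", "of", "docs"])]
toy = ToyPipe()
for doc in toy.pipe(docs, batch_size=50):
    pass
single = toy(ToyDoc(["hello"]))
```

Keeping `predict` free of side effects is what lets the same model code serve both the single-document and the batched path in this sketch.
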

@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
 
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both
 
 ## EntityRecognizer.pipe {#pipe tag="method"}
 
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/entityrecognizer#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
 [`set_annotations`](/api/entityrecognizer#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > ner = EntityRecognizer(nlp.vocab)
-> for doc in ner.pipe(texts, batch_size=50):
+> for doc in ner.pipe(docs, batch_size=50):
 >     pass
 > ```
 
@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = ner.predict([doc1, doc2])
 > ```
 
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 
 ## EntityRecognizer.set_annotations {#set_annotations tag="method"}
 

@@ -7,17 +7,23 @@ source: spacy/gold.pyx
 
 ## GoldParse.\_\_init\_\_ {#init tag="method"}
 
-Create a `GoldParse`.
+Create a `GoldParse`. Unlike annotations in `entities`, label annotations in
+`cats` can overlap, i.e. a single word can be covered by multiple labelled
+spans. The [`TextCategorizer`](/api/textcategorizer) component expects true
+examples of a label to have the value `1.0`, and negative examples of a label to
+have the value `0.0`. Labels not in the dictionary are treated as missing – the
+gradient for those labels will be zero.
 
 | Name | Type | Description |
-| ----------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `doc` | `Doc` | The document the annotations refer to. |
 | `words` | iterable | A sequence of unicode word strings. |
 | `tags` | iterable | A sequence of strings, representing tag annotations. |
 | `heads` | iterable | A sequence of integers, representing syntactic head offsets. |
 | `deps` | iterable | A sequence of strings, representing the syntactic relation types. |
 | `entities` | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. |
+| `cats` | dict | Labels for text classification. Each key in the dictionary may be a string or an int, or a `(start_char, end_char, label)` tuple, indicating that the label is applied to only part of the document (usually a sentence). |
 | **RETURNS** | `GoldParse` | The newly constructed object. |
 
 ## GoldParse.\_\_len\_\_ {#len tag="method"}
 
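To make the `cats` semantics above concrete, here is a minimal sketch of how a dictionary with `1.0` for true labels, `0.0` for negative labels and absent keys for missing labels could translate into a gradient. It assumes a plain squared-error style gradient (`score - truth`), which the hunk does not confirm; `cats_gradient` and the label names are hypothetical and not part of spaCy.

```python
def cats_gradient(scores, cats):
    """Per-label gradient under an assumed squared-error loss.

    scores: dict mapping label -> predicted score
    cats:   dict mapping label -> 1.0 (true) or 0.0 (false)
    Labels absent from `cats` are treated as missing and get a zero gradient.
    """
    return {
        label: (score - cats[label]) if label in cats else 0.0
        for label, score in scores.items()
    }


scores = {"POSITIVE": 0.75, "NEGATIVE": 0.25, "SPAM": 0.5}
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is missing, not negative
grad = cats_gradient(scores, cats)

# A key may also be a (start_char, end_char, label) tuple, so that the label
# applies to only part of the document; such spans are allowed to overlap.
span_cats = {(0, 12, "QUESTION"): 1.0, (0, 30, "POLITE"): 1.0}
```

The point of the sketch is the difference between a label set to `0.0`, which pushes its score down, and a label left out of the dictionary, which contributes nothing to the update.
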
@@ -52,11 +58,10 @@ Whether the provided syntactic annotations form a projective dependency tree.
 ### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}
 
 Encode labelled spans into per-token tags, using the
-[BILUO scheme](/api/annotation#biluo) (Begin/In/Last/Unit/Out).
-
-Returns a list of unicode strings, describing the tags. Each tag string will be
-of the form of either `""`, `"O"` or `"{action}-{label}"`, where action is one
-of `"B"`, `"I"`, `"L"`, `"U"`. The string `"-"` is used where the entity offsets
+[BILUO scheme](/api/annotation#biluo) (Begin, In, Last, Unit, Out). Returns a
+list of unicode strings, describing the tags. Each tag string will be of the
+form of either `""`, `"O"` or `"{action}-{label}"`, where action is one of
+`"B"`, `"I"`, `"L"`, `"U"`. The string `"-"` is used where the entity offsets
 don't align with the tokenization in the `Doc` object. The training algorithm
 will view these as missing values. `O` denotes a non-entity token. `B` denotes
 the beginning of a multi-token entity, `I` the inside of an entity of three or
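The encoding described in this hunk can be sketched in plain Python. This is an illustration, not the code in `spacy/gold.pyx`: `whitespace_tokens` and `biluo_from_offsets` are hypothetical helpers, tokens are assumed to be whitespace-separated, and entity spans are assumed not to overlap.

```python
import re


def whitespace_tokens(text):
    """Return (start_char, end_char) for each whitespace-separated token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def biluo_from_offsets(tokens, entities):
    """Encode (start_char, end_char, label) spans as per-token BILUO tags."""
    starts = {start: i for i, (start, _) in enumerate(tokens)}
    ends = {end: i for i, (_, end) in enumerate(tokens)}
    tags = ["O"] * len(tokens)
    for start_char, end_char, label in entities:
        first = starts.get(start_char)
        last = ends.get(end_char)
        if first is None or last is None:
            # Offsets do not line up with token boundaries: mark every
            # overlapping token with "-", i.e. as a missing value.
            for i, (t_start, t_end) in enumerate(tokens):
                if t_start < end_char and t_end > start_char:
                    tags[i] = "-"
            continue
        if first == last:
            tags[first] = "U-" + label  # single-token entity
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    return tags


multi = biluo_from_offsets(
    whitespace_tokens("I like New York City today"), [(7, 20, "GPE")]
)
unit = biluo_from_offsets(whitespace_tokens("I like London"), [(7, 13, "GPE")])
misaligned = biluo_from_offsets(
    whitespace_tokens("I like London."), [(7, 13, "GPE")]
)
```

In the last call the whitespace tokenizer keeps the trailing period attached to `London.`, so the entity's end offset falls inside a token and that token is tagged `"-"` rather than guessed.
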

@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## Tagger.\_\_call\_\_ {#call tag="method"}
 
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/tagger#call) and [`pipe`](/api/tagger#pipe) delegate to the
 [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.
@@ -69,16 +69,17 @@ all pipeline components are applied to the `Doc` in order. Both
 
 ## Tagger.pipe {#pipe tag="method"}
 
-Apply the pipe to a stream of documents. Both [`__call__`](/api/tagger#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
 [`pipe`](/api/tagger#pipe) delegate to the [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.
 
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > tagger = Tagger(nlp.vocab)
-> for doc in tagger.pipe(texts, batch_size=50):
+> for doc in tagger.pipe(docs, batch_size=50):
 >     pass
 > ```
 
@@ -99,10 +100,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = tagger.predict([doc1, doc2])
 > ```
 
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 
 ## Tagger.set_annotations {#set_annotations tag="method"}
 

@@ -64,8 +64,8 @@ argument.
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
 
 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
 delegate to the [`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.
@@ -86,17 +86,18 @@ delegate to the [`predict`](/api/textcategorizer#predict) and
 
 ## TextCategorizer.pipe {#pipe tag="method"}
 
-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
-delegate to the [`predict`](/api/textcategorizer#predict) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
+[`pipe`](/api/textcategorizer#pipe) delegate to the
+[`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.
 
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > textcat = TextCategorizer(nlp.vocab)
-> for doc in textcat.pipe(texts, batch_size=50):
+> for doc in textcat.pipe(docs, batch_size=50):
 >     pass
 > ```
 
@@ -117,10 +118,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = textcat.predict([doc1, doc2])
 > ```
 
 | Name | Type | Description |
-| ----------- | -------- | ------------------------- |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
 
 ## TextCategorizer.set_annotations {#set_annotations tag="method"}
 