remove tensors, fix predict, get_loss and set_annotations

This commit is contained in:
svlandeg 2020-07-08 13:11:54 +02:00
parent 90b100c39f
commit c94279ac1b
7 changed files with 104 additions and 135 deletions

View File

@@ -15,7 +15,7 @@ via the ID `"parser"`.
> ```python
> # Construction via create_pipe with default model
> parser = nlp.create_pipe("parser")
>
> # Construction via create_pipe with custom model
> config = {"model": {"@architectures": "my_parser"}}
> parser = nlp.create_pipe("parser", config)
@@ -112,10 +112,10 @@ Modify a batch of documents, using pre-computed scores.
> parser.set_annotations([doc1, doc2], scores)
> ```
| Name | Type | Description |
| -------- | -------- | ---------------------------------------------------------- |
| `docs` | iterable | The documents to modify. |
| `scores` | - | The scores to set, produced by `DependencyParser.predict`. |
| Name | Type | Description |
| -------- | ------------------- | ---------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | `syntax.StateClass` | The scores to set, produced by `DependencyParser.predict`. |
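
Taken together, `predict` and `set_annotations` split the work of `__call__` into a side-effect-free step and a writing step. A minimal sketch of the two-step flow, assuming `nlp` has a trained parser and `doc1`/`doc2` already exist:

```python
# Two-step flow: compute scores without touching the docs, then write them back.
parser = nlp.get_pipe("parser")
docs = [doc1, doc2]
scores = parser.predict(docs)         # docs are not modified yet
parser.set_annotations(docs, scores)  # heads and dependency labels are set here
print([(token.text, token.dep_) for token in docs[0]])
```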
## DependencyParser.update {#update tag="method"}
@@ -150,16 +150,15 @@ predicted scores.
>
> ```python
> parser = DependencyParser(nlp.vocab)
> scores = parser.predict([doc1, doc2])
> loss, d_loss = parser.get_loss([doc1, doc2], [gold1, gold2], scores)
> scores = parser.predict([eg.predicted for eg in examples])
> loss, d_loss = parser.get_loss(examples, scores)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `docs` | iterable | The batch of documents. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
| Name | Type | Description |
| ----------- | ------------------- | --------------------------------------------------- |
| `examples` | `Iterable[Example]` | The batch of examples. |
| `scores` | `syntax.StateClass` | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
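
The new signature takes `Example` objects, which pair a predicted `Doc` with its gold-standard annotations. A hedged sketch of how the pieces fit together, assuming the `Example` class from `spacy.gold` (moved to `spacy.training` in later versions) and purely illustrative training annotations:

```python
from spacy.gold import Example

# Hypothetical gold heads and labels, for illustration only.
train_data = [("She ate the pizza", {"heads": [1, 1, 3, 1],
                                     "deps": ["nsubj", "ROOT", "det", "dobj"]})]
examples = []
for text, annotations in train_data:
    doc = nlp.make_doc(text)
    examples.append(Example.from_dict(doc, annotations))

scores = parser.predict([eg.predicted for eg in examples])
loss, d_scores = parser.get_loss(examples, scores)  # d_scores feeds backprop
```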
## DependencyParser.begin_training {#begin_training tag="method"}
@@ -193,9 +192,9 @@ component.
> optimizer = parser.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | ----------- | -------------- |
| **RETURNS** | `Optimizer` | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | --------------------------------------------------------------- |
| **RETURNS** | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
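
The returned object is a Thinc `Optimizer` and can be passed straight back into the training methods. A short sketch, assuming `examples` is a batch of `Example` objects prepared as above:

```python
optimizer = parser.create_optimizer()
# Reuse the same optimizer across update calls during training.
losses = parser.update(examples, sgd=optimizer, losses={})
```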
## DependencyParser.use_params {#use_params tag="method, contextmanager"}

View File

@@ -96,13 +96,13 @@ Apply the pipeline's model to a batch of docs, without modifying them.
>
> ```python
> entity_linker = EntityLinker(nlp.vocab)
> kb_ids, tensors = entity_linker.predict([doc1, doc2])
> kb_ids = entity_linker.predict([doc1, doc2])
> ```
| Name | Type | Description |
| ----------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | tuple | A `(kb_ids, tensors)` tuple where `kb_ids` are the model's predicted KB identifiers for the entities in the `docs`, and `tensors` are the token representations used to predict these identifiers. |
| Name | Type | Description |
| ----------- | --------------- | ------------------------------------------------------------ |
| `docs` | `Iterable[Doc]` | The documents to predict. |
| **RETURNS** | `Iterable[str]` | The predicted KB identifiers for the entities in the `docs`. |
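
The returned identifiers line up with the entities across the batch, one ID per entity. A minimal sketch, assuming the docs already have entities set and the linker has a knowledge base loaded:

```python
entity_linker = nlp.get_pipe("entity_linker")
docs = [doc1, doc2]
kb_ids = entity_linker.predict(docs)  # one KB identifier per entity, in order
entities = (ent for doc in docs for ent in doc.ents)
for ent, kb_id in zip(entities, kb_ids):
    print(ent.text, "->", kb_id)
```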
## EntityLinker.set_annotations {#set_annotations tag="method"}
@@ -113,15 +113,14 @@ entities.
>
> ```python
> entity_linker = EntityLinker(nlp.vocab)
> kb_ids, tensors = entity_linker.predict([doc1, doc2])
> entity_linker.set_annotations([doc1, doc2], kb_ids, tensors)
> kb_ids = entity_linker.predict([doc1, doc2])
> entity_linker.set_annotations([doc1, doc2], kb_ids)
> ```
| Name | Type | Description |
| --------- | -------- | ------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to modify. |
| `kb_ids` | iterable | The knowledge base identifiers for the entities in the docs, predicted by `EntityLinker.predict`. |
| `tensors` | iterable | The token representations used to predict the identifiers. |
| Name | Type | Description |
| -------- | --------------- | ------------------------------------------------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `kb_ids` | `Iterable[str]` | The knowledge base identifiers for the entities in the docs, predicted by `EntityLinker.predict`. |
## EntityLinker.update {#update tag="method"}
@@ -148,27 +147,6 @@ pipe's entity linking model and context encoder. Delegates to
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
## EntityLinker.get_loss {#get_loss tag="method"}
Find the loss and gradient of loss for the entities in a batch of documents and
their predicted scores.
> #### Example
>
> ```python
> entity_linker = EntityLinker(nlp.vocab)
> kb_ids, tensors = entity_linker.predict(docs)
> loss, d_loss = entity_linker.get_loss(docs, [gold1, gold2], kb_ids, tensors)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `docs` | iterable | The batch of documents. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `kb_ids` | iterable | KB identifiers representing the model's predictions. |
| `tensors` | iterable | The token representations used to predict the identifiers. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
## EntityLinker.set_kb {#set_kb tag="method"}
Define the knowledge base (KB) used for disambiguating named entities to KB
@@ -219,9 +197,9 @@ Create an optimizer for the pipeline component.
> optimizer = entity_linker.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | -------- | -------------- |
| **RETURNS** | callable | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | --------------------------------------------------------------- |
| **RETURNS** | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
## EntityLinker.use_params {#use_params tag="method, contextmanager"}

View File

@@ -15,7 +15,7 @@ via the ID `"ner"`.
> ```python
> # Construction via create_pipe
> ner = nlp.create_pipe("ner")
>
> # Construction via create_pipe with custom model
> config = {"model": {"@architectures": "my_ner"}}
> ner = nlp.create_pipe("ner", config)
@@ -92,13 +92,13 @@ Apply the pipeline's model to a batch of docs, without modifying them.
>
> ```python
> ner = EntityRecognizer(nlp.vocab)
> scores, tensors = ner.predict([doc1, doc2])
> scores = ner.predict([doc1, doc2])
> ```
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | list | List of `syntax.StateClass` objects. `syntax.StateClass` is a helper class for the parse state (internal). |
| Name | Type | Description |
| ----------- | ------------------ | ---------------------------------------------------------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to predict. |
| **RETURNS** | `List[StateClass]` | List of `syntax.StateClass` objects. `syntax.StateClass` is a helper class for the parse state (internal). |
## EntityRecognizer.set_annotations {#set_annotations tag="method"}
@@ -108,15 +108,14 @@ Modify a batch of documents, using pre-computed scores.
>
> ```python
> ner = EntityRecognizer(nlp.vocab)
> scores, tensors = ner.predict([doc1, doc2])
> ner.set_annotations([doc1, doc2], scores, tensors)
> scores = ner.predict([doc1, doc2])
> ner.set_annotations([doc1, doc2], scores)
> ```
| Name | Type | Description |
| --------- | -------- | ---------------------------------------------------------- |
| `docs` | iterable | The documents to modify. |
| `scores` | - | The scores to set, produced by `EntityRecognizer.predict`. |
| `tensors` | iterable | The token representations used to predict the scores. |
| Name | Type | Description |
| -------- | ------------------ | ---------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | `List[StateClass]` | The scores to set, produced by `EntityRecognizer.predict`. |
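
As with the parser, the state objects returned by `predict` are only meaningful to the component itself; `set_annotations` is what turns them into visible entities. A sketch, assuming a trained NER component:

```python
ner = nlp.get_pipe("ner")
docs = [doc1, doc2]
states = ner.predict(docs)         # internal StateClass objects
ner.set_annotations(docs, states)  # doc.ents is populated here
print([(ent.text, ent.label_) for ent in docs[0].ents])
```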
## EntityRecognizer.update {#update tag="method"}
@@ -151,16 +150,15 @@ predicted scores.
>
> ```python
> ner = EntityRecognizer(nlp.vocab)
> scores = ner.predict([doc1, doc2])
> loss, d_loss = ner.get_loss([doc1, doc2], [gold1, gold2], scores)
> scores = ner.predict([eg.predicted for eg in examples])
> loss, d_loss = ner.get_loss(examples, scores)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `docs` | iterable | The batch of documents. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
| Name | Type | Description |
| ----------- | ------------------- | --------------------------------------------------- |
| `examples` | `Iterable[Example]` | The batch of examples. |
| `scores` | `List[StateClass]` | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
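
A hedged sketch of where `get_loss` sits in a training step: the scalar is useful for logging, while the gradient is what gets passed back through the model. The `examples` batch is assumed to be prepared as in the parser sketch above:

```python
scores = ner.predict([eg.predicted for eg in examples])
loss, d_scores = ner.get_loss(examples, scores)
print(f"NER loss on this batch: {loss:.4f}")  # d_scores drives backpropagation
```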
## EntityRecognizer.begin_training {#begin_training tag="method"}
@@ -182,8 +180,6 @@ Initialize the pipe for training, using data examples if available. Return an
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/entityrecognizer#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
## EntityRecognizer.create_optimizer {#create_optimizer tag="method"}
Create an optimizer for the pipeline component.
@@ -195,9 +191,9 @@ Create an optimizer for the pipeline component.
> optimizer = ner.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | -------- | -------------- |
| **RETURNS** | callable | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | --------------------------------------------------------------- |
| **RETURNS** | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
## EntityRecognizer.use_params {#use_params tag="method, contextmanager"}

View File

@@ -52,7 +52,7 @@ contain arbitrary whitespace. Alignment into the original string is preserved.
| Name | Type | Description |
| ----------- | ----- | --------------------------------------------------------------------------------- |
| `text` | str | The text to be processed. |
| `disable` | list | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| **RETURNS** | `Doc` | A container for accessing the annotations. |
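
With the stricter `List[str]` annotation, `disable` works the same way at call time as before. For example, skipping the parser for a single text (assuming the pipeline contains one):

```python
# The parser is skipped for this call only; other components still run.
doc = nlp("Apple is looking at buying a U.K. startup.", disable=["parser"])
assert not doc.is_parsed
```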
## Language.pipe {#pipe tag="method"}

View File

@@ -15,7 +15,7 @@ via the ID `"tagger"`.
> ```python
> # Construction via create_pipe
> tagger = nlp.create_pipe("tagger")
>
> # Construction via create_pipe with custom model
> config = {"model": {"@architectures": "my_tagger"}}
> tagger = nlp.create_pipe("tagger", config)
@@ -90,13 +90,13 @@ Apply the pipeline's model to a batch of docs, without modifying them.
>
> ```python
> tagger = Tagger(nlp.vocab)
> scores, tensors = tagger.predict([doc1, doc2])
> scores = tagger.predict([doc1, doc2])
> ```
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
| Name | Type | Description |
| ----------- | --------------- | ----------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to predict. |
| **RETURNS** | - | The model's prediction for each document. |
## Tagger.set_annotations {#set_annotations tag="method"}
@@ -106,15 +106,14 @@ Modify a batch of documents, using pre-computed scores.
>
> ```python
> tagger = Tagger(nlp.vocab)
> scores, tensors = tagger.predict([doc1, doc2])
> tagger.set_annotations([doc1, doc2], scores, tensors)
> scores = tagger.predict([doc1, doc2])
> tagger.set_annotations([doc1, doc2], scores)
> ```
| Name | Type | Description |
| --------- | -------- | ----------------------------------------------------- |
| `docs` | iterable | The documents to modify. |
| `scores` | - | The scores to set, produced by `Tagger.predict`. |
| `tensors` | iterable | The token representations used to predict the scores. |
| Name | Type | Description |
| -------- | --------------- | ------------------------------------------------ |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | - | The scores to set, produced by `Tagger.predict`. |
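
For the tagger, `scores` holds the model's per-token tag predictions for each document, and `set_annotations` converts them into `token.tag_` values. A minimal sketch, assuming a trained tagger:

```python
tagger = nlp.get_pipe("tagger")
docs = [doc1, doc2]
scores = tagger.predict(docs)
tagger.set_annotations(docs, scores)  # writes token.tag_ on each token
print([(token.text, token.tag_) for token in docs[0]])
```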
## Tagger.update {#update tag="method"}
@@ -149,16 +148,15 @@ predicted scores.
>
> ```python
> tagger = Tagger(nlp.vocab)
> scores = tagger.predict([doc1, doc2])
> loss, d_loss = tagger.get_loss([doc1, doc2], [gold1, gold2], scores)
> scores = tagger.predict([eg.predicted for eg in examples])
> loss, d_loss = tagger.get_loss(examples, scores)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `docs` | iterable | The batch of documents. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
| Name | Type | Description |
| ----------- | ------------------- | --------------------------------------------------- |
| `examples` | `Iterable[Example]` | The batch of examples. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
## Tagger.begin_training {#begin_training tag="method"}
@@ -191,9 +189,9 @@ Create an optimizer for the pipeline component.
> optimizer = tagger.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | -------- | -------------- |
| **RETURNS** | callable | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | --------------------------------------------------------------- |
| **RETURNS** | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
## Tagger.use_params {#use_params tag="method, contextmanager"}

View File

@@ -16,11 +16,11 @@ via the ID `"textcat"`.
> ```python
> # Construction via create_pipe
> textcat = nlp.create_pipe("textcat")
>
> # Construction via create_pipe with custom model
> config = {"model": {"@architectures": "my_textcat"}}
> textcat = nlp.create_pipe("textcat", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import TextCategorizer
> model = util.load_config("model.cfg", create_objects=True)["model"]
@@ -38,7 +38,7 @@ shortcut for this and instantiate the component using its string name and
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `TextCategorizer` | The newly constructed object. |
<!-- TODO move to config page
### Architectures {#architectures new="2.1"}
Text classification models can be used to solve a wide variety of problems.
@@ -109,13 +109,13 @@ Apply the pipeline's model to a batch of docs, without modifying them.
>
> ```python
> textcat = TextCategorizer(nlp.vocab)
> scores, tensors = textcat.predict([doc1, doc2])
> scores = textcat.predict([doc1, doc2])
> ```
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
| Name | Type | Description |
| ----------- | --------------- | ----------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to predict. |
| **RETURNS** | - | The model's prediction for each document. |
## TextCategorizer.set_annotations {#set_annotations tag="method"}
@@ -125,15 +125,14 @@ Modify a batch of documents, using pre-computed scores.
>
> ```python
> textcat = TextCategorizer(nlp.vocab)
> scores, tensors = textcat.predict([doc1, doc2])
> textcat.set_annotations([doc1, doc2], scores, tensors)
> scores = textcat.predict(docs)
> textcat.set_annotations(docs, scores)
> ```
| Name | Type | Description |
| --------- | -------- | --------------------------------------------------------- |
| `docs` | iterable | The documents to modify. |
| `scores` | - | The scores to set, produced by `TextCategorizer.predict`. |
| `tensors` | iterable | The token representations used to predict the scores. |
| Name | Type | Description |
| -------- | --------------- | --------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | - | The scores to set, produced by `TextCategorizer.predict`. |
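
The scores written by `set_annotations` end up in `doc.cats`, keyed by label. A sketch of the full flow, assuming a trained text classifier (the label names depend entirely on training):

```python
textcat = nlp.get_pipe("textcat")
docs = [nlp.make_doc("This is great!")]
scores = textcat.predict(docs)
textcat.set_annotations(docs, scores)
print(docs[0].cats)  # e.g. {"POSITIVE": 0.95, "NEGATIVE": 0.05}
```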
## TextCategorizer.update {#update tag="method"}
@@ -168,16 +167,15 @@ predicted scores.
>
> ```python
> textcat = TextCategorizer(nlp.vocab)
> scores = textcat.predict([doc1, doc2])
> loss, d_loss = textcat.get_loss([doc1, doc2], [gold1, gold2], scores)
> scores = textcat.predict([eg.predicted for eg in examples])
> loss, d_loss = textcat.get_loss(examples, scores)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `docs` | iterable | The batch of documents. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
| Name | Type | Description |
| ----------- | ------------------- | --------------------------------------------------- |
| `examples` | `Iterable[Example]` | The batch of examples. |
| `scores` | - | Scores representing the model's predictions. |
| **RETURNS** | tuple | The loss and the gradient, i.e. `(loss, gradient)`. |
## TextCategorizer.begin_training {#begin_training tag="method"}
@@ -210,9 +208,9 @@ Create an optimizer for the pipeline component.
> optimizer = textcat.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | -------- | -------------- |
| **RETURNS** | callable | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | --------------------------------------------------------------- |
| **RETURNS** | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
## TextCategorizer.use_params {#use_params tag="method, contextmanager"}

View File

@@ -34,7 +34,7 @@ loaded in via [`Language.from_disk`](/api/language#from_disk).
| Name | Type | Description |
| ----------- | ------------ | --------------------------------------------------------------------------------- |
| `name` | str / `Path` | Model to load, i.e. package name or path. |
| `disable` | list | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| **RETURNS** | `Language` | A `Language` object with the loaded model. |
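
The `disable` argument, now typed as `List[str]`, is the usual way to load a model without components you don't need:

```python
import spacy

# Load the model, but leave parser and NER out of the pipeline.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
print(nlp.pipe_names)  # the disabled components are gone entirely
```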
Essentially, `spacy.load()` is a convenience wrapper that reads the language ID
@@ -61,11 +61,11 @@ Create a blank model of a given language class. This function is the twin of
> nlp_de = spacy.blank("de")
> ```
| Name | Type | Description |
| ----------- | ---------- | ------------------------------------------------------------------------------------------------ |
| `name` | str | [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) of the language class to load. |
| `disable` | list | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| **RETURNS** | `Language` | An empty `Language` object of the appropriate subclass. |
| Name | Type | Description |
| ----------- | ----------- | ------------------------------------------------------------------------------------------------ |
| `name` | str | [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) of the language class to load. |
| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| **RETURNS** | `Language` | An empty `Language` object of the appropriate subclass. |
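
A blank pipeline starts out with no components, which makes it the usual starting point for training from scratch:

```python
import spacy

nlp = spacy.blank("en")
print(nlp.pipe_names)  # [] (no components yet)
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
```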
#### spacy.info {#spacy.info tag="function"}