From 90b100c39fb5e878404e35044ee4a3561b871a7b Mon Sep 17 00:00:00 2001
From: svlandeg
Date: Wed, 8 Jul 2020 12:14:30 +0200
Subject: [PATCH] remove component.Model, update constructor, losses is return
 value of update

---
 website/docs/api/dependencyparser.md | 46 +++++++++++---------------
 website/docs/api/entitylinker.md     | 47 +++++++++++----------------
 website/docs/api/entityrecognizer.md | 42 ++++++++++--------------
 website/docs/api/language.md         | 19 +++++------
 website/docs/api/tagger.md           | 43 +++++++++++--------------
 website/docs/api/textcategorizer.md  | 48 ++++++++++++----------------
 6 files changed, 104 insertions(+), 141 deletions(-)

diff --git a/website/docs/api/dependencyparser.md b/website/docs/api/dependencyparser.md
index 9c9a60490..0e493e600 100644
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -8,35 +8,28 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the
 [processing pipeline](/usage/processing-pipelines) via the ID `"parser"`.
 
-## DependencyParser.Model {#model tag="classmethod"}
-
-Initialize a model for the pipe. The model should implement the
-`thinc.neural.Model` API. Wrappers are under development for most major machine
-learning libraries.
-
-| Name | Type | Description |
-| ----------- | ------ | ------------------------------------- |
-| `**kwargs` | - | Parameters for initializing the model |
-| **RETURNS** | object | The initialized model. |
-
 ## DependencyParser.\_\_init\_\_ {#init tag="method"}
 
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
-
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
+> # Construction via create_pipe with default model
 > parser = nlp.create_pipe("parser")
+>
+> # Construction via create_pipe with custom model
+> config = {"model": {"@architectures": "my_parser"}}
+> parser = nlp.create_pipe("parser", config)
 >
-> # Construction from class
+> # Construction from class with custom model from file
 > from spacy.pipeline import DependencyParser
-> parser = DependencyParser(nlp.vocab, parser_model)
-> parser.from_disk("/path/to/model")
+> model = util.load_config("model.cfg", create_objects=True)["model"]
+> parser = DependencyParser(nlp.vocab, model)
 > ```
 
+Create a new pipeline instance. In your application, you would normally use a
+shortcut for this and instantiate the component using its string name and
+[`nlp.create_pipe`](/api/language#create_pipe).
+
 | Name | Type | Description |
 | ----------- | ------------------ | ------------------------------------------------------------------------------- |
 | `vocab` | `Vocab` | The shared vocabulary. |
@@ -85,11 +78,11 @@ applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
 > pass
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------ |
-| `stream` | iterable | A stream of documents. |
-| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
-| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------ |
+| `stream` | `Iterable[Doc]` | A stream of documents. |
+| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
+| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
 
 ## DependencyParser.predict {#predict tag="method"}
 
@@ -104,7 +97,7 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 
 | Name | Type | Description |
 | ----------- | ------------------- | ---------------------------------------------- |
-| `docs` | iterable | The documents to predict. |
+| `docs` | `Iterable[Doc]` | The documents to predict. |
 | **RETURNS** | `syntax.StateClass` | A helper class for the parse state (internal). |
 
 ## DependencyParser.set_annotations {#set_annotations tag="method"}
@@ -134,9 +127,8 @@ model. Delegates to [`predict`](/api/dependencyparser#predict) and
 >
 > ```python
 > parser = DependencyParser(nlp.vocab, parser_model)
-> losses = {}
 > optimizer = nlp.begin_training()
-> parser.update(examples, losses=losses, sgd=optimizer)
+> losses = parser.update(examples, sgd=optimizer)
 > ```
 
 | Name | Type | Description |
diff --git a/website/docs/api/entitylinker.md b/website/docs/api/entitylinker.md
index 1e6a56a48..754c2fc33 100644
--- a/website/docs/api/entitylinker.md
+++ b/website/docs/api/entitylinker.md
@@ -12,36 +12,28 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the
 [processing pipeline](/usage/processing-pipelines) via the ID `"entity_linker"`.
 
-## EntityLinker.Model {#model tag="classmethod"}
-
-Initialize a model for the pipe. The model should implement the
-`thinc.neural.Model` API, and should contain a field `tok2vec` that contains the
-context encoder. Wrappers are under development for most major machine learning
-libraries.
-
-| Name | Type | Description |
-| ----------- | ------ | ------------------------------------- |
-| `**kwargs` | - | Parameters for initializing the model |
-| **RETURNS** | object | The initialized model. |
-
 ## EntityLinker.\_\_init\_\_ {#init tag="method"}
 
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
-
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
+> # Construction via create_pipe with default model
 > entity_linker = nlp.create_pipe("entity_linker")
 >
-> # Construction from class
+> # Construction via create_pipe with custom model
+> config = {"model": {"@architectures": "my_el"}}
+> entity_linker = nlp.create_pipe("entity_linker", config)
+>
+> # Construction from class with custom model from file
 > from spacy.pipeline import EntityLinker
-> entity_linker = EntityLinker(nlp.vocab, nel_model)
-> entity_linker.from_disk("/path/to/model")
+> model = util.load_config("model.cfg", create_objects=True)["model"]
+> entity_linker = EntityLinker(nlp.vocab, model)
 > ```
 
+Create a new pipeline instance. In your application, you would normally use a
+shortcut for this and instantiate the component using its string name and
+[`nlp.create_pipe`](/api/language#create_pipe).
+
 | Name | Type | Description |
 | ------- | ------- | ------------------------------------------------------------------------------- |
 | `vocab` | `Vocab` | The shared vocabulary. |
@@ -90,11 +82,11 @@ applied to the `Doc` in order. Both [`__call__`](/api/entitylinker#call) and
 > pass
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------ |
-| `stream` | iterable | A stream of documents. |
-| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
-| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------ |
+| `stream` | `Iterable[Doc]` | A stream of documents. |
+| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
+| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
 
 ## EntityLinker.predict {#predict tag="method"}
 
@@ -142,9 +134,8 @@ pipe's entity linking model and context encoder. Delegates to
 >
 > ```python
 > entity_linker = EntityLinker(nlp.vocab, nel_model)
-> losses = {}
 > optimizer = nlp.begin_training()
-> entity_linker.update(examples, losses=losses, sgd=optimizer)
+> losses = entity_linker.update(examples, sgd=optimizer)
 > ```
 
 | Name | Type | Description |
@@ -155,7 +146,7 @@
 | `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/entitylinker#set_annotations). |
 | `sgd` | `Optimizer` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
 | `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
-| **RETURNS** | float | The loss from this batch. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
 
 ## EntityLinker.get_loss {#get_loss tag="method"}
diff --git a/website/docs/api/entityrecognizer.md b/website/docs/api/entityrecognizer.md
index 9a9b0926b..5739afff4 100644
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -8,35 +8,28 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the
 [processing pipeline](/usage/processing-pipelines) via the ID `"ner"`.
 
-## EntityRecognizer.Model {#model tag="classmethod"}
-
-Initialize a model for the pipe. The model should implement the
-`thinc.neural.Model` API. Wrappers are under development for most major machine
-learning libraries.
-
-| Name | Type | Description |
-| ----------- | ------ | ------------------------------------- |
-| `**kwargs` | - | Parameters for initializing the model |
-| **RETURNS** | object | The initialized model. |
-
 ## EntityRecognizer.\_\_init\_\_ {#init tag="method"}
 
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
-
 > #### Example
 >
 > ```python
 > # Construction via create_pipe
 > ner = nlp.create_pipe("ner")
+>
+> # Construction via create_pipe with custom model
+> config = {"model": {"@architectures": "my_ner"}}
+> ner = nlp.create_pipe("ner", config)
 >
-> # Construction from class
+> # Construction from class with custom model from file
 > from spacy.pipeline import EntityRecognizer
-> ner = EntityRecognizer(nlp.vocab, ner_model)
-> ner.from_disk("/path/to/model")
+> model = util.load_config("model.cfg", create_objects=True)["model"]
+> ner = EntityRecognizer(nlp.vocab, model)
 > ```
 
+Create a new pipeline instance. In your application, you would normally use a
+shortcut for this and instantiate the component using its string name and
+[`nlp.create_pipe`](/api/language#create_pipe).
+
 | Name | Type | Description |
 | ----------- | ------------------ | ------------------------------------------------------------------------------- |
 | `vocab` | `Vocab` | The shared vocabulary. |
@@ -85,11 +78,11 @@ applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
 > pass
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------ |
-| `stream` | iterable | A stream of documents. |
-| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
-| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------ |
+| `stream` | `Iterable[Doc]` | A stream of documents. |
+| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
+| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
 
 ## EntityRecognizer.predict {#predict tag="method"}
 
@@ -135,9 +128,8 @@ model. Delegates to [`predict`](/api/entityrecognizer#predict) and
 >
 > ```python
 > ner = EntityRecognizer(nlp.vocab, ner_model)
-> losses = {}
 > optimizer = nlp.begin_training()
-> ner.update(examples, losses=losses, sgd=optimizer)
+> losses = ner.update(examples, sgd=optimizer)
 > ```
 
 | Name | Type | Description |
diff --git a/website/docs/api/language.md b/website/docs/api/language.md
index f6631b1db..c9cfd2f2d 100644
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -68,15 +68,15 @@ more efficient than processing texts one-by-one.
 > assert doc.is_parsed
 > ```
 
-| Name | Type | Description |
-| -------------------------------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `texts` | iterable | A sequence of strings. |
-| `as_tuples` | bool | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. |
-| `batch_size` | int | The number of texts to buffer. |
-| `disable` | list | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
-| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
-| `n_process` <Tag variant="new">2.2.2</Tag> | int | Number of processors to use, only supported in Python 3. Defaults to `1`. |
-| **YIELDS** | `Doc` | Documents in the order of the original text. |
+| Name | Type | Description |
+| -------------------------------------------- | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `texts` | `Iterable[str]` | A sequence of strings. |
+| `as_tuples` | bool | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. |
+| `batch_size` | int | The number of texts to buffer. |
+| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
+| `component_cfg` <Tag variant="new">2.1</Tag> | `Dict[str, Dict]` | Config parameters for specific pipeline components, keyed by component name. |
+| `n_process` <Tag variant="new">2.2.2</Tag> | int | Number of processors to use, only supported in Python 3. Defaults to `1`. |
+| **YIELDS** | `Doc` | Documents in the order of the original text. |
 
 ## Language.update {#update tag="method"}
 
@@ -99,6 +99,7 @@ Update the models in the pipeline.
 | `sgd` | `Optimizer` | An [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
 | `losses` | `Dict[str, float]` | Dictionary to update with the loss, keyed by pipeline component. |
 | `component_cfg` <Tag variant="new">2.1</Tag> | `Dict[str, Dict]` | Config parameters for specific pipeline components, keyed by component name. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
 
 ## Language.evaluate {#evaluate tag="method"}
diff --git a/website/docs/api/tagger.md b/website/docs/api/tagger.md
index 1aa5fb327..5f625f842 100644
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@@ -8,35 +8,28 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the
 [processing pipeline](/usage/processing-pipelines) via the ID `"tagger"`.
 
-## Tagger.Model {#model tag="classmethod"}
-
-Initialize a model for the pipe. The model should implement the
-`thinc.neural.Model` API. Wrappers are under development for most major machine
-learning libraries.
-
-| Name | Type | Description |
-| ----------- | ------ | ------------------------------------- |
-| `**kwargs` | - | Parameters for initializing the model |
-| **RETURNS** | object | The initialized model. |
-
 ## Tagger.\_\_init\_\_ {#init tag="method"}
 
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
-
 > #### Example
 >
 > ```python
 > # Construction via create_pipe
 > tagger = nlp.create_pipe("tagger")
+>
+> # Construction via create_pipe with custom model
+> config = {"model": {"@architectures": "my_tagger"}}
+> tagger = nlp.create_pipe("tagger", config)
 >
-> # Construction from class
+> # Construction from class with custom model from file
 > from spacy.pipeline import Tagger
-> tagger = Tagger(nlp.vocab, tagger_model)
-> tagger.from_disk("/path/to/model")
+> model = util.load_config("model.cfg", create_objects=True)["model"]
+> tagger = Tagger(nlp.vocab, model)
 > ```
 
+Create a new pipeline instance. In your application, you would normally use a
+shortcut for this and instantiate the component using its string name and
+[`nlp.create_pipe`](/api/language#create_pipe).
+
 | Name | Type | Description |
 | ----------- | -------- | ------------------------------------------------------------------------------- |
 | `vocab` | `Vocab` | The shared vocabulary. |
@@ -83,11 +76,11 @@ applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
 > pass
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------ |
-| `stream` | iterable | A stream of documents. |
-| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
-| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------ |
+| `stream` | `Iterable[Doc]` | A stream of documents. |
+| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
+| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
 
 ## Tagger.predict {#predict tag="method"}
 
@@ -133,9 +126,8 @@ pipe's model. Delegates to [`predict`](/api/tagger#predict) and
 >
 > ```python
 > tagger = Tagger(nlp.vocab, tagger_model)
-> losses = {}
 > optimizer = nlp.begin_training()
-> tagger.update(examples, losses=losses, sgd=optimizer)
+> losses = tagger.update(examples, sgd=optimizer)
 > ```
 
 | Name | Type | Description |
@@ -146,6 +138,7 @@
 | `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/tagger#set_annotations). |
 | `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
 | `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
 
 ## Tagger.get_loss {#get_loss tag="method"}
diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index c0c3e15a0..ff9890dd6 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -9,36 +9,28 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the
 [processing pipeline](/usage/processing-pipelines) via the ID `"textcat"`.
 
-## TextCategorizer.Model {#model tag="classmethod"}
-
-Initialize a model for the pipe. The model should implement the
-`thinc.neural.Model` API. Wrappers are under development for most major machine
-learning libraries.
-
-| Name | Type | Description |
-| ----------- | ------ | ------------------------------------- |
-| `**kwargs` | - | Parameters for initializing the model |
-| **RETURNS** | object | The initialized model. |
-
 ## TextCategorizer.\_\_init\_\_ {#init tag="method"}
 
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
-
 > #### Example
 >
 > ```python
 > # Construction via create_pipe
 > textcat = nlp.create_pipe("textcat")
-> textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True})
->
-> # Construction from class
+>
+> # Construction via create_pipe with custom model
+> config = {"model": {"@architectures": "my_textcat"}}
+> textcat = nlp.create_pipe("textcat", config)
+>
+> # Construction from class with custom model from file
 > from spacy.pipeline import TextCategorizer
-> textcat = TextCategorizer(nlp.vocab, textcat_model)
-> textcat.from_disk("/path/to/model")
+> model = util.load_config("model.cfg", create_objects=True)["model"]
+> textcat = TextCategorizer(nlp.vocab, model)
 > ```
 
+Create a new pipeline instance. In your application, you would normally use a
+shortcut for this and instantiate the component using its string name and
+[`nlp.create_pipe`](/api/language#create_pipe).
+
 | Name | Type | Description |
 | ----------- | ----------------- | ------------------------------------------------------------------------------- |
 | `vocab` | `Vocab` | The shared vocabulary. |
 | `model` | `Model` | The model powering the pipeline component. |
 | `**cfg` | - | Configuration parameters. |
 | **RETURNS** | `TextCategorizer` | The newly constructed object. |
 
+
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
@@ -101,11 +95,11 @@ applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
 > pass
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------ |
-| `stream` | iterable | A stream of documents. |
-| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
-| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------ |
+| `stream` | `Iterable[Doc]` | A stream of documents. |
+| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
+| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
 
 ## TextCategorizer.predict {#predict tag="method"}
 
@@ -151,9 +145,8 @@ pipe's model. Delegates to [`predict`](/api/textcategorizer#predict) and
 >
 > ```python
 > textcat = TextCategorizer(nlp.vocab, textcat_model)
-> losses = {}
 > optimizer = nlp.begin_training()
-> textcat.update(examples, losses=losses, sgd=optimizer)
+> losses = textcat.update(examples, sgd=optimizer)
 > ```
 
 | Name | Type | Description |
@@ -164,6 +157,7 @@
 | `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/textcategorizer#set_annotations). |
 | `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
 | `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
 
 ## TextCategorizer.get_loss {#get_loss tag="method"}