Merge branch 'develop' of https://github.com/explosion/spaCy into develop

Matthew Honnibal 2020-09-14 21:04:49 +02:00
commit adf0bab23a
25 changed files with 206 additions and 116 deletions

View File

@@ -4,17 +4,19 @@
 spaCy is a library for advanced Natural Language Processing in Python and
 Cython. It's built on the very latest research, and was designed from day one to
-be used in real products. spaCy comes with
-[pretrained statistical models](https://spacy.io/models) and word vectors, and
-currently supports tokenization for **60+ languages**. It features
+be used in real products.
+
+spaCy comes with
+[pretrained pipelines](https://spacy.io/models) and vectors, and
+currently supports tokenization for **59+ languages**. It features
 state-of-the-art speed, convolutional **neural network models** for tagging,
-parsing and **named entity recognition** and easy **deep learning** integration.
-It's commercial open-source software, released under the MIT license.
+parsing, **named entity recognition**, **text classification** and more,
+multi-task learning with pretrained **transformers** like BERT, as well as a
+production-ready training system and easy model packaging, deployment and
+workflow management. spaCy is commercial open-source software, released under
+the MIT license.

 💫 **Version 2.3 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)

-[![Azure Pipelines](<https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build+(3.x)>)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
+[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
 [![Current Release Version](https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square&logo=github)](https://github.com/explosion/spaCy/releases)
 [![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
 [![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
@@ -31,7 +33,7 @@ It's commercial open-source software, released under the MIT license.
 | --------------- | -------------------------------------------------------------- |
 | [spaCy 101]     | New to spaCy? Here's everything you need to know!              |
 | [Usage Guides]  | How to use spaCy and its features.                             |
-| [New in v2.3]   | New features, backwards incompatibilities and migration guide. |
+| [New in v3.0]   | New features, backwards incompatibilities and migration guide. |
 | [API Reference] | The detailed reference for spaCy's API.                        |
 | [Models]        | Download statistical language models for spaCy.                |
 | [Universe]      | Libraries, extensions, demos, books and courses.               |
@@ -39,7 +41,7 @@ It's commercial open-source software, released under the MIT license.
 | [Contribute]    | How to contribute to the spaCy project and code base.          |

 [spacy 101]: https://spacy.io/usage/spacy-101
-[new in v2.3]: https://spacy.io/usage/v2-3
+[new in v3.0]: https://spacy.io/usage/v3
 [usage guides]: https://spacy.io/usage/
 [api reference]: https://spacy.io/api/
 [models]: https://spacy.io/models
@@ -56,34 +58,29 @@ be able to provide individual support via email. We also believe that help is
 much more valuable if it's shared publicly, so that more people can benefit from
 it.

-| Type                      | Platforms                                              |
-| ------------------------- | ------------------------------------------------------ |
-| 🚨 **Bug Reports**        | [GitHub Issue Tracker]                                 |
-| 🎁 **Feature Requests**   | [GitHub Issue Tracker]                                 |
-| 👩‍💻 **Usage Questions**    | [Stack Overflow] · [Gitter Chat] · [Reddit User Group] |
-| 🗯 **General Discussion**  | [Gitter Chat] · [Reddit User Group]                    |
+| Type                    | Platforms              |
+| ----------------------- | ---------------------- |
+| 🚨 **Bug Reports**      | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
+| 👩‍💻 **Usage Questions**  | [Stack Overflow]       |

 [github issue tracker]: https://github.com/explosion/spaCy/issues
 [stack overflow]: https://stackoverflow.com/questions/tagged/spacy
-[gitter chat]: https://gitter.im/explosion/spaCy
-[reddit user group]: https://www.reddit.com/r/spacynlp

 ## Features

-- Non-destructive **tokenization**
-- **Named entity** recognition
-- Support for **50+ languages**
-- pretrained [statistical models](https://spacy.io/models) and word vectors
+- Support for **59+ languages**
+- **Trained pipelines**
+- Multi-task learning with pretrained **transformers** like BERT
+- Pretrained **word vectors**
 - State-of-the-art speed
-- Easy **deep learning** integration
-- Part-of-speech tagging
-- Labelled dependency parsing
-- Syntax-driven sentence segmentation
+- Production-ready **training system**
+- Linguistically-motivated **tokenization**
+- Components for named **entity recognition**, part-of-speech tagging, dependency
+  parsing, sentence segmentation, **text classification**, lemmatization,
+  morphological analysis, entity linking and more
+- Easily extensible with **custom components** and attributes
+- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
 - Built in **visualizers** for syntax and NER
-- Convenient string-to-hash mapping
-- Export to numpy data arrays
-- Efficient binary serialization
-- Easy **model packaging** and deployment
+- Easy **model packaging**, deployment and workflow management
 - Robust, rigorously evaluated accuracy

 📖 **For more details, see the
@@ -102,13 +99,6 @@ For detailed installation instructions, see the
 [pip]: https://pypi.org/project/spacy/
 [conda]: https://anaconda.org/conda-forge/spacy

-> ⚠️ **Important note for Python 3.8:** We can't yet ship pre-compiled binary
-> wheels for spaCy that work on Python 3.8, as we're still waiting for our CI
-> providers and other tooling to support it. This means that in order to run
-> spaCy on Python 3.8, you'll need [a compiler installed](#source) and compile
-> the library and its Cython dependencies locally. If this is causing problems
-> for you, the easiest solution is to **use Python 3.7** in the meantime.

 ### pip

 Using pip, spaCy releases are available as source packages and binary wheels (as
@@ -164,26 +154,26 @@ If you've trained your own models, keep in mind that your training and runtime
 inputs must match. After updating spaCy, we recommend **retraining your models**
 with the new version.

-📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the
-[migration guide](https://spacy.io/usage/v2#migrating).**
+📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the
+[migration guide](https://spacy.io/usage/v3#migrating).**

 ## Download models

-As of v1.7.0, models for spaCy can be installed as **Python packages**. This
+Trained pipelines for spaCy can be installed as **Python packages**. This
 means that they're a component of your application, just like any other module.
 Models can be installed using spaCy's `download` command, or manually by
 pointing pip to a path or URL.

-| Documentation          |                                                               |
-| ---------------------- | ------------------------------------------------------------- |
-| [Available Models]     | Detailed model descriptions, accuracy figures and benchmarks. |
-| [Models Documentation] | Detailed usage instructions.                                  |
+| Documentation          |                                                                  |
+| ---------------------- | ---------------------------------------------------------------- |
+| [Available Pipelines]  | Detailed pipeline descriptions, accuracy figures and benchmarks. |
+| [Models Documentation] | Detailed usage instructions.                                     |

-[available models]: https://spacy.io/models
+[available pipelines]: https://spacy.io/models
 [models documentation]: https://spacy.io/docs/usage/models

 ```bash
-# download best-matching version of specific model for your spaCy installation
+# Download best-matching version of specific model for your spaCy installation
 python -m spacy download en_core_web_sm

 # pip install .tar.gz archive from path or URL

View File

@@ -89,7 +89,6 @@ def train(
     nlp, config = util.load_model_from_config(config)
     if config["training"]["vectors"] is not None:
         util.load_vectors_into_model(nlp, config["training"]["vectors"])
-    verify_config(nlp)
     raw_text, tag_map, morph_rules, weights_data = load_from_paths(config)
     T_cfg = config["training"]
     optimizer = T_cfg["optimizer"]
@@ -108,6 +107,8 @@ def train(
             nlp.resume_training(sgd=optimizer)
     with nlp.select_pipes(disable=[*frozen_components, *resume_components]):
         nlp.begin_training(lambda: train_corpus(nlp), sgd=optimizer)
+    # Verify the config after calling 'begin_training' to ensure labels are properly initialized
+    verify_config(nlp)

     if tag_map:
         # Replace tag map with provided mapping
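The reordering matters because `begin_training` is what populates component labels from the training data. A condensed sketch of the new control flow, taken from the hunk above rather than the complete function:

```python
# Sketch of the relevant lines in the train command after this change:
with nlp.select_pipes(disable=[*frozen_components, *resume_components]):
    nlp.begin_training(lambda: train_corpus(nlp), sgd=optimizer)
# Labels (e.g. the textcat's) are only inferred during 'begin_training',
# so validating 'positive_label' against them must happen afterwards.
verify_config(nlp)
```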
@@ -401,7 +402,7 @@ def verify_cli_args(config_path: Path, output_path: Optional[Path] = None) -> None:
 def verify_config(nlp: Language) -> None:
-    """Perform additional checks based on the config and loaded nlp object."""
+    """Perform additional checks based on the config, loaded nlp object and training data."""
     # TODO: maybe we should validate based on the actual components, the list
     # in config["nlp"]["pipeline"] instead?
     for pipe_config in nlp.config["components"].values():
@@ -415,18 +416,13 @@ def verify_textcat_config(nlp: Language, pipe_config: Dict[str, Any]) -> None:
     # if 'positive_label' is provided: double check whether it's in the data and
     # the task is binary
     if pipe_config.get("positive_label"):
-        textcat_labels = nlp.get_pipe("textcat").cfg.get("labels", [])
+        textcat_labels = nlp.get_pipe("textcat").labels
         pos_label = pipe_config.get("positive_label")
         if pos_label not in textcat_labels:
-            msg.fail(
-                f"The textcat's 'positive_label' config setting '{pos_label}' "
-                f"does not match any label in the training data.",
-                exits=1,
+            raise ValueError(
+                Errors.E920.format(pos_label=pos_label, labels=textcat_labels)
             )
-        if len(textcat_labels) != 2:
-            msg.fail(
-                f"A textcat 'positive_label' '{pos_label}' was "
-                f"provided for training data that does not appear to be a "
-                f"binary classification problem with two labels.",
-                exits=1,
+        if len(list(textcat_labels)) != 2:
+            raise ValueError(
+                Errors.E919.format(pos_label=pos_label, labels=textcat_labels)
             )
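Because validation now raises instead of calling `msg.fail`, a mismatched `positive_label` can be caught programmatically. A minimal sketch mirroring the tests added later in this commit (assumes the spaCy v3 nightly API):

```python
from spacy.lang.en import English
from spacy.cli.train import verify_textcat_config

nlp = English()
pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING"]}
nlp.add_pipe("textcat", config=pipe_config)
try:
    verify_textcat_config(nlp, pipe_config)
except ValueError as err:
    print(err)  # E920: 'POS' does not match any label in the training data
```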

View File

@@ -480,6 +480,11 @@ class Errors:
     E201 = ("Span index out of range.")

     # TODO: fix numbering after merging develop into master
+    E919 = ("A textcat 'positive_label' '{pos_label}' was provided for training "
+            "data that does not appear to be a binary classification problem "
+            "with two labels. Labels found: {labels}")
+    E920 = ("The textcat's 'positive_label' config setting '{pos_label}' "
+            "does not match any label in the training data. Labels found: {labels}")
     E921 = ("The method 'set_output' can only be called on components that have "
             "a Model with a 'resize_output' attribute. Otherwise, the output "
             "layer can not be dynamically changed.")

View File

@@ -56,7 +56,12 @@ subword_features = true
 @Language.factory(
     "textcat",
     assigns=["doc.cats"],
-    default_config={"labels": [], "threshold": 0.5, "model": DEFAULT_TEXTCAT_MODEL},
+    default_config={
+        "labels": [],
+        "threshold": 0.5,
+        "positive_label": None,
+        "model": DEFAULT_TEXTCAT_MODEL,
+    },
     scores=[
         "cats_score",
         "cats_score_desc",
@@ -74,8 +79,9 @@ def make_textcat(
     nlp: Language,
     name: str,
     model: Model[List[Doc], List[Floats2d]],
-    labels: Iterable[str],
+    labels: List[str],
     threshold: float,
+    positive_label: Optional[str],
 ) -> "TextCategorizer":
     """Create a TextCategorizer compoment. The text categorizer predicts categories
     over a whole document. It can learn one or more labels, and the labels can
@@ -88,8 +94,16 @@ def make_textcat(
     labels (list): A list of categories to learn. If empty, the model infers the
         categories from the data.
     threshold (float): Cutoff to consider a prediction "positive".
+    positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.
     """
-    return TextCategorizer(nlp.vocab, model, name, labels=labels, threshold=threshold)
+    return TextCategorizer(
+        nlp.vocab,
+        model,
+        name,
+        labels=labels,
+        threshold=threshold,
+        positive_label=positive_label,
+    )

 class TextCategorizer(Pipe):
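With `positive_label` now part of the factory's `default_config`, the setting can be passed through `nlp.add_pipe` like any other component option. A minimal sketch (the label names are illustrative):

```python
import spacy

nlp = spacy.blank("en")
# 'positive_label' defaults to None; set it for a binary task with exclusive classes
textcat = nlp.add_pipe(
    "textcat",
    config={"labels": ["POS", "NEG"], "threshold": 0.5, "positive_label": "POS"},
)
```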
@@ -104,8 +118,9 @@ class TextCategorizer(Pipe):
         model: Model,
         name: str = "textcat",
         *,
-        labels: Iterable[str],
+        labels: List[str],
         threshold: float,
+        positive_label: Optional[str],
     ) -> None:
         """Initialize a text categorizer.
@@ -113,8 +128,9 @@ class TextCategorizer(Pipe):
         model (thinc.api.Model): The Thinc Model powering the pipeline component.
         name (str): The component instance name, used to add entries to the
             losses during training.
-        labels (Iterable[str]): The labels to use.
+        labels (List[str]): The labels to use.
         threshold (float): Cutoff to consider a prediction "positive".
+        positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.

         DOCS: https://nightly.spacy.io/api/textcategorizer#init
         """
@@ -122,7 +138,11 @@ class TextCategorizer(Pipe):
         self.model = model
         self.name = name
         self._rehearsal_model = None
-        cfg = {"labels": labels, "threshold": threshold}
+        cfg = {
+            "labels": labels,
+            "threshold": threshold,
+            "positive_label": positive_label,
+        }
         self.cfg = dict(cfg)

     @property
@@ -131,10 +151,10 @@ class TextCategorizer(Pipe):

         DOCS: https://nightly.spacy.io/api/textcategorizer#labels
         """
-        return tuple(self.cfg.setdefault("labels", []))
+        return tuple(self.cfg["labels"])

     @labels.setter
-    def labels(self, value: Iterable[str]) -> None:
+    def labels(self, value: List[str]) -> None:
         self.cfg["labels"] = tuple(value)

     def pipe(self, stream: Iterable[Doc], *, batch_size: int = 128) -> Iterator[Doc]:
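`labels` is now read directly from `self.cfg` instead of via `setdefault`, which is safe because the factory always supplies a `labels` entry (defaulting to `[]`). A sketch of the resulting behavior, continuing the example above:

```python
textcat = nlp.get_pipe("textcat")
assert textcat.labels == ("POS", "NEG")  # tuple view of cfg["labels"]
textcat.labels = ["POS", "NEG", "NEU"]   # the setter stores a tuple back into cfg
```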
@@ -353,17 +373,10 @@ class TextCategorizer(Pipe):
             sgd = self.create_optimizer()
         return sgd

-    def score(
-        self,
-        examples: Iterable[Example],
-        *,
-        positive_label: Optional[str] = None,
-        **kwargs,
-    ) -> Dict[str, Any]:
+    def score(self, examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
         """Score a batch of examples.

         examples (Iterable[Example]): The examples to score.
-        positive_label (str): Optional positive label.
         RETURNS (Dict[str, Any]): The scores, produced by Scorer.score_cats.

         DOCS: https://nightly.spacy.io/api/textcategorizer#score
@@ -374,7 +387,7 @@ class TextCategorizer(Pipe):
             "cats",
             labels=self.labels,
             multi_label=self.model.attrs["multi_label"],
-            positive_label=positive_label,
+            positive_label=self.cfg["positive_label"],
             threshold=self.cfg["threshold"],
             **kwargs,
         )
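Since `score` now reads `positive_label` from the component's own config, callers no longer pass it via `scorer_cfg`; the test change further down reflects this. A before/after sketch:

```python
# Before this commit:
scores = nlp.evaluate(train_examples, scorer_cfg={"positive_label": "POSITIVE"})
# After it, the component's cfg supplies positive_label itself:
scores = nlp.evaluate(train_examples)
assert "cats_score" in scores
```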

View File

@@ -10,6 +10,7 @@ from spacy.tokens import Doc
 from spacy.pipeline.tok2vec import DEFAULT_TOK2VEC_MODEL
 from ..util import make_tempdir
+from ...cli.train import verify_textcat_config
 from ...training import Example
@@ -130,7 +131,10 @@ def test_overfitting_IO():
     fix_random_seed(0)
     nlp = English()
     # Set exclusive labels
-    textcat = nlp.add_pipe("textcat", config={"model": {"exclusive_classes": True}})
+    textcat = nlp.add_pipe(
+        "textcat",
+        config={"model": {"exclusive_classes": True}, "positive_label": "POSITIVE"},
+    )
     train_examples = []
     for text, annotations in TRAIN_DATA:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
@@ -159,7 +163,7 @@ def test_overfitting_IO():
     assert cats2["POSITIVE"] + cats2["NEGATIVE"] == pytest.approx(1.0, 0.001)

     # Test scoring
-    scores = nlp.evaluate(train_examples, scorer_cfg={"positive_label": "POSITIVE"})
+    scores = nlp.evaluate(train_examples)
     assert scores["cats_micro_f"] == 1.0
     assert scores["cats_score"] == 1.0
     assert "cats_score_desc" in scores
@@ -194,3 +198,29 @@ def test_textcat_configs(textcat_config):
     for i in range(5):
         losses = {}
         nlp.update(train_examples, sgd=optimizer, losses=losses)
+
+
+def test_positive_class():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["POS", "NEG"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("POS", "NEG")
+    verify_textcat_config(nlp, pipe_config)
+
+
+def test_positive_class_not_present():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)
+
+
+def test_positive_class_not_binary():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING", "POS"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING", "POS")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)

View File

@@ -136,7 +136,7 @@ def test_serialize_textcat_empty(en_vocab):
     # See issue #1105
     cfg = {"model": DEFAULT_TEXTCAT_MODEL}
     model = registry.make_from_config(cfg, validate=True)["model"]
-    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5)
+    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5, positive_label=None)
     textcat.to_bytes(exclude=["vocab"])

View File

@@ -630,3 +630,49 @@ In addition to the native markdown elements, you can use the components
 ├── gatsby-node.js  # Node-specific hooks for Gatsby
 └── package.json    # package settings and dependencies
 ```
+
+## Editorial {#editorial}
+
+- "spaCy" should always be spelled with a lowercase "s" and a capital "C",
+  unless it specifically refers to the Python package or Python import `spacy`
+  (in which case it should be formatted as code).
+  - ✅ spaCy is a library for advanced NLP in Python.
+  - ❌ Spacy is a library for advanced NLP in Python.
+  - ✅ First, you need to install the `spacy` package from pip.
+- Mentions of code, like function names, classes, variable names etc. in inline
+  text should be formatted as `code`.
+  - ✅ "Calling the `nlp` object on a text returns a `Doc`."
+- Objects that have pages in the [API docs](/api) should be linked, for
+  example [`Doc`](/api/doc) or [`Language.to_disk`](/api/language#to_disk). The
+  mentions should still be formatted as code within the link. Links pointing to
+  the API docs will automatically receive a little icon. However, if a paragraph
+  includes many references to the API, the links can easily get messy. In that
+  case, we typically only link the first mention of an object and not any
+  subsequent ones.
+  - ✅ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a `Doc` object
+    from a `Span`.
+  - ❌ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a
+    [`Doc`](/api/doc) object from a [`Span`](/api/span).
+- Other things we format as code are: references to trained pipeline packages
+  like `en_core_web_sm` or file names like `code.py` or `meta.json`.
+  - ✅ After training, the `config.cfg` is saved to disk.
+- [Type annotations](#type-annotations) are a special type of code formatting,
+  expressed by wrapping the text in `~~` instead of backticks. The result looks
+  like this: ~~List[Doc]~~. All references to known types will be linked
+  automatically.
+  - ✅ The model has the input type ~~List[Doc]~~ and it outputs a
+    ~~List[Array2d]~~.
+- We try to keep links meaningful but short.
+  - ✅ For details, see the usage guide on
+    [training with custom code](/usage/training#custom-code).
+  - ❌ For details, see
+    [the usage guide on training with custom code](/usage/training#custom-code).
+  - ❌ For details, see the usage guide on training with custom code
+    [here](/usage/training#custom-code).

View File

@@ -183,7 +183,7 @@ will be overwritten.
 | -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `match_id` | An ID for the patterns. ~~str~~ |
 | `patterns` | A list of match patterns. A pattern consists of a list of dicts, where each dict describes a token in the tree. ~~List[List[Dict[str, Union[str, Dict]]]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[DependencyMatcher, Doc, int, List[Tuple], Any]]~~ |

 ## DependencyMatcher.get {#get tag="method"}

View File

@@ -217,7 +217,7 @@ model. Delegates to [`predict`](/api/dependencyparser#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

View File

@@ -85,7 +85,7 @@ providing custom registered functions.
 | `vocab` | The shared vocabulary. ~~Vocab~~ |
 | `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |
 | `get_candidates` | Function that generates plausible candidates for a given `Span` object. ~~Callable[[KnowledgeBase, Span], Iterable[Candidate]]~~ |
 | `labels_discard` | NER labels that will automatically get a `"NIL"` prediction. ~~Iterable[str]~~ |
@@ -218,7 +218,7 @@ pipe's entity linking model and context encoder. Delegates to
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

View File

@@ -206,7 +206,7 @@ model. Delegates to [`predict`](/api/entityrecognizer#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

View File

@@ -255,7 +255,7 @@ Get all patterns that were added to the entity ruler.
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------- |
-| `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ | |
+| `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ |
 | `phrase_matcher` | The underlying phrase matcher, used to process phrase patterns. ~~PhraseMatcher~~ |
 | `token_patterns` | The token patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Dict[str, Union[str, List[dict]]]]~~ |
 | `phrase_patterns` | The phrase patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Doc]]~~ |

View File

@@ -81,7 +81,7 @@ shortcut for this and instantiate the component using its string name and
 | `vocab` | The shared vocabulary. ~~Vocab~~ |
 | `model` | **Not yet implemented:** The model to use. ~~Model~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | mode | The lemmatizer mode, e.g. `"lookup"` or `"rule"`. Defaults to `"lookup"`. ~~str~~ |
 | lookups | A lookups object containing the tables such as `"lemma_rules"`, `"lemma_index"`, `"lemma_exc"` and `"lemma_lookup"`. Defaults to `None`. ~~Optional[Lookups]~~ |
 | overwrite | Whether to overwrite existing lemmas. ~~bool~ |

View File

@@ -139,7 +139,7 @@ setting up the label scheme based on the data.
 | Name | Description |
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |
@@ -196,7 +196,7 @@ Delegates to [`predict`](/api/morphologizer#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

View File

@@ -150,9 +150,9 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
 | Name | Description |
 | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
-| `match_id` | str | An ID for the thing you're matching. ~~str~~ |
+| `match_id` | An ID for the thing you're matching. ~~str~~ |
 | `docs` | `Doc` objects of the phrases to match. ~~List[Doc]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |

 ## PhraseMatcher.remove {#remove tag="method" new="2.2"}

View File

@@ -187,7 +187,7 @@ predictions and gold-standard annotations, and update the component's model.
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -211,7 +211,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |

View File

@@ -192,7 +192,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -216,7 +216,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |

View File

@@ -53,7 +53,7 @@ Initialize the sentencizer.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults. ~~Optional[List[str]]~~ |

 ```python

View File

@@ -190,7 +190,7 @@ Delegates to [`predict`](/api/tagger#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -214,7 +214,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |

View File

@@ -36,11 +36,12 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("textcat", config=config)
 > ```

 | Setting | Description |
 | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels` | A list of categories to learn. If empty, the model infers the categories from the data. Defaults to `[]`. ~~Iterable[str]~~ |
 | `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
+| `positive_label` | The positive label for a binary task with exclusive classes, None otherwise and by default. ~~Optional[str]~~ |
 | `model` | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ |

 ```python
 %%GITHUB_SPACY/spacy/pipeline/textcat.py
@@ -60,21 +61,22 @@ architectures and their arguments and hyperparameters.
 >
 > # Construction from class
 > from spacy.pipeline import TextCategorizer
-> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5)
+> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5, positive_label="POS")
 > ```

 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#create_pipe).

 | Name | Description |
 | ---------------- | ---------------------------------------------------------------------------------------------------------------------------- |
 | `vocab` | The shared vocabulary. ~~Vocab~~ |
 | `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
 | _keyword-only_ | |
 | `labels` | The labels to use. ~~Iterable[str]~~ |
 | `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
+| `positive_label` | The positive label for a binary task with exclusive classes, None otherwise. ~~Optional[str]~~ |

 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
@@ -201,7 +203,7 @@ Delegates to [`predict`](/api/textcategorizer#predict) and
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -225,7 +227,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
@@ -263,7 +265,7 @@ Score a batch of examples.
 | Name | Description |
 | ---------------- | ---------------------------------------------------------------------------------------------------------------------- |
 | `examples` | The examples to score. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `positive_label` | Optional positive label. ~~Optional[str]~~ |
 | **RETURNS** | The scores, produced by [`Scorer.score_cats`](/api/scorer#score_cats). ~~Dict[str, Union[float, Dict[str, float]]]~~ |

View File

@@ -144,7 +144,7 @@ setting up the label scheme based on the data.
 | Name | Description |
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |
@@ -200,7 +200,7 @@ Delegates to [`predict`](/api/tok2vec#predict).
 | Name | Description |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

View File

@@ -11,6 +11,7 @@ menu:
   - ['Setup & Installation', 'setup']
   - ['Markdown Reference', 'markdown']
   - ['Project Structure', 'structure']
+  - ['Editorial', 'editorial']
 sidebar:
   - label: Styleguide
     items:

View File

@@ -27,6 +27,7 @@ const Quickstart = ({
     hidePrompts,
     small,
     codeLang,
+    Container = Section,
     children,
 }) => {
     const contentRef = useRef()
@@ -83,7 +84,7 @@ const Quickstart = ({
     }, [data, initialized])

     return !data.length ? null : (
-        <Section id={id}>
+        <Container id={id}>
             <div className={classNames(classes.root, { [classes.hidePrompts]: !!hidePrompts })}>
                 {title && (
                     <H2 className={classes.title} name={id}>
@@ -249,7 +250,7 @@ const Quickstart = ({
             </pre>
             {showCopy && <textarea ref={copyAreaRef} className={classes.copyArea} rows={1} />}
         </div>
-        </Section>
+        </Container>
     )
 }

View File

@@ -41,3 +41,7 @@
     &:before
         content: ""
+
+    .ul .ul &
+        text-indent: initial
+        margin-left: -20px

View File

@@ -87,6 +87,8 @@ export default function QuickstartTraining({ id, title, download = 'base_config.
         .sort((a, b) => a.title.localeCompare(b.title))
     return (
         <Quickstart
+            id="quickstart-widget"
+            Container="div"
             download={download}
             rawContent={content}
             data={DATA}