Proofreading

Proofread some API docs
walterhenry 2020-09-24 13:15:28 +02:00
parent aaf01689a1
commit 3dd5f409ec
18 changed files with 39 additions and 42 deletions


@ -143,7 +143,7 @@ argument that connects to the shared `tok2vec` component in the pipeline.
Construct an embedding layer that separately embeds a number of lexical
attributes using hash embedding, concatenates the results, and passes it through
a feed-forward subnetwork to build a mixed representations. The features used
a feed-forward subnetwork to build mixed representations. The features used
are the `NORM`, `PREFIX`, `SUFFIX` and `SHAPE`, which can have varying
definitions depending on the `Vocab` of the `Doc` object passed in. Vectors from
pretrained static vectors can also be incorporated into the concatenated
@ -170,7 +170,7 @@ representation.
> nC = 8
> ```
Construct an embedded representations based on character embeddings, using a
Construct an embedded representation based on character embeddings, using a
feed-forward network. A fixed number of UTF-8 byte characters are used for each
word, taken from the beginning and end of the word equally. Padding is used in
the center for words that are too short.
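
A rough sketch of the token-level inputs the two embedding layers above draw on; the byte slicing is only meant to illustrate the "beginning and end of the word" idea, not the library's internal implementation:

```python
import spacy

nlp = spacy.blank("en")
token = nlp("Proofreading")[0]

# Lexical attributes hashed by the hash-embedding layer
print(token.norm_, token.prefix_, token.suffix_, token.shape_)

# Character-embedding style input: a fixed number of UTF-8 bytes taken
# equally from the start and end of the word (e.g. nC=8 -> 4 from each side)
n_c = 8
raw = token.text.encode("utf8")
print(raw[: n_c // 2] + raw[-(n_c // 2):])
```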
@ -392,7 +392,7 @@ a single token vector given zero or more wordpiece vectors.
> ```
Use a transformer as a [`Tok2Vec`](/api/tok2vec) layer directly. This does
**not** allow multiple components to share the transformer weights, and does
**not** allow multiple components to share the transformer weights and does
**not** allow the transformer to set annotations into the [`Doc`](/api/doc)
object, but it's a **simpler solution** if you only need the transformer within
one component.
@ -436,7 +436,7 @@ might find [this tutorial](https://explosion.ai/blog/parsing-english-in-python)
helpful for background information. The neural network state prediction model
consists of either two or three subnetworks:
- **tok2vec**: Map each token into a vector representations. This subnetwork is
- **tok2vec**: Map each token into a vector representation. This subnetwork is
run once for each batch.
- **lower**: Construct a feature-specific vector for each `(token, feature)`
pair. This is also run once for each batch. Constructing the state
@ -573,14 +573,14 @@ architecture is usually less accurate than the ensemble, but runs faster.
> nO = null
> ```
An ngram "bag-of-words" model. This architecture should run much faster than the
An n-gram "bag-of-words" model. This architecture should run much faster than the
others, but may not be as accurate, especially if texts are short.
| Name | Description |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `exclusive_classes` | Whether or not categories are mutually exclusive. ~~bool~~ |
| `ngram_size`        | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3` would give unigram, trigram and bigram features. ~~int~~                                                  |
| `no_output_layer` | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes` is `True`, else `Logistic`. ~~bool~~ |
| `no_output_layer` | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes` is `True`, else `Logistic`). ~~bool~~ |
| `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. ~~Optional[int]~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
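
A minimal sketch of plugging this architecture into a text categorizer via `nlp.add_pipe`, using the parameters from the table above. The registry name `spacy.TextCatBOW.v1` and the label names are assumptions for illustration; check the registered architecture name for your version:

```python
import spacy

nlp = spacy.blank("en")
model_config = {
    "@architectures": "spacy.TextCatBOW.v1",
    "exclusive_classes": True,
    "ngram_size": 1,
    "no_output_layer": False,
}
textcat = nlp.add_pipe("textcat", config={"model": model_config})
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
```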
@ -594,7 +594,7 @@ into the "real world". This requires 3 main components:
synonyms and prior probabilities.
- A candidate generation step to produce a set of likely identifiers, given a
certain textual mention.
- A Machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
- A machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
most plausible ID from the set of candidates.
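
A toy sketch of the first two pieces, a knowledge base with prior probabilities and candidate generation for a textual mention. The identifiers and numbers are invented, and the exact `KnowledgeBase` API may differ between versions:

```python
import spacy
from spacy.kb import KnowledgeBase

nlp = spacy.blank("en")
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=3)

# Entity identifiers with frequencies and (dummy) vectors
kb.add_entity(entity="Q90", freq=120, entity_vector=[1, 2, 3])
kb.add_entity(entity="Q1004791", freq=5, entity_vector=[3, 2, 1])

# An alias with prior probabilities for each candidate identifier
kb.add_alias(alias="Paris", entities=["Q90", "Q1004791"], probabilities=[0.8, 0.2])

# Candidate generation: likely identifiers for the mention "Paris"
print([cand.entity_ for cand in kb.get_candidates("Paris")])
```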
### spacy.EntityLinker.v1 {#EntityLinker}


@ -71,7 +71,7 @@ pattern_dicts = [
## AttributeRuler.\_\_call\_\_ {#call tag="method"}
Apply the attribute ruler to a Doc, setting token attributes for tokens matched
Apply the attribute ruler to a `Doc`, setting token attributes for tokens matched
by the provided patterns.
| Name | Description |
@ -256,6 +256,6 @@ serialization by passing in the string names via the `exclude` argument.
| Name | Description |
| ---------- | -------------------------------------------------------------- |
| `vocab` | The shared [`Vocab`](/api/vocab). |
| `patterns` | The Matcher patterns. You usually don't want to exclude this. |
| `patterns` | The `Matcher` patterns. You usually don't want to exclude this. |
| `attrs` | The attributes to set. You usually don't want to exclude this. |
| `indices` | The token indices. You usually don't want to exclude this. |
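
The `patterns`, `attrs` and `indices` above are filled in as rules are added. A small sketch, with a rule invented for illustration:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Matcher-style token patterns; index=1 applies the attrs to the
# second token of each match ("Who" in "The Who")
ruler.add(
    patterns=[[{"LOWER": "the"}, {"LOWER": "who"}]],
    attrs={"TAG": "NNP", "POS": "PROPN"},
    index=1,
)

doc = nlp("I saw The Who live")
print(doc[3].tag_, doc[3].pos_)
```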


@ -81,7 +81,7 @@ $ python -m spacy info [model] [--markdown] [--silent]
Find all trained pipeline packages installed in the current environment and
check whether they are compatible with the currently installed version of spaCy.
Should be run after upgrading spaCy via `pip install -U spacy` to ensure that
all installed packages are can be used with the new version. It will show a list
all installed packages can be used with the new version. It will show a list
of packages and their installed versions. If any package is out of date, the
latest compatible versions and command for updating are shown.
@ -406,7 +406,7 @@ File /path/to/spacy/training/corpus.py (line 18)
### debug data {#debug-data tag="command"}
Analyze, debug, and validate your training and development data. Get useful
Analyze, debug and validate your training and development data. Get useful
stats, and find problems like invalid entity annotations, cyclic dependencies,
low data labels and more.


@ -188,7 +188,7 @@ Typically, the extension for these binary files is `.spacy`, and they are used
as input format for specifying a [training corpus](/api/corpus) and for spaCy's
CLI [`train`](/api/cli#train) command. The built-in
[`convert`](/api/cli#convert) command helps you convert spaCy's previous
[JSON format](#json-input) to the new binary format format. It also supports
[JSON format](#json-input) to the new binary format. It also supports
conversion of the `.conllu` format used by the
[Universal Dependencies corpora](https://github.com/UniversalDependencies).
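
For instance, a `.spacy` file can be produced directly in Python with `DocBin`; the file name here is arbitrary:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
docs = [nlp("A first training example."), nlp("A second one.")]

# Serialize annotated Doc objects to the binary .spacy training format
doc_bin = DocBin(docs=docs)
doc_bin.to_disk("./train.spacy")
```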
@ -252,7 +252,7 @@ $ python -m spacy convert ./data.json ./output.spacy
<Accordion title="Sample JSON data" spaced>
Here's an example of dependencies, part-of-speech tags and names entities, taken
Here's an example of dependencies, part-of-speech tags and named entities, taken
from the English Wall Street Journal portion of the Penn Treebank:
```json


@ -21,8 +21,7 @@ non-projective parses.
The parser is trained using an **imitation learning objective**. It follows the
actions predicted by the current weights, and at each state, determines which
actions are compatible with the optimal parse that could be reached from the
current state. The weights such that the scores assigned to the set of optimal
actions is increased, while scores assigned to other actions are decreased. Note
current state. The weights are updated such that the scores assigned to the set of optimal actions are increased, while scores assigned to other actions are decreased. Note
that more than one action may be optimal for a given state.
## Config and implementation {#config}


@ -445,7 +445,7 @@ Mark a span for merging. The `attrs` will be applied to the resulting token (if
they're context-dependent token attributes like `LEMMA` or `DEP`) or to the
underlying lexeme (if they're context-independent lexical attributes like
`LOWER` or `IS_STOP`). Writable custom extension attributes can be provided as a
dictionary mapping attribute names to values as the `"_"` key.
dictionary mapping attribute name to values as the `"_"` key.
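
A small sketch of merging a span and setting a custom extension through the `"_"` key; the extension name is made up:

```python
import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")
Token.set_extension("is_brand", default=False)

doc = nlp("I bought New Balance shoes")
with doc.retokenize() as retokenizer:
    # Context-dependent attrs like LEMMA go on the merged token; custom
    # extension attributes are passed as a dictionary under the "_" key
    retokenizer.merge(doc[2:4], attrs={"LEMMA": "New Balance", "_": {"is_brand": True}})

print(doc[2].text, doc[2].lemma_, doc[2]._.is_brand)
```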
> #### Example
>


@ -94,7 +94,7 @@ providing custom registered functions.
## EntityLinker.\_\_call\_\_ {#call tag="method"}
Apply the pipe to one document. The document is modified in place, and returned.
Apply the pipe to one document. The document is modified in place and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order. Both
[`__call__`](/api/entitylinker#call) and [`pipe`](/api/entitylinker#pipe)


@ -43,7 +43,7 @@ architectures and their arguments and hyperparameters.
| Setting | Description |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `moves` | A list of transition names. Inferred from the data if not provided. Defaults to `None`. ~~Optional[List[str]] |
| `moves` | A list of transition names. Inferred from the data if not provided. Defaults to `None`. ~~Optional[List[str]]~~ |
| `update_with_oracle_cut_size` | During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. The model is not very sensitive to this parameter, so you usually won't need to change it. Defaults to `100`. ~~int~~ |
| `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [TransitionBasedParser](/api/architectures#TransitionBasedParser). ~~Model[List[Doc], List[Floats2d]]~~ |
@ -83,7 +83,7 @@ shortcut for this and instantiate the component using its string name and
## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
Apply the pipe to one document. The document is modified in place, and returned.
Apply the pipe to one document. The document is modified in place and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order. Both
[`__call__`](/api/entityrecognizer#call) and


@ -256,6 +256,6 @@ Get all patterns that were added to the entity ruler.
| Name | Description |
| ----------------- | --------------------------------------------------------------------------------------------------------------------- |
| `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ |
| `phrase_matcher` | The underlying phrase matcher, used to process phrase patterns. ~~PhraseMatcher~~ |
| `phrase_matcher` | The underlying phrase matcher used to process phrase patterns. ~~PhraseMatcher~~ |
| `token_patterns`  | The token patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Dict[str, Union[str, List[dict]]]]]~~ |
| `phrase_patterns` | The phrase patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Doc]]~~ |
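
Both pattern types are added through `add_patterns`, which routes token patterns to the `Matcher` and phrase patterns to the `PhraseMatcher`. The rules below are invented for illustration:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},  # phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},  # token pattern
])

doc = nlp("Apple is opening a store in San Francisco")
print([(ent.text, ent.label_) for ent in doc.ents])
```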


@ -33,8 +33,8 @@ both documents.
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `predicted` | The document containing (partial) predictions. Can not be `None`. ~~Doc~~ |
| `reference` | The document containing gold-standard annotations. Can not be `None`. ~~Doc~~ |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `reference` | The document containing gold-standard annotations. Cannot be `None`. ~~Doc~~ |
| _keyword-only_ | |
| `alignment` | An object holding the alignment between the tokens of the `predicted` and `reference` documents. ~~Optional[Alignment]~~ |
@ -58,8 +58,8 @@ see the [training format documentation](/api/data-formats#dict-input).
| Name | Description |
| -------------- | ------------------------------------------------------------------------- |
| `predicted` | The document containing (partial) predictions. Can not be `None`. ~~Doc~~ |
| `example_dict` | `Dict[str, obj]` | The gold-standard annotations as a dictionary. Can not be `None`. ~~Dict[str, Any]~~ |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `example_dict` | The gold-standard annotations as a dictionary. Cannot be `None`. ~~Dict[str, Any]~~ |
| **RETURNS** | The newly constructed object. ~~Example~~ |
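
A minimal sketch of constructing an `Example` from a predicted `Doc` and a gold-standard dictionary; the annotation itself is invented:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp("Apple is opening a store")

# Neither the predicted Doc nor the annotation dict may be None
example = Example.from_dict(predicted, {"entities": [(0, 5, "ORG")]})
print(example.text)
```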
## Example.text {#text tag="property"}


@ -46,9 +46,7 @@ information in [`Language.meta`](/api/language#meta) and not to configure the
## Language.from_config {#from_config tag="classmethod" new="3"}
Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and components
define in the config and validate the results. If no config is provided, the
default config of the given language is used. This is also how spaCy loads a
language data, and add pipeline components based on the pipeline and component definitions specified in the config. If no config is provided, the default config of the given language is used. This is also how spaCy loads a
model under the hood based on its [`config.cfg`](/api/data-formats#config).
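
A short sketch of loading a config from disk and constructing the pipeline from it; the path is a placeholder:

```python
from spacy.language import Language
from spacy.util import load_config

config = load_config("./config.cfg")
nlp = Language.from_config(config)
```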
> #### Example
@ -107,7 +105,7 @@ decorator. For more details and examples, see the
| `assigns` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
| `func` | Optional function if not used a a decorator. ~~Optional[Callable[[Doc], Doc]]~~ |
| `func` | Optional function if not used as a decorator. ~~Optional[Callable[[Doc], Doc]]~~ |
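
Typical decorator usage for a simple stateless component; the component name and body are invented:

```python
import spacy
from spacy.language import Language

@Language.component("debug_length", retokenizes=False)
def debug_length(doc):
    # A trivial stateless component: inspect the Doc and pass it on
    print("Doc length:", len(doc))
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("debug_length")
nlp("Hello world")
```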
## Language.factory {#factory tag="classmethod"}
@ -155,7 +153,7 @@ examples, see the
| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
| `scores` | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. ~~Dict[str, float]~~ |
| `func` | Optional function if not used a a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~ |
| `func` | Optional function if not used as a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~ |
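
For components with settings, `@Language.factory` registers a function that builds the component from its config. The names and `default_config` below are made up for illustration:

```python
import spacy
from spacy.language import Language

class TokenCounter:
    def __init__(self, nlp, name, verbose):
        self.name = name
        self.verbose = verbose

    def __call__(self, doc):
        if self.verbose:
            print(self.name, len(doc))
        return doc

@Language.factory("token_counter", default_config={"verbose": False})
def create_token_counter(nlp, name, verbose: bool):
    return TokenCounter(nlp, name, verbose)

nlp = spacy.blank("en")
nlp.add_pipe("token_counter", config={"verbose": True})
nlp("Hello world")
```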
## Language.\_\_call\_\_ {#call tag="method"}
@ -602,7 +600,7 @@ does nothing.
## Language.enable_pipe {#enable_pipe tag="method" new="3"}
Enable a previously disable component (e.g. via
Enable a previously disabled component (e.g. via
[`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of
the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
already enabled, this method does nothing.
@ -629,7 +627,7 @@ pipeline will be restored to the initial state at the end of the block.
Otherwise, a `DisabledPipes` object is returned, which has a `.restore()` method
you can use to undo your changes. You can specify either `disable` (as a list or
string), or `enable`. In the latter case, all components not in the `enable`
list, will be disabled. Under the hood, this method calls into
list will be disabled. Under the hood, this method calls into
[`disable_pipe`](/api/language#disable_pipe) and
[`enable_pipe`](/api/language#enable_pipe).
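
A sketch of both patterns, the context-manager form that restores the pipeline afterwards and the individual `disable_pipe`/`enable_pipe` calls; the component names are just examples:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

# Everything not listed in `enable` is disabled inside the block
with nlp.select_pipes(enable=["sentencizer"]):
    print(nlp.pipe_names)  # only the enabled component is run

print(nlp.pipe_names)      # both components restored

# Or toggle a single component in place
nlp.disable_pipe("entity_ruler")
nlp.enable_pipe("entity_ruler")
```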
@ -662,7 +660,7 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
| -------------- | ------------------------------------------------------------------------------------------------------ |
| _keyword-only_ | |
| `disable` | Name(s) of pipeline components to disable. ~~Optional[Union[str, Iterable[str]]]~~ |
| `enable` | Names(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
| `enable` | Name(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
| **RETURNS** | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |
## Language.get_factory_meta {#get_factory_meta tag="classmethod" new="3"}
@ -874,7 +872,7 @@ Loads state from a directory, including all data that was saved with the
<Infobox variant="warning" title="Important note">
Keep in mind that this method **only loads serialized state** and doesn't set up
Keep in mind that this method **only loads the serialized state** and doesn't set up
the `nlp` object. This means that it requires the correct language class to be
initialized and all pipeline components to be added to the pipeline. If you want
to load a serialized pipeline from a directory, you should use


@ -38,7 +38,7 @@ The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
`config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
[`config.cfg` for training](/usage/training#config). For examples of the lookups
data formats used by the lookup and rule-based lemmatizers, see
data format used by the lookup and rule-based lemmatizers, see
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
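
For example, switching the lemmatizer to the lookup tables; this assumes the `spacy-lookups-data` package providing the tables for the language is installed:

```python
import spacy

nlp = spacy.blank("en")
# Override the default mode via the config argument on add_pipe;
# the lookup tables come from the spacy-lookups-data package
nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
```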
> #### Example


@ -61,7 +61,7 @@ matched:
| `!` | Negate the pattern, by requiring it to match exactly 0 times. |
| `?` | Make the pattern optional, by allowing it to match 0 or 1 times. |
| `+` | Require the pattern to match 1 or more times. |
| `*` | Allow the pattern to match zero or more times. |
| `*` | Allow the pattern to match 0 or more times. |
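
A quick sketch of the `OP` quantifiers in a token pattern; the pattern itself is made up:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "very" one or more times, then "good", then optional punctuation
pattern = [
    {"LOWER": "very", "OP": "+"},
    {"LOWER": "good"},
    {"IS_PUNCT": True, "OP": "?"},
]
matcher.add("VERY_GOOD", [pattern])

doc = nlp("This is very very good!")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```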
Token patterns can also map to a **dictionary of properties** instead of a
single value to indicate whether the expected value is a member of a list or how


@ -12,7 +12,7 @@ container storing a single morphological analysis.
## Morphology.\_\_init\_\_ {#init tag="method"}
Create a Morphology object.
Create a `Morphology` object.
> #### Example
>
@ -101,7 +101,7 @@ representation.
| Name | Description |
| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `feats_dict` | The morphological features as a dictionary. ~~Dict[str, str]~~ |
| **RETURNS** | The morphological features as in Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
| **RETURNS** | The morphological features in Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
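
A sketch of converting between the dictionary and FEATS string forms, assuming the static helpers `Morphology.dict_to_feats` and `Morphology.feats_to_dict`:

```python
from spacy.morphology import Morphology

feats = Morphology.dict_to_feats({"Case": "Nom", "Number": "Sing"})
print(feats)                            # "Case=Nom|Number=Sing"
print(Morphology.feats_to_dict(feats))  # {"Case": "Nom", "Number": "Sing"}
```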
## Attributes {#attributes}


@ -26,7 +26,7 @@ Merge noun chunks into a single token. Also available via the string name
<Infobox variant="warning">
Since noun chunks require part-of-speech tags and the dependency parse, make
Since noun chunks require part-of-speech tags and the dependency parser, make
sure to add this component _after_ the `"tagger"` and `"parser"` components. By
default, `nlp.add_pipe` will add components to the end of the pipeline and after
all other components.
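
For example, assuming a trained pipeline such as `en_core_web_sm` that provides the tagger and parser:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("merge_noun_chunks")  # appended last, after tagger and parser

doc = nlp("The quick brown fox jumped over the lazy dog.")
print([token.text for token in doc])  # noun chunks merged into single tokens
```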


@ -202,7 +202,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
## SentenceRecognizer.rehearse {#rehearse tag="method,experimental" new="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
current model to make predictions similar to an initial model to try to address
the "catastrophic forgetting" problem. This feature is experimental.
> #### Example


@ -8,7 +8,7 @@ api_string_name: sentencizer
api_trainable: false
---
A simple pipeline component, to allow custom sentence boundary detection logic
A simple pipeline component to allow custom sentence boundary detection logic
that doesn't require the dependency parse. By default, sentence segmentation is
performed by the [`DependencyParser`](/api/dependencyparser), so the
`Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
@ -130,7 +130,7 @@ Score a batch of examples.
## Sentencizer.to_disk {#to_disk tag="method"}
Save the sentencizer settings (punctuation characters) a directory. Will create
Save the sentencizer settings (punctuation characters) to a directory. Will create
a file `sentencizer.json`. This also happens automatically when you save an
`nlp` object with a sentencizer added to its pipeline.
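
A small sketch of the rule-based strategy with custom punctuation characters:

```python
import spacy

nlp = spacy.blank("en")
# Only these characters end a sentence
nlp.add_pipe("sentencizer", config={"punct_chars": [".", "!", "?"]})

doc = nlp("This is a sentence. This is another one!")
print([sent.text for sent in doc.sents])
```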


@ -8,7 +8,7 @@ A slice from a [`Doc`](/api/doc) object.
## Span.\_\_init\_\_ {#init tag="method"}
Create a Span object from the slice `doc[start : end]`.
Create a `Span` object from the slice `doc[start : end]`.
> #### Example
>
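
Both forms below produce the same span (a minimal sketch):

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Give it back! He pleaded.")

span = doc[1:4]          # slice syntax
same = Span(doc, 1, 4)   # explicit constructor
print(span.text, "|", same.text)
```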