mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-26 17:24:41 +03:00
Proofreading
Proofread some API docs
parent aaf01689a1
commit 3dd5f409ec
@@ -143,7 +143,7 @@ argument that connects to the shared `tok2vec` component in the pipeline.

 Construct an embedding layer that separately embeds a number of lexical
 attributes using hash embedding, concatenates the results, and passes it through
-a feed-forward subnetwork to build a mixed representations. The features used
+a feed-forward subnetwork to build mixed representations. The features used
 are the `NORM`, `PREFIX`, `SUFFIX` and `SHAPE`, which can have varying
 definitions depending on the `Vocab` of the `Doc` object passed in. Vectors from
 pretrained static vectors can also be incorporated into the concatenated
@@ -170,7 +170,7 @@ representation.
 > nC = 8
 > ```

-Construct an embedded representations based on character embeddings, using a
+Construct an embedded representation based on character embeddings, using a
 feed-forward network. A fixed number of UTF-8 byte characters are used for each
 word, taken from the beginning and end of the word equally. Padding is used in
 the center for words that are too short.
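
The `CharacterEmbed` text in this hunk describes taking a fixed number of UTF-8 bytes from the start and end of each word, with padding in the center for short words. A minimal pure-Python sketch of that byte window follows. It is an illustration only, not spaCy's implementation: the function name `char_window`, the requirement that `nC` be even, the way a short word's bytes are split between front and back, and the pad value 255 (picked because the byte 0xFF never occurs in valid UTF-8) are all assumptions.

```python
# Hypothetical sketch of a fixed-width UTF-8 byte window (not spaCy's code).
PAD = 255  # assumption: 0xFF never occurs in valid UTF-8, so it can't clash with real bytes


def char_window(word: str, nC: int = 8) -> list[int]:
    """Return exactly nC byte values for `word`.

    Bytes come from the start and the end of the word in equal shares. If the
    word has fewer than nC bytes, the unused slots in the middle are padded.
    """
    if nC <= 0 or nC % 2:
        raise ValueError("this sketch assumes a positive, even nC")
    data = word.encode("utf-8")
    half = nC // 2
    # Front share: half the window, or ceil(len / 2) bytes for a short word.
    front_n = min(half, (len(data) + 1) // 2)
    # Back share: half the window, never reusing bytes already in the front.
    back_n = min(half, len(data) - front_n)
    front = list(data[:front_n])
    back = list(data[len(data) - back_n:])
    padding = [PAD] * (nC - front_n - back_n)
    return front + padding + back


print(char_window("cat"))        # short word: padding sits in the middle
print(char_window("spacylike"))  # long word: first 4 and last 4 bytes, middle dropped
```

For long words the middle bytes are simply dropped, which is what keeps every word's input the same width regardless of word length.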
@@ -392,7 +392,7 @@ a single token vector given zero or more wordpiece vectors.
 > ```

 Use a transformer as a [`Tok2Vec`](/api/tok2vec) layer directly. This does
-**not** allow multiple components to share the transformer weights, and does
+**not** allow multiple components to share the transformer weights and does
 **not** allow the transformer to set annotations into the [`Doc`](/api/doc)
 object, but it's a **simpler solution** if you only need the transformer within
 one component.
@@ -436,7 +436,7 @@ might find [this tutorial](https://explosion.ai/blog/parsing-english-in-python)
 helpful for background information. The neural network state prediction model
 consists of either two or three subnetworks:

-- **tok2vec**: Map each token into a vector representations. This subnetwork is
+- **tok2vec**: Map each token into a vector representation. This subnetwork is
 run once for each batch.
 - **lower**: Construct a feature-specific vector for each `(token, feature)`
 pair. This is also run once for each batch. Constructing the state
@@ -573,14 +573,14 @@ architecture is usually less accurate than the ensemble, but runs faster.
 > nO = null
 > ```

-An ngram "bag-of-words" model. This architecture should run much faster than the
+An n-gram "bag-of-words" model. This architecture should run much faster than the
 others, but may not be as accurate, especially if texts are short.

 | Name | Description |
 | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `exclusive_classes` | Whether or not categories are mutually exclusive. ~~bool~~ |
 | `ngram_size` | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. ~~int~~ |
-| `no_output_layer` | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes` is `True`, else `Logistic`. ~~bool~~ |
+| `no_output_layer` | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes` is `True`, else `Logistic`). ~~bool~~ |
 | `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. ~~Optional[int]~~ |
 | **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |

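
The `ngram_size` row in this hunk says that `ngram_size=3` yields unigram, bigram and trigram features. A minimal sketch of that feature extraction, assuming pre-tokenized input and plain tuple keys; `bow_ngram_features` is a hypothetical helper, not part of spaCy's API.

```python
from collections import Counter


def bow_ngram_features(tokens: list[str], ngram_size: int) -> Counter:
    """Count every n-gram of length 1..ngram_size (hypothetical helper)."""
    if ngram_size < 1:
        raise ValueError("ngram_size must be at least 1")
    feats: Counter = Counter()
    for n in range(1, ngram_size + 1):
        # Slide a window of width n over the tokens and count each n-gram.
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i : i + n])] += 1
    return feats


feats = bow_ngram_features(["the", "cat", "sat"], ngram_size=3)
print(sorted(feats, key=len))
```

A production model may map these keys into a fixed-size table (for example by hashing) instead of keeping tuples; that choice is outside what this hunk documents.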
@@ -594,7 +594,7 @@ into the "real world". This requires 3 main components:
 synonyms and prior probabilities.
 - A candidate generation step to produce a set of likely identifiers, given a
 certain textual mention.
-- A Machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
+- A machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
 most plausible ID from the set of candidates.

 ### spacy.EntityLinker.v1 {#EntityLinker}
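
The hunk lists the pieces of entity linking: a knowledge base with synonyms and prior probabilities, a candidate generation step for a mention, and a model that picks the most plausible ID. The toy sketch below wires those steps together in plain Python. Everything in it is hypothetical: the `KB` and `DESCRIPTIONS` tables, the made-up entity IDs, and especially the scoring function, which multiplies the prior by a word-overlap count as a stand-in for a trained `Model`.

```python
from typing import Dict, List, Optional

# Hypothetical toy knowledge base: alias -> {entity ID: prior probability}.
KB: Dict[str, Dict[str, float]] = {
    "Paris": {"E_PARIS_CITY": 0.7, "E_PARIS_HERO": 0.3},
}
# Hypothetical entity descriptions used by the stand-in scorer.
DESCRIPTIONS = {
    "E_PARIS_CITY": {"france", "capital", "city"},
    "E_PARIS_HERO": {"troy", "helen", "prince"},
}


def generate_candidates(mention: str) -> Dict[str, float]:
    """Candidate generation: look the mention up as an alias in the KB."""
    return KB.get(mention, {})


def link(mention: str, context: List[str]) -> Optional[str]:
    """Pick the most plausible candidate ID, or None if there are no candidates."""
    candidates = generate_candidates(mention)
    if not candidates:
        return None
    words = {w.lower() for w in context}

    def score(entity_id: str) -> float:
        # Stand-in for a trained model: prior boosted by context-word overlap.
        overlap = len(DESCRIPTIONS.get(entity_id, set()) & words)
        return candidates[entity_id] * (1 + overlap)

    return max(candidates, key=score)


print(link("Paris", ["She", "moved", "to", "France"]))
print(link("Paris", ["Helen", "of", "Troy"]))
```

With no helpful context the prior alone decides, which is why knowledge-base priors matter even before any model is trained.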
@@ -71,7 +71,7 @@ pattern_dicts = [

 ## AttributeRuler.\_\_call\_\_ {#call tag="method"}

-Apply the attribute ruler to a Doc, setting token attributes for tokens matched
+Apply the attribute ruler to a `Doc`, setting token attributes for tokens matched
 by the provided patterns.

 | Name | Description |
@@ -256,6 +256,6 @@ serialization by passing in the string names via the `exclude` argument.
 | Name | Description |
 | ---------- | -------------------------------------------------------------- |
 | `vocab` | The shared [`Vocab`](/api/vocab). |
-| `patterns` | The Matcher patterns. You usually don't want to exclude this. |
+| `patterns` | The `Matcher` patterns. You usually don't want to exclude this. |
 | `attrs` | The attributes to set. You usually don't want to exclude this. |
 | `indices` | The token indices. You usually don't want to exclude this. |
@@ -81,7 +81,7 @@ $ python -m spacy info [model] [--markdown] [--silent]
 Find all trained pipeline packages installed in the current environment and
 check whether they are compatible with the currently installed version of spaCy.
 Should be run after upgrading spaCy via `pip install -U spacy` to ensure that
-all installed packages are can be used with the new version. It will show a list
+all installed packages can be used with the new version. It will show a list
 of packages and their installed versions. If any package is out of date, the
 latest compatible versions and command for updating are shown.

@@ -406,7 +406,7 @@ File /path/to/spacy/training/corpus.py (line 18)

 ### debug data {#debug-data tag="command"}

-Analyze, debug, and validate your training and development data. Get useful
+Analyze, debug and validate your training and development data. Get useful
 stats, and find problems like invalid entity annotations, cyclic dependencies,
 low data labels and more.

@@ -188,7 +188,7 @@ Typically, the extension for these binary files is `.spacy`, and they are used
 as input format for specifying a [training corpus](/api/corpus) and for spaCy's
 CLI [`train`](/api/cli#train) command. The built-in
 [`convert`](/api/cli#convert) command helps you convert spaCy's previous
-[JSON format](#json-input) to the new binary format format. It also supports
+[JSON format](#json-input) to the new binary format. It also supports
 conversion of the `.conllu` format used by the
 [Universal Dependencies corpora](https://github.com/UniversalDependencies).

@@ -252,7 +252,7 @@ $ python -m spacy convert ./data.json ./output.spacy

 <Accordion title="Sample JSON data" spaced>

-Here's an example of dependencies, part-of-speech tags and names entities, taken
+Here's an example of dependencies, part-of-speech tags and named entities, taken
 from the English Wall Street Journal portion of the Penn Treebank:

 ```json
@@ -21,8 +21,7 @@ non-projective parses.
 The parser is trained using an **imitation learning objective**. It follows the
 actions predicted by the current weights, and at each state, determines which
 actions are compatible with the optimal parse that could be reached from the
-current state. The weights such that the scores assigned to the set of optimal
-actions is increased, while scores assigned to other actions are decreased. Note
+current state. The weights are updated such that the scores assigned to the set of optimal actions is increased, while scores assigned to other actions are decreased. Note
 that more than one action may be optimal for a given state.

 ## Config and implementation {#config}
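
The parser text in this hunk says the weights are updated so that the scores of the (possibly several) optimal actions go up and the scores of all other actions go down. One objective with exactly that effect is the negative log of the total softmax probability of the optimal set. The sketch below computes its gradient with respect to the action scores. It illustrates the idea only and is not the loss spaCy's parser uses; `softmax` and `oracle_gradient` are hypothetical helpers.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def oracle_gradient(scores: list[float], is_optimal: list[bool]) -> list[float]:
    """Gradient of -log P(optimal set) with respect to the action scores.

    For an optimal action the gradient is p * (1 - 1 / P_opt) <= 0, and for any
    other action it is p > 0, so a descent step raises the optimal scores and
    lowers the rest. More than one action may be marked optimal.
    """
    if not any(is_optimal):
        raise ValueError("at least one action must be optimal")
    probs = softmax(scores)
    p_opt = sum(p for p, ok in zip(probs, is_optimal) if ok)
    return [p - p / p_opt if ok else p for p, ok in zip(probs, is_optimal)]


scores = [1.0, 2.0, 0.5]       # hypothetical scores for three actions
optimal = [True, False, True]  # two actions are compatible with the best parse
grad = oracle_gradient(scores, optimal)
updated = [s - 0.5 * g for s, g in zip(scores, grad)]  # one descent step, lr = 0.5
print(grad)
print(updated)
```

In a real network the gradient on the scores would be backpropagated into the weights that produced them; updating the scores directly, as in the last lines, only shows the direction of the change.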
@@ -445,7 +445,7 @@ Mark a span for merging. The `attrs` will be applied to the resulting token (if
 they're context-dependent token attributes like `LEMMA` or `DEP`) or to the
 underlying lexeme (if they're context-independent lexical attributes like
 `LOWER` or `IS_STOP`). Writable custom extension attributes can be provided as a
-dictionary mapping attribute names to values as the `"_"` key.
+dictionary mapping attribute name to values as the `"_"` key.

 > #### Example
 >
@@ -94,7 +94,7 @@ providing custom registered functions.

 ## EntityLinker.\_\_call\_\_ {#call tag="method"}

-Apply the pipe to one document. The document is modified in place, and returned.
+Apply the pipe to one document. The document is modified in place and returned.
 This usually happens under the hood when the `nlp` object is called on a text
 and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/entitylinker#call) and [`pipe`](/api/entitylinker#pipe)
@@ -43,7 +43,7 @@ architectures and their arguments and hyperparameters.

 | Setting | Description |
 | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `moves` | A list of transition names. Inferred from the data if not provided. Defaults to `None`. ~~Optional[List[str]] |
+| `moves` | A list of transition names. Inferred from the data if not provided. Defaults to `None`. ~~Optional[List[str]]~~ |
 | `update_with_oracle_cut_size` | During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. The model is not very sensitive to this parameter, so you usually won't need to change it. Defaults to `100`. ~~int~~ |
 | `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [TransitionBasedParser](/api/architectures#TransitionBasedParser). ~~Model[List[Doc], List[Floats2d]]~~ |

@@ -83,7 +83,7 @@ shortcut for this and instantiate the component using its string name and

 ## EntityRecognizer.\_\_call\_\_ {#call tag="method"}

-Apply the pipe to one document. The document is modified in place, and returned.
+Apply the pipe to one document. The document is modified in place and returned.
 This usually happens under the hood when the `nlp` object is called on a text
 and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/entityrecognizer#call) and
@@ -256,6 +256,6 @@ Get all patterns that were added to the entity ruler.
 | Name | Description |
 | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
 | `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ |
-| `phrase_matcher` | The underlying phrase matcher, used to process phrase patterns. ~~PhraseMatcher~~ |
+| `phrase_matcher` | The underlying phrase matcher used to process phrase patterns. ~~PhraseMatcher~~ |
 | `token_patterns` | The token patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Dict[str, Union[str, List[dict]]]]~~ |
 | `phrase_patterns` | The phrase patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Doc]]~~ |
@@ -33,8 +33,8 @@ both documents.

 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
-| `predicted` | The document containing (partial) predictions. Can not be `None`. ~~Doc~~ |
-| `reference` | The document containing gold-standard annotations. Can not be `None`. ~~Doc~~ |
+| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
+| `reference` | The document containing gold-standard annotations. Cannot be `None`. ~~Doc~~ |
 | _keyword-only_ | |
 | `alignment` | An object holding the alignment between the tokens of the `predicted` and `reference` documents. ~~Optional[Alignment]~~ |

@@ -58,8 +58,8 @@ see the [training format documentation](/api/data-formats#dict-input).

 | Name | Description |
 | -------------- | ------------------------------------------------------------------------- |
-| `predicted` | The document containing (partial) predictions. Can not be `None`. ~~Doc~~ |
-| `example_dict` | `Dict[str, obj]` | The gold-standard annotations as a dictionary. Can not be `None`. ~~Dict[str, Any]~~ |
+| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
+| `example_dict` | `Dict[str, obj]` | The gold-standard annotations as a dictionary. Cannot be `None`. ~~Dict[str, Any]~~ |
 | **RETURNS** | The newly constructed object. ~~Example~~ |

 ## Example.text {#text tag="property"}
@@ -46,9 +46,7 @@ information in [`Language.meta`](/api/language#meta) and not to configure the
 ## Language.from_config {#from_config tag="classmethod" new="3"}

 Create a `Language` object from a loaded config. Will set up the tokenizer and
-language data, add pipeline components based on the pipeline and components
-define in the config and validate the results. If no config is provided, the
-default config of the given language is used. This is also how spaCy loads a
+language data, add pipeline components based on the pipeline and add pipeline components based on the definitions specified in the config. If no config is provided, the default config of the given language is used. This is also how spaCy loads a
 model under the hood based on its [`config.cfg`](/api/data-formats#config).

 > #### Example
@@ -107,7 +105,7 @@ decorator. For more details and examples, see the
 | `assigns` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
 | `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
 | `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
-| `func` | Optional function if not used a a decorator. ~~Optional[Callable[[Doc], Doc]]~~ |
+| `func` | Optional function if not used as a decorator. ~~Optional[Callable[[Doc], Doc]]~~ |

 ## Language.factory {#factory tag="classmethod"}

@@ -155,7 +153,7 @@ examples, see the
 | `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
 | `scores` | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
 | `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. ~~Dict[str, float]~~ |
-| `func` | Optional function if not used a a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~ |
+| `func` | Optional function if not used as a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~ |

 ## Language.\_\_call\_\_ {#call tag="method"}

@@ -602,7 +600,7 @@ does nothing.

 ## Language.enable_pipe {#enable_pipe tag="method" new="3"}

-Enable a previously disable component (e.g. via
+Enable a previously disabled component (e.g. via
 [`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of
 the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
 already enabled, this method does nothing.
@@ -629,7 +627,7 @@ pipeline will be restored to the initial state at the end of the block.
 Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
 you can use to undo your changes. You can specify either `disable` (as a list or
 string), or `enable`. In the latter case, all components not in the `enable`
-list, will be disabled. Under the hood, this method calls into
+list will be disabled. Under the hood, this method calls into
 [`disable_pipe`](/api/language#disable_pipe) and
 [`enable_pipe`](/api/language#enable_pipe).

@@ -662,7 +660,7 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
 | -------------- | ------------------------------------------------------------------------------------------------------ |
 | _keyword-only_ | |
 | `disable` | Name(s) of pipeline components to disable. ~~Optional[Union[str, Iterable[str]]]~~ |
-| `enable` | Names(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
+| `enable` | Name(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
 | **RETURNS** | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |

 ## Language.get_factory_meta {#get_factory_meta tag="classmethod" new="3"}
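
The `select_pipes` hunks describe two ways to choose components: `disable` names the components to switch off, while `enable` switches off every component not listed. The helper below reproduces only that selection rule on a plain list of names. It is a hypothetical sketch, not spaCy's code: it never touches an `nlp` object, and rejecting calls that pass both arguments or neither is an assumption of this sketch.

```python
from typing import Iterable, List, Optional, Union

Names = Union[str, Iterable[str]]


def _as_list(names: Names) -> List[str]:
    # A single string is treated as one component name, not as characters.
    return [names] if isinstance(names, str) else list(names)


def components_to_disable(
    pipe_names: List[str],
    disable: Optional[Names] = None,
    enable: Optional[Names] = None,
) -> List[str]:
    """Return the component names that would be disabled (hypothetical helper)."""
    if (disable is None) == (enable is None):
        raise ValueError("pass exactly one of `disable` or `enable`")
    if enable is not None:
        keep = set(_as_list(enable))
        # Everything not in `enable` is disabled, in pipeline order.
        return [name for name in pipe_names if name not in keep]
    return _as_list(disable)


pipeline = ["tok2vec", "tagger", "parser", "ner"]
print(components_to_disable(pipeline, enable="ner"))
print(components_to_disable(pipeline, disable=["tagger", "parser"]))
```

On a loaded pipeline, the documented method is called as `nlp.select_pipes(enable="ner")` or `nlp.select_pipes(disable=["tagger", "parser"])`, either as a context manager or followed by `.restore()`.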
@@ -874,7 +872,7 @@ Loads state from a directory, including all data that was saved with the

 <Infobox variant="warning" title="Important note">

-Keep in mind that this method **only loads serialized state** and doesn't set up
+Keep in mind that this method **only loads the serialized state** and doesn't set up
 the `nlp` object. This means that it requires the correct language class to be
 initialized and all pipeline components to be added to the pipeline. If you want
 to load a serialized pipeline from a directory, you should use
@@ -38,7 +38,7 @@ The default config is defined by the pipeline component factory and describes
 how the component should be configured. You can override its settings via the
 `config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
 [`config.cfg` for training](/usage/training#config). For examples of the lookups
-data formats used by the lookup and rule-based lemmatizers, see
+data format used by the lookup and rule-based lemmatizers, see
 [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).

 > #### Example
@@ -61,7 +61,7 @@ matched:
 | `!` | Negate the pattern, by requiring it to match exactly 0 times. |
 | `?` | Make the pattern optional, by allowing it to match 0 or 1 times. |
 | `+` | Require the pattern to match 1 or more times. |
-| `*` | Allow the pattern to match zero or more times. |
+| `*` | Allow the pattern to match 0 or more times. |

 Token patterns can also map to a **dictionary of properties** instead of a
 single value to indicate whether the expected value is a member of a list or how
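
The quantifier table in this hunk assigns each `OP` value an allowed number of repetitions. The sketch below encodes that table as inclusive (minimum, maximum) ranges and checks a repetition count against it. `QUANTIFIER_RANGES` and `allows` are hypothetical helpers for illustration; the `pattern` list only shows where the `"OP"` key sits in a `Matcher` token pattern and is not run through spaCy here.

```python
import math

# Inclusive (min, max) repetition counts for each quantifier in the table.
QUANTIFIER_RANGES = {
    "!": (0, 0),         # negation: must match exactly 0 times
    "?": (0, 1),         # optional: 0 or 1 times
    "+": (1, math.inf),  # 1 or more times
    "*": (0, math.inf),  # 0 or more times
}


def allows(op: str, count: int) -> bool:
    """Return True if a token pattern with this OP may match `count` times."""
    low, high = QUANTIFIER_RANGES[op]
    return low <= count <= high


# The quantifier is set per token dict, e.g. optional punctuation between
# "hello" and "world":
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]

for op in QUANTIFIER_RANGES:
    print(op, [n for n in range(4) if allows(op, n)])
```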
@@ -12,7 +12,7 @@ container storing a single morphological analysis.

 ## Morphology.\_\_init\_\_ {#init tag="method"}

-Create a Morphology object.
+Create a `Morphology` object.

 > #### Example
 >
@@ -101,7 +101,7 @@ representation.
 | Name | Description |
 | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `feats_dict` | The morphological features as a dictionary. ~~Dict[str, str]~~ |
-| **RETURNS** | The morphological features as in Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
+| **RETURNS** | The morphological features in Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |

 ## Attributes {#attributes}

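
The `RETURNS` row in this hunk refers to the Universal Dependencies FEATS format, where features are written as `Name=Value` pairs joined by `|` and sorted by feature name. A minimal round-trip sketch follows, assuming single-valued features, a case-insensitive sort, and `_` (the CoNLL-U placeholder for an empty field) for an empty dict. `to_feats` and `from_feats` are hypothetical names, not spaCy methods.

```python
from typing import Dict


def to_feats(feats_dict: Dict[str, str]) -> str:
    """Serialize {"Case": "Nom", ...} as a FEATS string (hypothetical helper)."""
    if not feats_dict:
        return "_"  # assumption: CoNLL-U placeholder for an empty field
    # Sort by feature name, ignoring case, and join Name=Value pairs with "|".
    items = sorted(feats_dict.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def from_feats(feats: str) -> Dict[str, str]:
    """Parse a FEATS string back into a dict (hypothetical helper)."""
    if feats in ("", "_"):
        return {}
    # Split pairs on "|", then each pair on its first "=".
    return dict(pair.split("=", 1) for pair in feats.split("|"))


feats = to_feats({"Number": "Sing", "Case": "Nom", "Gender": "Fem"})
print(feats)
print(from_feats(feats))
```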
@@ -26,7 +26,7 @@ Merge noun chunks into a single token. Also available via the string name

 <Infobox variant="warning">

-Since noun chunks require part-of-speech tags and the dependency parse, make
+Since noun chunks require part-of-speech tags and the dependency parser, make
 sure to add this component _after_ the `"tagger"` and `"parser"` components. By
 default, `nlp.add_pipe` will add components to the end of the pipeline and after
 all other components.
@@ -202,7 +202,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
 ## SentenceRecognizer.rehearse {#rehearse tag="method,experimental" new="3"}

 Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
-current model to make predictions similar to an initial model, to try to address
+current model to make predictions similar to an initial model to try to address
 the "catastrophic forgetting" problem. This feature is experimental.

 > #### Example
@@ -8,7 +8,7 @@ api_string_name: sentencizer
 api_trainable: false
 ---

-A simple pipeline component, to allow custom sentence boundary detection logic
+A simple pipeline component to allow custom sentence boundary detection logic
 that doesn't require the dependency parse. By default, sentence segmentation is
 performed by the [`DependencyParser`](/api/dependencyparser), so the
 `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
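
The `Sentencizer` text in this hunk describes a simple, rule-based alternative to parser-based segmentation, and its saved settings are punctuation characters. A minimal sketch of such a rule on a list of token strings: a new sentence starts at the first non-punctuation token after a run of punctuation. The helper name `split_sentences` and the three-character `PUNCT_CHARS` set are assumptions for illustration; spaCy's component works on `Doc` objects and has its own configurable `punct_chars` setting.

```python
from typing import Iterable, List, Set

PUNCT_CHARS = {".", "!", "?"}  # hypothetical minimal punctuation set


def split_sentences(tokens: Iterable[str], punct_chars: Set[str] = PUNCT_CHARS) -> List[List[str]]:
    """Group tokens into sentences, splitting after runs of punctuation tokens."""
    sentences: List[List[str]] = []
    current: List[str] = []
    after_punct = False
    for token in tokens:
        # The first non-punctuation token after a punctuation run starts a new sentence.
        if after_punct and token not in punct_chars:
            sentences.append(current)
            current = []
            after_punct = False
        current.append(token)
        if token in punct_chars:
            after_punct = True
    if current:
        sentences.append(current)
    return sentences


print(split_sentences(["Hi", "!", "How", "are", "you", "?", "?", "Fine", "."]))
```

Keeping consecutive punctuation (the `?` `?` run above) in the same sentence avoids one-token sentences; that is a design choice of this sketch.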
@@ -130,7 +130,7 @@ Score a batch of examples.

 ## Sentencizer.to_disk {#to_disk tag="method"}

-Save the sentencizer settings (punctuation characters) a directory. Will create
+Save the sentencizer settings (punctuation characters) to a directory. Will create
 a file `sentencizer.json`. This also happens automatically when you save an
 `nlp` object with a sentencizer added to its pipeline.

@@ -8,7 +8,7 @@ A slice from a [`Doc`](/api/doc) object.

 ## Span.\_\_init\_\_ {#init tag="method"}

-Create a Span object from the slice `doc[start : end]`.
+Create a `Span` object from the slice `doc[start : end]`.

 > #### Example
 >