mirror of
https://github.com/explosion/spaCy.git
synced 2025-08-03 20:00:21 +03:00
prettier formatting
This commit is contained in:
parent
d314bbb0f4
commit
1f22e197f5
|
@ -45,33 +45,33 @@ For attributes that represent string values, the internal integer ID is accessed
|
||||||
as `Token.attr`, e.g. `token.dep`, while the string value can be retrieved by
|
as `Token.attr`, e.g. `token.dep`, while the string value can be retrieved by
|
||||||
appending `_` as in `token.dep_`.
|
appending `_` as in `token.dep_`.
|
||||||
|
|
||||||
| Attribute | Description |
|
| Attribute | Description |
|
||||||
| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `DEP` | The token's dependency label. ~~str~~ |
|
| `DEP` | The token's dependency label. ~~str~~ |
|
||||||
| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
|
| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
|
||||||
| `ENT_IOB` | The IOB part of the token's entity tag. Uses custom integer values rather than the string store: unset is `0`, `I` is `1`, `O` is `2`, and `B` is `3`. ~~str~~ |
|
| `ENT_IOB` | The IOB part of the token's entity tag. Uses custom integer values rather than the string store: unset is `0`, `I` is `1`, `O` is `2`, and `B` is `3`. ~~str~~ |
|
||||||
| `ENT_KB_ID` | The token's entity knowledge base ID. ~~str~~ |
|
| `ENT_KB_ID` | The token's entity knowledge base ID. ~~str~~ |
|
||||||
| `ENT_TYPE` | The token's entity label. ~~str~~ |
|
| `ENT_TYPE` | The token's entity label. ~~str~~ |
|
||||||
| `IS_ALPHA` | Token text consists of alphabetic characters. ~~bool~~ |
|
| `IS_ALPHA` | Token text consists of alphabetic characters. ~~bool~~ |
|
||||||
| `IS_ASCII` | Token text consists of ASCII characters. ~~bool~~ |
|
| `IS_ASCII` | Token text consists of ASCII characters. ~~bool~~ |
|
||||||
| `IS_DIGIT` | Token text consists of digits. ~~bool~~ |
|
| `IS_DIGIT` | Token text consists of digits. ~~bool~~ |
|
||||||
| `IS_LOWER` | Token text is in lowercase. ~~bool~~ |
|
| `IS_LOWER` | Token text is in lowercase. ~~bool~~ |
|
||||||
| `IS_PUNCT` | Token is punctuation. ~~bool~~ |
|
| `IS_PUNCT` | Token is punctuation. ~~bool~~ |
|
||||||
| `IS_SPACE` | Token is whitespace. ~~bool~~ |
|
| `IS_SPACE` | Token is whitespace. ~~bool~~ |
|
||||||
| `IS_STOP` | Token is a stop word. ~~bool~~ |
|
| `IS_STOP` | Token is a stop word. ~~bool~~ |
|
||||||
| `IS_TITLE` | Token text is in titlecase. ~~bool~~ |
|
| `IS_TITLE` | Token text is in titlecase. ~~bool~~ |
|
||||||
| `IS_UPPER` | Token text is in uppercase. ~~bool~~ |
|
| `IS_UPPER` | Token text is in uppercase. ~~bool~~ |
|
||||||
| `LEMMA` | The token's lemma. ~~str~~ |
|
| `LEMMA` | The token's lemma. ~~str~~ |
|
||||||
| `LENGTH` | The length of the token text. ~~int~~ |
|
| `LENGTH` | The length of the token text. ~~int~~ |
|
||||||
| `LIKE_EMAIL` | Token text resembles an email address. ~~bool~~ |
|
| `LIKE_EMAIL` | Token text resembles an email address. ~~bool~~ |
|
||||||
| `LIKE_NUM` | Token text resembles a number. ~~bool~~ |
|
| `LIKE_NUM` | Token text resembles a number. ~~bool~~ |
|
||||||
| `LIKE_URL` | Token text resembles a URL. ~~bool~~ |
|
| `LIKE_URL` | Token text resembles a URL. ~~bool~~ |
|
||||||
| `LOWER` | The lowercase form of the token text. ~~str~~ |
|
| `LOWER` | The lowercase form of the token text. ~~str~~ |
|
||||||
| `MORPH` | The token's morphological analysis. ~~MorphAnalysis~~ |
|
| `MORPH` | The token's morphological analysis. ~~MorphAnalysis~~ |
|
||||||
| `NORM` | The normalized form of the token text. ~~str~~ |
|
| `NORM` | The normalized form of the token text. ~~str~~ |
|
||||||
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
|
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
|
||||||
| `POS` | The token's universal part of speech (UPOS). ~~str~~ |
|
| `POS` | The token's universal part of speech (UPOS). ~~str~~ |
|
||||||
| `SENT_START` | Token is start of sentence. ~~bool~~ |
|
| `SENT_START` | Token is start of sentence. ~~bool~~ |
|
||||||
| `SHAPE` | The token's shape. ~~str~~ |
|
| `SHAPE` | The token's shape. ~~str~~ |
|
||||||
| `SPACY` | Token has a trailing space. ~~bool~~ |
|
| `SPACY` | Token has a trailing space. ~~bool~~ |
|
||||||
| `TAG` | The token's fine-grained part of speech. ~~str~~ |
|
| `TAG` | The token's fine-grained part of speech. ~~str~~ |
|
||||||
|
|
|
@ -1650,10 +1650,10 @@ $ python -m spacy huggingface-hub push [whl_path] [--org] [--msg] [--verbose]
|
||||||
> $ python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
|
> $ python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ----------------- | ------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `whl_path` | The path to the `.whl` file packaged with [`spacy package`](https://spacy.io/api/cli#package). ~~Path(positional)~~ |
|
| `whl_path` | The path to the `.whl` file packaged with [`spacy package`](https://spacy.io/api/cli#package). ~~Path(positional)~~ |
|
||||||
| `--org`, `-o` | Optional name of organization to which the pipeline should be uploaded. ~~str (option)~~ |
|
| `--org`, `-o` | Optional name of organization to which the pipeline should be uploaded. ~~str (option)~~ |
|
||||||
| `--msg`, `-m` | Commit message to use for update. Defaults to `"Update spaCy pipeline"`. ~~str (option)~~ |
|
| `--msg`, `-m` | Commit message to use for update. Defaults to `"Update spaCy pipeline"`. ~~str (option)~~ |
|
||||||
| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~ |
|
| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~ |
|
||||||
| **UPLOADS** | The pipeline to the hub. |
|
| **UPLOADS** | The pipeline to the hub. |
|
||||||
|
|
|
@ -67,7 +67,7 @@ architectures and their arguments and hyperparameters.
|
||||||
| `generate_empty_kb` <Tag variant="new">3.5.1</Tag> | Function that generates an empty `KnowledgeBase` object. Defaults to [`spacy.EmptyKB.v2`](/api/architectures#EmptyKB), which generates an empty [`InMemoryLookupKB`](/api/inmemorylookupkb). ~~Callable[[Vocab, int], KnowledgeBase]~~ |
|
| `generate_empty_kb` <Tag variant="new">3.5.1</Tag> | Function that generates an empty `KnowledgeBase` object. Defaults to [`spacy.EmptyKB.v2`](/api/architectures#EmptyKB), which generates an empty [`InMemoryLookupKB`](/api/inmemorylookupkb). ~~Callable[[Vocab, int], KnowledgeBase]~~ |
|
||||||
| `overwrite` <Tag variant="new">3.2</Tag> | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~ |
|
| `overwrite` <Tag variant="new">3.2</Tag> | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~ |
|
||||||
| `scorer` <Tag variant="new">3.2</Tag> | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~ |
|
| `scorer` <Tag variant="new">3.2</Tag> | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~ |
|
||||||
| `threshold` <Tag variant="new">3.4</Tag> | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted, otherwise those with a score beneath the threshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~ |
|
| `threshold` <Tag variant="new">3.4</Tag> | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted, otherwise those with a score beneath the threshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~ |
|
||||||
|
|
||||||
```python
|
```python
|
||||||
%%GITHUB_SPACY/spacy/pipeline/entity_linker.py
|
%%GITHUB_SPACY/spacy/pipeline/entity_linker.py
|
||||||
|
@ -100,20 +100,20 @@ custom knowledge base, you should either call
|
||||||
[`set_kb`](/api/entitylinker#set_kb) or provide a `kb_loader` in the
|
[`set_kb`](/api/entitylinker#set_kb) or provide a `kb_loader` in the
|
||||||
[`initialize`](/api/entitylinker#initialize) call.
|
[`initialize`](/api/entitylinker#initialize) call.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | The shared vocabulary. ~~Vocab~~ |
|
| `vocab` | The shared vocabulary. ~~Vocab~~ |
|
||||||
| `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~ |
|
| `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~ |
|
||||||
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
|
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `entity_vector_length` | Size of encoding vectors in the KB. ~~int~~ |
|
| `entity_vector_length` | Size of encoding vectors in the KB. ~~int~~ |
|
||||||
| `get_candidates` | Function that generates plausible candidates for a given `Span` object. ~~Callable[[KnowledgeBase, Span], Iterable[Candidate]]~~ |
|
| `get_candidates` | Function that generates plausible candidates for a given `Span` object. ~~Callable[[KnowledgeBase, Span], Iterable[Candidate]]~~ |
|
||||||
| `labels_discard` | NER labels that will automatically get a `"NIL"` prediction. ~~Iterable[str]~~ |
|
| `labels_discard` | NER labels that will automatically get a `"NIL"` prediction. ~~Iterable[str]~~ |
|
||||||
| `n_sents` | The number of neighbouring sentences to take into account. ~~int~~ |
|
| `n_sents` | The number of neighbouring sentences to take into account. ~~int~~ |
|
||||||
| `incl_prior` | Whether or not to include prior probabilities from the KB in the model. ~~bool~~ |
|
| `incl_prior` | Whether or not to include prior probabilities from the KB in the model. ~~bool~~ |
|
||||||
| `incl_context` | Whether or not to include the local context in the model. ~~bool~~ |
|
| `incl_context` | Whether or not to include the local context in the model. ~~bool~~ |
|
||||||
| `overwrite` <Tag variant="new">3.2</Tag> | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~ |
|
| `overwrite` <Tag variant="new">3.2</Tag> | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~ |
|
||||||
| `scorer` <Tag variant="new">3.2</Tag> | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~ |
|
| `scorer` <Tag variant="new">3.2</Tag> | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~ |
|
||||||
| `threshold` <Tag variant="new">3.4</Tag> | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted, otherwise those with a score beneath the threshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~ |
|
| `threshold` <Tag variant="new">3.4</Tag> | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted, otherwise those with a score beneath the threshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~ |
|
||||||
|
|
||||||
## EntityLinker.\_\_call\_\_ {id="call",tag="method"}
|
## EntityLinker.\_\_call\_\_ {id="call",tag="method"}
|
||||||
|
|
|
@ -58,7 +58,7 @@ how the component should be configured. You can override its settings via the
|
||||||
| Setting | Description |
|
| Setting | Description |
|
||||||
| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
|
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
|
||||||
| `matcher_fuzzy_compare` <Tag variant="new">3.5</Tag> | The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. ~~Callable~~ |
|
| `matcher_fuzzy_compare` <Tag variant="new">3.5</Tag> | The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. ~~Callable~~ |
|
||||||
| `validate` | Whether patterns should be validated (passed to the `Matcher` and `PhraseMatcher`). Defaults to `False`. ~~bool~~ |
|
| `validate` | Whether patterns should be validated (passed to the `Matcher` and `PhraseMatcher`). Defaults to `False`. ~~bool~~ |
|
||||||
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
|
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
|
||||||
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
|
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
|
||||||
|
@ -92,7 +92,7 @@ be a token pattern (list) or a phrase pattern (string). For example:
|
||||||
| `name` <Tag variant="new">3</Tag> | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. Used to disable the current entity ruler while creating phrase patterns with the nlp object. ~~str~~ |
|
| `name` <Tag variant="new">3</Tag> | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. Used to disable the current entity ruler while creating phrase patterns with the nlp object. ~~str~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
|
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
|
||||||
| `matcher_fuzzy_compare` <Tag variant="new">3.5</Tag> | The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. ~~Callable~~ |
|
| `matcher_fuzzy_compare` <Tag variant="new">3.5</Tag> | The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. ~~Callable~~ |
|
||||||
| `validate` | Whether patterns should be validated, passed to Matcher and PhraseMatcher as `validate`. Defaults to `False`. ~~bool~~ |
|
| `validate` | Whether patterns should be validated, passed to Matcher and PhraseMatcher as `validate`. Defaults to `False`. ~~bool~~ |
|
||||||
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
|
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
|
||||||
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
|
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
|
||||||
|
|
|
@ -286,7 +286,7 @@ pipelines.
|
||||||
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||||
| `title` | An optional project title used in `--help` message and [auto-generated docs](#custom-docs). |
|
| `title` | An optional project title used in `--help` message and [auto-generated docs](#custom-docs). |
|
||||||
| `description` | An optional project description used in [auto-generated docs](#custom-docs). |
|
| `description` | An optional project description used in [auto-generated docs](#custom-docs). |
|
||||||
| `vars` | A dictionary of variables that can be referenced in paths, URLs and scripts and overridden on the CLI, just like [`config.cfg` variables](/usage/training#config-interpolation). For example, `${vars.name}` will use the value of the variable `name`. Variables need to be defined in the section `vars`, but can be a nested dict, so you're able to reference `${vars.model.name}`. |
|
| `vars` | A dictionary of variables that can be referenced in paths, URLs and scripts and overridden on the CLI, just like [`config.cfg` variables](/usage/training#config-interpolation). For example, `${vars.name}` will use the value of the variable `name`. Variables need to be defined in the section `vars`, but can be a nested dict, so you're able to reference `${vars.model.name}`. |
|
||||||
| `env` | A dictionary of variables, mapped to the names of environment variables that will be read in when running the project. For example, `${env.name}` will use the value of the environment variable defined as `name`. |
|
| `env` | A dictionary of variables, mapped to the names of environment variables that will be read in when running the project. For example, `${env.name}` will use the value of the environment variable defined as `name`. |
|
||||||
| `directories` | An optional list of [directories](#project-files) that should be created in the project for assets, training outputs, metrics etc. spaCy will make sure that these directories always exist. |
|
| `directories` | An optional list of [directories](#project-files) that should be created in the project for assets, training outputs, metrics etc. spaCy will make sure that these directories always exist. |
|
||||||
| `assets` | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match. Instead of `url`, you can also provide a `git` block with the keys `repo`, `branch` and `path`, to download from a Git repo. |
|
| `assets` | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match. Instead of `url`, you can also provide a `git` block with the keys `repo`, `branch` and `path`, to download from a Git repo. |
|
||||||
|
|
|
@ -306,7 +306,9 @@ installed in the same environment – that's it.
|
||||||
|
|
||||||
### Loading probability tables into existing models
|
### Loading probability tables into existing models
|
||||||
|
|
||||||
You can load a probability table from [spacy-lookups-data](https://github.com/explosion/spacy-lookups-data) into an existing spaCy model like `en_core_web_sm`.
|
You can load a probability table from
|
||||||
|
[spacy-lookups-data](https://github.com/explosion/spacy-lookups-data) into an
|
||||||
|
existing spaCy model like `en_core_web_sm`.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Requirements: pip install spacy-lookups-data
|
# Requirements: pip install spacy-lookups-data
|
||||||
|
@ -317,7 +319,8 @@ lookups = load_lookups("en", ["lexeme_prob"])
|
||||||
nlp.vocab.lookups.add_table("lexeme_prob", lookups.get_table("lexeme_prob"))
|
nlp.vocab.lookups.add_table("lexeme_prob", lookups.get_table("lexeme_prob"))
|
||||||
```
|
```
|
||||||
|
|
||||||
When training a model from scratch you can also specify probability tables in the `config.cfg`.
|
When training a model from scratch you can also specify probability tables in
|
||||||
|
the `config.cfg`.
|
||||||
|
|
||||||
```ini {title="config.cfg (excerpt)"}
|
```ini {title="config.cfg (excerpt)"}
|
||||||
[initialize.lookups]
|
[initialize.lookups]
|
||||||
|
@ -346,8 +349,8 @@ them**!
|
||||||
To stick with the theme of
|
To stick with the theme of
|
||||||
[this entry points blog post](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
|
[this entry points blog post](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
|
||||||
consider the following custom spaCy
|
consider the following custom spaCy
|
||||||
[pipeline component](/usage/processing-pipelines#custom-components) that prints a
|
[pipeline component](/usage/processing-pipelines#custom-components) that prints
|
||||||
snake when it's called:
|
a snake when it's called:
|
||||||
|
|
||||||
> #### Package directory structure
|
> #### Package directory structure
|
||||||
>
|
>
|
||||||
|
|
Loading…
Reference in New Issue
Block a user