Merge branch 'master' into spacy.io

This commit is contained in:
Ines Montani 2021-03-19 12:09:03 +11:00
commit 6db5414668
21 changed files with 254 additions and 58 deletions

View File

@ -3,6 +3,7 @@ recursive-include spacy *.pyx *.pxd *.txt *.cfg *.jinja
include LICENSE
include README.md
include pyproject.toml
include spacy/py.typed
recursive-exclude spacy/lang *.json
recursive-include spacy/lang *.json.gz
recursive-include spacy/cli *.json *.yml

130
examples/README.md Normal file
View File

@ -0,0 +1,130 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# spaCy examples
For spaCy v3 we've converted many of the [v2 example
scripts](https://github.com/explosion/spaCy/tree/v2.3.x/examples/) into
end-to-end [spacy projects](https://spacy.io/usage/projects) workflows. The
workflows include all the steps to go from data to packaged spaCy models.
## 🪐 Pipeline component demos
The simplest demos for training a single pipeline component are in the
[`pipelines`](https://github.com/explosion/projects/blob/v3/pipelines) category
including:
- [`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo):
Train a named entity recognizer
- [`pipelines/textcat_demo`](https://github.com/explosion/projects/blob/v3/pipelines/textcat_demo):
Train a text classifier
- [`pipelines/parser_intent_demo`](https://github.com/explosion/projects/blob/v3/pipelines/parser_intent_demo):
Train a dependency parser for custom semantics
## 🪐 Tutorials
The [`tutorials`](https://github.com/explosion/projects/blob/v3/tutorials)
category includes examples that work through specific NLP use cases end-to-end:
- [`tutorials/textcat_goemotions`](https://github.com/explosion/projects/blob/v3/tutorials/textcat_goemotions):
Train a text classifier to categorize emotions in Reddit posts
- [`tutorials/nel_emerson`](https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson):
Use an entity linker to disambiguate mentions of the same name
Check out the [projects documentation](https://spacy.io/usage/projects) and
browse through the [available
projects](https://github.com/explosion/projects/)!
## 🚀 Get started with a demo project
The
[`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo)
project converts the spaCy v2
[`train_ner.py`](https://github.com/explosion/spaCy/blob/v2.3.x/examples/training/train_ner.py)
demo script into a spaCy v3 project.
1. Clone the project:
```bash
python -m spacy project clone pipelines/ner_demo
```
2. Install requirements and download any data assets:
```bash
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
```
3. Run the default workflow to convert, train and evaluate:
```bash
python -m spacy project run all
```
Sample output:
```none
Running workflow 'all'
================================== convert ==================================
Running command: /home/user/venv/bin/python scripts/convert.py en assets/train.json corpus/train.spacy
Running command: /home/user/venv/bin/python scripts/convert.py en assets/dev.json corpus/dev.spacy
=============================== create-config ===============================
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
=================================== train ===================================
Running command: /home/user/venv/bin/python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1
Using CPU
=========================== Initializing pipeline ===========================
[2021-03-11 19:34:59,101] [INFO] Set up nlp object from config
[2021-03-11 19:34:59,109] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-03-11 19:34:59,113] [INFO] Created vocabulary
[2021-03-11 19:34:59,113] [INFO] Finished initializing nlp object
[2021-03-11 19:34:59,265] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------ -------- ------ ------ ------ ------
0 0 0.00 7.90 0.00 0.00 0.00 0.00
10 10 0.11 71.07 0.00 0.00 0.00 0.00
20 20 0.65 22.44 50.00 50.00 50.00 0.50
30 30 0.22 6.38 80.00 66.67 100.00 0.80
40 40 0.00 0.00 80.00 66.67 100.00 0.80
50 50 0.00 0.00 80.00 66.67 100.00 0.80
60 60 0.00 0.00 100.00 100.00 100.00 1.00
70 70 0.00 0.00 100.00 100.00 100.00 1.00
80 80 0.00 0.00 100.00 100.00 100.00 1.00
90 90 0.00 0.00 100.00 100.00 100.00 1.00
100 100 0.00 0.00 100.00 100.00 100.00 1.00
✔ Saved pipeline to output directory
training/model-last
```
4. Package the model:
```bash
python -m spacy project run package
```
5. Visualize the model's output with [Streamlit](https://streamlit.io):
```bash
python -m spacy project run visualize-model
```

View File

@ -0,0 +1,5 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# spaCy examples
See [examples/README.md](../README.md)

View File

@ -1,7 +1,25 @@
## Examples of NER/IOB data that can be converted with `spacy convert`
spacy JSON training files were generated with:
To convert an IOB file to `.spacy` ([`DocBin`](https://spacy.io/api/docbin))
for spaCy v3:
```bash
python -m spacy convert -c iob -s -n 10 -b en_core_web_sm file.iob .
```
See all the `spacy convert` options: https://spacy.io/api/cli#convert
---
The spaCy v2 JSON training files were generated using **spaCy v2** with:
```bash
python -m spacy convert -c iob -s -n 10 -b en file.iob
```
To convert an existing JSON training file to `.spacy` for spaCy v3, convert
with **spaCy v3**:
```bash
python -m spacy convert file.json .
```

View File

@ -5,7 +5,7 @@ requires = [
"cymem>=2.0.2,<2.1.0",
"preshed>=3.0.2,<3.1.0",
"murmurhash>=0.28.0,<1.1.0",
"thinc>=8.0.0,<8.1.0",
"thinc>=8.0.2,<8.1.0",
"blis>=0.4.0,<0.8.0",
"pathy",
"numpy>=1.15.0",

View File

@ -2,7 +2,7 @@
spacy-legacy>=3.0.0,<3.1.0
cymem>=2.0.2,<2.1.0
preshed>=3.0.2,<3.1.0
thinc>=8.0.0,<8.1.0
thinc>=8.0.2,<8.1.0
blis>=0.4.0,<0.8.0
ml_datasets>=0.2.0,<0.3.0
murmurhash>=0.28.0,<1.1.0

View File

@ -34,14 +34,14 @@ setup_requires =
cymem>=2.0.2,<2.1.0
preshed>=3.0.2,<3.1.0
murmurhash>=0.28.0,<1.1.0
thinc>=8.0.0,<8.1.0
thinc>=8.0.2,<8.1.0
install_requires =
# Our libraries
spacy-legacy>=3.0.0,<3.1.0
murmurhash>=0.28.0,<1.1.0
cymem>=2.0.2,<2.1.0
preshed>=3.0.2,<3.1.0
thinc>=8.0.0,<8.1.0
thinc>=8.0.2,<8.1.0
blis>=0.4.0,<0.8.0
wasabi>=0.8.1,<1.1.0
srsly>=2.4.0,<3.0.0

View File

@ -28,6 +28,8 @@ if sys.maxunicode == 65535:
def load(
name: Union[str, Path],
*,
vocab: Union[Vocab, bool] = True,
disable: Iterable[str] = util.SimpleFrozenList(),
exclude: Iterable[str] = util.SimpleFrozenList(),
config: Union[Dict[str, Any], Config] = util.SimpleFrozenDict(),
@ -35,6 +37,7 @@ def load(
"""Load a spaCy model from an installed package or a local path.
name (str): Package name or model path.
vocab (Vocab): A Vocab object. If True, a vocab is created.
disable (Iterable[str]): Names of pipeline components to disable. Disabled
pipes will be loaded but they won't be run unless you explicitly
enable them by calling nlp.enable_pipe.
@ -44,7 +47,9 @@ def load(
keyed by section values in dot notation.
RETURNS (Language): The loaded nlp object.
"""
return util.load_model(name, disable=disable, exclude=exclude, config=config)
return util.load_model(
name, vocab=vocab, disable=disable, exclude=exclude, config=config
)
def blank(
@ -52,7 +57,7 @@ def blank(
*,
vocab: Union[Vocab, bool] = True,
config: Union[Dict[str, Any], Config] = util.SimpleFrozenDict(),
meta: Dict[str, Any] = util.SimpleFrozenDict()
meta: Dict[str, Any] = util.SimpleFrozenDict(),
) -> Language:
"""Create a blank nlp object for a given language code.

View File

@ -1,6 +1,6 @@
# fmt: off
__title__ = "spacy"
__version__ = "3.0.4"
__version__ = "3.0.5"
__download_url__ = "https://github.com/explosion/spacy-models/releases/download"
__compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
__projects__ = "https://github.com/explosion/projects"

View File

@ -20,7 +20,7 @@ def debug_config_cli(
# fmt: off
ctx: typer.Context, # This is only used to read additional arguments
config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
code_path: Optional[Path] = Opt(None, "--code-path", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
code_path: Optional[Path] = Opt(None, "--code-path", "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
show_funcs: bool = Opt(False, "--show-functions", "-F", help="Show an overview of all registered functions used in the config and where they come from (modules, files etc.)"),
show_vars: bool = Opt(False, "--show-variables", "-V", help="Show an overview of all variables referenced in the config and their values. This will also reflect variables overwritten on the CLI.")
# fmt: on

View File

@ -39,7 +39,7 @@ def debug_data_cli(
# fmt: off
ctx: typer.Context, # This is only used to read additional arguments
config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
code_path: Optional[Path] = Opt(None, "--code-path", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
code_path: Optional[Path] = Opt(None, "--code-path", "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
ignore_warnings: bool = Opt(False, "--ignore-warnings", "-IW", help="Ignore warnings, only show stats and errors"),
verbose: bool = Opt(False, "--verbose", "-V", help="Print additional information and explanations"),
no_format: bool = Opt(False, "--no-format", "-NF", help="Don't pretty-print the results"),

View File

@ -10,7 +10,8 @@ from jinja2 import Template
from .. import util
from ..language import DEFAULT_CONFIG_PRETRAIN_PATH
from ..schemas import RecommendationSchema
from ._util import init_cli, Arg, Opt, show_validation_error, COMMAND, string_to_list
from ._util import init_cli, Arg, Opt, show_validation_error, COMMAND
from ._util import string_to_list, import_code
ROOT = Path(__file__).parent / "templates"
@ -70,7 +71,8 @@ def init_fill_config_cli(
base_path: Path = Arg(..., help="Base config to fill", exists=True, dir_okay=False),
output_file: Path = Arg("-", help="File to save config.cfg to (or - for stdout)", allow_dash=True),
pretraining: bool = Opt(False, "--pretraining", "-pt", help="Include config for pretraining (with 'spacy pretrain')"),
diff: bool = Opt(False, "--diff", "-D", help="Print a visual diff highlighting the changes")
diff: bool = Opt(False, "--diff", "-D", help="Print a visual diff highlighting the changes"),
code_path: Optional[Path] = Opt(None, "--code-path", "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
# fmt: on
):
"""
@ -82,6 +84,7 @@ def init_fill_config_cli(
DOCS: https://spacy.io/api/cli#init-fill-config
"""
import_code(code_path)
fill_config(output_file, base_path, pretraining=pretraining, diff=diff)

View File

@ -120,7 +120,7 @@ def parse_deps(orig_doc: Doc, options: Dict[str, Any] = {}) -> Dict[str, Any]:
doc (Doc): Document do parse.
RETURNS (dict): Generated dependency parse keyed by words and arcs.
"""
doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes(exclude=["user_data"]))
doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes(exclude=["user_data", "user_hooks"]))
if not doc.has_annotation("DEP"):
warnings.warn(Warnings.W005)
if options.get("collapse_phrases", False):

View File

@ -90,12 +90,12 @@ class RussianLemmatizer(Lemmatizer):
return [string.lower()]
return list(set([analysis.normal_form for analysis in filtered_analyses]))
def lookup_lemmatize(self, token: Token) -> List[str]:
def pymorphy2_lookup_lemmatize(self, token: Token) -> List[str]:
string = token.text
analyses = self._morph.parse(string)
if len(analyses) == 1:
return analyses[0].normal_form
return string
return [analyses[0].normal_form]
return [string]
def oc2ud(oc_tag: str) -> Tuple[str, Dict[str, str]]:

0
spacy/py.typed Normal file
View File

View File

@ -6,15 +6,14 @@ def test_build_dependencies():
# Check that library requirements are pinned exactly the same across different setup files.
# TODO: correct checks for numpy rather than ignoring
libs_ignore_requirements = [
"numpy",
"pytest",
"pytest-timeout",
"mock",
"flake8",
"hypothesis",
]
# ignore language-specific packages that shouldn't be installed by all
libs_ignore_setup = [
"numpy",
"fugashi",
"natto-py",
"pythainlp",

View File

@ -142,7 +142,7 @@ def create_pretraining_model(nlp, pretrain_config):
# If the config referred to a Tok2VecListener, grab the original model instead
if type(tok2vec).__name__ == "Tok2VecListener":
original_tok2vec = (
tok2vec.upstream_name if tok2vec.upstream_name is not "*" else "tok2vec"
tok2vec.upstream_name if tok2vec.upstream_name != "*" else "tok2vec"
)
tok2vec = nlp.get_pipe(original_tok2vec).model
try:

View File

@ -88,7 +88,7 @@ class registry(thinc.registry):
displacy_colors = catalogue.create("spacy", "displacy_colors", entry_points=True)
misc = catalogue.create("spacy", "misc", entry_points=True)
# Callback functions used to manipulate nlp object etc.
callbacks = catalogue.create("spacy", "callbacks")
callbacks = catalogue.create("spacy", "callbacks", entry_points=True)
batchers = catalogue.create("spacy", "batchers", entry_points=True)
readers = catalogue.create("spacy", "readers", entry_points=True)
augmenters = catalogue.create("spacy", "augmenters", entry_points=True)

View File

@ -170,14 +170,15 @@ validation error with more details.
$ python -m spacy init fill-config [base_path] [output_file] [--diff]
```
| Name | Description |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `base_path` | Path to base config to fill, e.g. generated by the [quickstart widget](/usage/training#quickstart). ~~Path (positional)~~ |
| `output_file` | Path to output `.cfg` file. If not set, the config is written to stdout so you can pipe it forward to a file. ~~Path (positional)~~ |
| `--pretraining`, `-pt` | Include config for pretraining (with [`spacy pretrain`](/api/cli#pretrain)). Defaults to `False`. ~~bool (flag)~~ |
| `--diff`, `-D` | Print a visual diff highlighting the changes. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Complete and auto-filled config file for training. |
| Name | Description |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `base_path` | Path to base config to fill, e.g. generated by the [quickstart widget](/usage/training#quickstart). ~~Path (positional)~~ |
| `output_file` | Path to output `.cfg` file. If not set, the config is written to stdout so you can pipe it forward to a file. ~~Path (positional)~~ |
| `--code`, `-c` | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
| `--pretraining`, `-pt` | Include config for pretraining (with [`spacy pretrain`](/api/cli#pretrain)). Defaults to `False`. ~~bool (flag)~~ |
| `--diff`, `-D` | Print a visual diff highlighting the changes. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Complete and auto-filled config file for training. |
### init vectors {#init-vectors new="3" tag="command"}
@ -261,24 +262,24 @@ $ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type]
| `output_dir` | Output directory for converted file. Defaults to `"-"`, meaning data will be written to `stdout`. ~~Optional[Path] \(positional)~~ |
| `--converter`, `-c` <Tag variant="new">2</Tag> | Name of converter to use (see below). ~~str (option)~~ |
| `--file-type`, `-t` <Tag variant="new">2.1</Tag> | Type of file to create. Either `spacy` (default) for binary [`DocBin`](/api/docbin) data or `json` for v2.x JSON format. ~~str (option)~~ |
| `--n-sents`, `-n` | Number of sentences per document. ~~int (option)~~ |
| `--seg-sents`, `-s` <Tag variant="new">2.2</Tag> | Segment sentences (for `--converter ner`). ~~bool (flag)~~ |
| `--n-sents`, `-n` | Number of sentences per document. Supported for: `conll`, `conllu`, `iob`, `ner` ~~int (option)~~ |
| `--seg-sents`, `-s` <Tag variant="new">2.2</Tag> | Segment sentences. Supported for: `conll`, `ner` ~~bool (flag)~~ |
| `--base`, `-b` | Trained spaCy pipeline for sentence segmentation to use as base (for `--seg-sents`). ~~Optional[str](option)~~ |
| `--morphology`, `-m` | Enable appending morphology to tags. ~~bool (flag)~~ |
| `--ner-map`, `-nm` | NER tag mapping (as JSON-encoded dict of entity types). ~~Optional[Path](option)~~ |
| `--morphology`, `-m` | Enable appending morphology to tags. Supported for: `conllu` ~~bool (flag)~~ |
| `--ner-map`, `-nm` | NER tag mapping (as JSON-encoded dict of entity types). Supported for: `conllu` ~~Optional[Path](option)~~ |
| `--lang`, `-l` <Tag variant="new">2.1</Tag> | Language code (if tokenizer required). ~~Optional[str] \(option)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Binary [`DocBin`](/api/docbin) training data that can be used with [`spacy train`](/api/cli#train). |
### Converters {#converters}
| ID | Description |
| ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto` | Automatically pick converter based on file extension and file content (default). |
| `json` | JSON-formatted training data used in spaCy v2.x. |
| `conll` | Universal Dependencies `.conllu` or `.conll` format. |
| `ner` | NER with IOB/IOB2 tags, one token per line with columns separated by whitespace. The first column is the token and the final column is the IOB tag. Sentences are separated by blank lines and documents are separated by the line `-DOCSTART- -X- O O`. Supports CoNLL 2003 NER format. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
| `iob` | NER with IOB/IOB2 tags, one sentence per line with tokens separated by whitespace and annotation separated by `\|`, either `word\|B-ENT`or`word\|POS\|B-ENT`. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
| ID | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto` | Automatically pick converter based on file extension and file content (default). |
| `json` | JSON-formatted training data used in spaCy v2.x. |
| `conllu` | Universal Dependencies `.conllu` format. |
| `ner` / `conll` | NER with IOB/IOB2/BILUO tags, one token per line with columns separated by whitespace. The first column is the token and the final column is the NER tag. Sentences are separated by blank lines and documents are separated by the line `-DOCSTART- -X- O O`. Supports CoNLL 2003 NER format. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
| `iob` | NER with IOB/IOB2/BILUO tags, one sentence per line with tokens separated by whitespace and annotation separated by `\|`, either `word\|B-ENT`or`word\|POS\|B-ENT`. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
## debug {#debug new="3"}
@ -805,7 +806,7 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
| Name | Description |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config_path` | Path to [training config](/api/data-formats#config) file containing all settings and hyperparameters. If `-`, the data will be [read from stdin](/usage/training#config-stdin). ~~Union[Path, str] \(positional)~~ |
| `--output`, `-o` | Directory to store trained pipeline in. Will be created if it doesn't exist. ~~Optional[Path] \(positional)~~ |
| `--output`, `-o` | Directory to store trained pipeline in. Will be created if it doesn't exist. ~~Optional[Path] \(option)~~ |
| `--code`, `-c` | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
| `--verbose`, `-V` | Show more detailed messages during training. ~~bool (flag)~~ |
| `--gpu-id`, `-g` | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~ |

View File

@ -48,6 +48,7 @@ specified separately using the new `exclude` keyword argument.
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | Pipeline to load, i.e. package name or path. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `vocab` | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [nlp.enable_pipe](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag> | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"`. ~~Union[Dict[str, Any], Config]~~ |
@ -83,9 +84,9 @@ Create a blank pipeline of a given language class. This function is the twin of
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) of the language class to load. ~~str~~ |
| _keyword-only_ | |
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
| `vocab` | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `config` <Tag variant="new">3</Tag> | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"`. ~~Union[Dict[str, Any], Config]~~ |
| `meta` <Tag variant="new">3</Tag> | Optional meta overrides for [`nlp.meta`](/api/language#meta). ~~Dict[str, Any]~~ |
| `meta` | Optional meta overrides for [`nlp.meta`](/api/language#meta). ~~Dict[str, Any]~~ |
| **RETURNS** | An empty `Language` object of the appropriate subclass. ~~Language~~ |
### spacy.info {#spacy.info tag="function"}
@ -140,9 +141,9 @@ pipelines.
<Infobox variant="warning" title="Jupyter notebook usage">
In a Jupyter notebook, run `prefer_gpu()` in the same cell as `spacy.load()`
to ensure that the model is loaded on the correct device. See [more
details](/usage/v3#jupyter-notebook-gpu).
In a Jupyter notebook, run `prefer_gpu()` in the same cell as `spacy.load()` to
ensure that the model is loaded on the correct device. See
[more details](/usage/v3#jupyter-notebook-gpu).
</Infobox>
@ -168,9 +169,9 @@ and _before_ loading any pipelines.
<Infobox variant="warning" title="Jupyter notebook usage">
In a Jupyter notebook, run `require_gpu()` in the same cell as `spacy.load()`
to ensure that the model is loaded on the correct device. See [more
details](/usage/v3#jupyter-notebook-gpu).
In a Jupyter notebook, run `require_gpu()` in the same cell as `spacy.load()` to
ensure that the model is loaded on the correct device. See
[more details](/usage/v3#jupyter-notebook-gpu).
</Infobox>
@ -195,9 +196,9 @@ after importing spaCy and _before_ loading any pipelines.
<Infobox variant="warning" title="Jupyter notebook usage">
In a Jupyter notebook, run `require_cpu()` in the same cell as `spacy.load()`
to ensure that the model is loaded on the correct device. See [more
details](/usage/v3#jupyter-notebook-gpu).
In a Jupyter notebook, run `require_cpu()` in the same cell as `spacy.load()` to
ensure that the model is loaded on the correct device. See
[more details](/usage/v3#jupyter-notebook-gpu).
</Infobox>
@ -945,7 +946,8 @@ and create a `Language` object. The model data will then be loaded in via
| Name | Description |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | Package name or path. ~~str~~ |
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
| _keyword-only_ | |
| `vocab` | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
@ -968,7 +970,8 @@ A helper function to use in the `load()` method of a pipeline package's
| Name | Description |
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `init_file` | Path to package's `__init__.py`, i.e. `__file__`. ~~Union[str, Path]~~ |
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
| _keyword-only_ | |
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [nlp.enable_pipe](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
@ -1147,11 +1150,11 @@ vary on each step.
> nlp.update(batch)
> ```
| Name | Description |
| ---------- | ---------------------------------------- |
| `items` | The items to batch up. ~~Iterable[Any]~~ |
| `size` | int / iterable | The batch size(s). ~~Union[int, Sequence[int]]~~ |
| **YIELDS** | The batches. |
| Name | Description |
| ---------- | ------------------------------------------------ |
| `items` | The items to batch up. ~~Iterable[Any]~~ |
| `size` | The batch size(s). ~~Union[int, Sequence[int]]~~ |
| **YIELDS** | The batches. |
### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}

View File

@ -1,5 +1,36 @@
{
"resources": [
{
"id": "spikex",
"title": "SpikeX - SpaCy Pipes for Knowledge Extraction",
"slogan": "Use SpikeX to build knowledge extraction tools with almost-zero effort",
"description": "SpikeX is a collection of pipes ready to be plugged in a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.",
"github": "erre-quadro/spikex",
"pip": "spikex",
"code_example": [
"from spacy import load as spacy_load",
"from spikex.wikigraph import load as wg_load",
"from spikex.pipes import WikiPageX",
"",
"# load a spacy model and get a doc",
"nlp = spacy_load('en_core_web_sm')",
"doc = nlp('An apple a day keeps the doctor away')",
"# load a WikiGraph",
"wg = wg_load('simplewiki_core')",
"# get a WikiPageX and extract all pages",
"wikipagex = WikiPageX(wg)",
"doc = wikipagex(doc)",
"# see all pages extracted from the doc",
"for span in doc._.wiki_spans:",
" print(span._.wiki_pages)"
],
"category": ["pipeline", "standalone"],
"author": "Erre Quadro",
"author_links": {
"github": "erre-quadro",
"website": "https://www.errequadrosrl.com"
}
},
{
"id": "spacy-dbpedia-spotlight",
"title": "DBpedia Spotlight for SpaCy",