Mirror of https://github.com/explosion/spaCy.git, synced 2025-06-29 09:23:12 +03:00

Merge branch 'master' into feature-improve-model-download
This commit is contained in: commit 7ca49c2061

.github/contributors/kwhumphreys.md (vendored, new file, +107 lines)
@@ -0,0 +1,107 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [ ] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [x] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                            |
|------------------------------- | -------------------------------- |
| Name                           | Kevin Humphreys                  |
| Company name (if applicable)   | Textio Inc.                      |
| Title or role (if applicable)  |                                  |
| Date                           | 01-03-2018                       |
| GitHub username                | kwhumphreys                      |
| Website (optional)             |                                  |
@@ -150,7 +150,7 @@ recipes, that does provide some argument for bringing it "in house".

 ### Getting started

-To make changes to spaCy's code base, you need to clone the GitHub repository
+To make changes to spaCy's code base, you need to fork then clone the GitHub repository
 and build spaCy from source. You'll need to make sure that you have a
 development environment consisting of a Python distribution including header
 files, a compiler, [pip](https://pip.pypa.io/en/latest/installing/),
@@ -45,20 +45,25 @@ This is a list of everyone who has made significant contributions to spaCy, in a
 * Maxim Samsonov, [@maxirmx](https://github.com/maxirmx)
 * Michael Wallin, [@wallinm1](https://github.com/wallinm1)
 * Miguel Almeida, [@mamoit](https://github.com/mamoit)
+* Motoki Wu, [@tokestermw](https://github.com/tokestermw)
 * Oleg Zd, [@olegzd](https://github.com/olegzd)
+* Orhan Bilgin, [@melanuria](https://github.com/melanuria)
 * Orion Montoya, [@mdcclv](https://github.com/mdcclv)
 * Paul O'Leary McCann, [@polm](https://github.com/polm)
 * Pokey Rule, [@pokey](https://github.com/pokey)
 * Ramanan Balakrishnan, [@ramananbalakrishnan](https://github.com/ramananbalakrishnan)
 * Raphaël Bournhonesque, [@raphael0202](https://github.com/raphael0202)
 * Rob van Nieuwpoort, [@RvanNieuwpoort](https://github.com/RvanNieuwpoort)
+* Roman Domrachev, [@ligser](https://github.com/ligser)
 * Roman Inflianskas, [@rominf](https://github.com/rominf)
 * Sam Bozek, [@sambozek](https://github.com/sambozek)
 * Sasho Savkov, [@savkov](https://github.com/savkov)
 * Shuvanon Razik, [@shuvanon](https://github.com/shuvanon)
+* Søren Lind Kristiansen, [@sorenlind](https://github.com/sorenlind)
 * Swier, [@swierh](https://github.com/swierh)
 * Thomas Tanon, [@Tpt](https://github.com/Tpt)
 * Tiago Rodrigues, [@TiagoMRodrigues](https://github.com/TiagoMRodrigues)
+* Vadim Mazaev, [@GreenRiverRUS](https://github.com/GreenRiverRUS)
 * Vimos Tan, [@Vimos](https://github.com/Vimos)
 * Vsevolod Solovyov, [@vsolovyov](https://github.com/vsolovyov)
 * Wah Loon Keng, [@kengz](https://github.com/kengz)
@@ -25,4 +25,4 @@ def blank(name, **kwargs):


 def info(model=None, markdown=False):
-    return cli_info(None, model, markdown)
+    return cli_info(model, markdown)
@@ -28,7 +28,7 @@ if __name__ == '__main__':
     command = sys.argv.pop(1)
     sys.argv[0] = 'spacy %s' % command
     if command in commands:
-        plac.call(commands[command])
+        plac.call(commands[command], sys.argv[1:])
     else:
         prints(
             "Available: %s" % ', '.join(commands),
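The hunk above and the hunks that follow all make the same change: the dummy `cmd` first parameter is dropped from each CLI function, because the dispatcher now passes `sys.argv[1:]` explicitly instead of letting plac re-parse the rewritten `sys.argv`. A minimal, self-contained sketch of the idea, using toy names (`dispatch`, a stand-in `info`) rather than spaCy's actual code:

```python
def info(model=None, markdown=False):
    # toy stand-in for a spaCy CLI handler; the real ones live in spacy.cli
    return ('info', model, markdown)

def dispatch(commands, argv):
    # Before this change, the dispatcher re-parsed the full (already
    # rewritten) argument vector, so the command name leaked in as the
    # first positional argument and every handler needed a dummy leading
    # `cmd` parameter. Passing the remaining arguments explicitly,
    # minus the command name, removes that need.
    command = argv.pop(1)
    return commands[command](*argv[1:])

commands = {'info': info}
print(dispatch(commands, ['spacy', 'info', 'en']))  # ('info', 'en', False)
```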
@@ -24,8 +24,7 @@ CONVERTERS = {
     n_sents=("Number of sentences per doc", "option", "n", int),
     converter=("Name of converter (auto, iob, conllu or ner)", "option", "c", str),
     morphology=("Enable appending morphology to tags", "flag", "m", bool))
-def convert(cmd, input_file, output_dir, n_sents=1, morphology=False,
-            converter='auto'):
+def convert(input_file, output_dir, n_sents=1, morphology=False, converter='auto'):
     """
     Convert files into JSON format for use with train command and other
     experiment management functions.
@@ -16,7 +16,7 @@ from .. import about
     model=("model to download, shortcut or name", "positional", None, str),
     direct=("force direct download. Needs model name with version and won't "
             "perform compatibility check", "flag", "d", bool))
-def download(cmd, model, direct=False):
+def download(model, direct=False):
     """
     Download compatible model from default download path using pip. Model
     can be shortcut, model name or, if --direct flag is set, full model name
@@ -25,8 +25,8 @@ numpy.random.seed(0)
     displacy_path=("directory to output rendered parses as HTML", "option",
                    "dp", str),
     displacy_limit=("limit of parses to render as HTML", "option", "dl", int))
-def evaluate(cmd, model, data_path, gpu_id=-1, gold_preproc=False,
-             displacy_path=None, displacy_limit=25):
+def evaluate(model, data_path, gpu_id=-1, gold_preproc=False, displacy_path=None,
+             displacy_limit=25):
     """
     Evaluate a model. To render a sample of parses in a HTML file, set an
     output directory as the displacy_path argument.
@@ -13,7 +13,7 @@ from .. import util
 @plac.annotations(
     model=("optional: shortcut link of model", "positional", None, str),
     markdown=("generate Markdown for GitHub issues", "flag", "md", str))
-def info(cmd, model=None, markdown=False):
+def info(model=None, markdown=False):
     """Print info about spaCy installation. If a model shortcut link is
     specified as an argument, print model information. Flag --markdown
     prints details in Markdown for easy copy-pasting to GitHub issues.
@@ -25,7 +25,7 @@ from ..util import prints, ensure_path, get_lang_class
     prune_vectors=("optional: number of vectors to prune to",
                    "option", "V", int)
 )
-def init_model(_cmd, lang, output_dir, freqs_loc, clusters_loc=None, vectors_loc=None, prune_vectors=-1):
+def init_model(lang, output_dir, freqs_loc, clusters_loc=None, vectors_loc=None, prune_vectors=-1):
     """
     Create a new model from raw data, like word frequencies, Brown clusters
     and word vectors.
@@ -13,7 +13,7 @@ from .. import util
     origin=("package name or local path to model", "positional", None, str),
     link_name=("name of shortcut link to create", "positional", None, str),
     force=("force overwriting of existing link", "flag", "f", bool))
-def link(cmd, origin, link_name, force=False, model_path=None):
+def link(origin, link_name, force=False, model_path=None):
     """
     Create a symlink for models within the spacy/data directory. Accepts
     either the name of a pip package, or the local path to the model data
@@ -20,7 +20,7 @@ from .. import about
            "the command line prompt", "flag", "c", bool),
     force=("force overwriting of existing model directory in output directory",
            "flag", "f", bool))
-def package(cmd, input_dir, output_dir, meta_path=None, create_meta=False,
+def package(input_dir, output_dir, meta_path=None, create_meta=False,
             force=False):
     """
     Generate Python package for model data, including meta and required
@@ -29,7 +29,7 @@ def read_inputs(loc):
 @plac.annotations(
     lang=("model/language", "positional", None, str),
     inputs=("Location of input file", "positional", None, read_inputs))
-def profile(cmd, lang, inputs=None):
+def profile(lang, inputs=None):
     """
     Profile a spaCy pipeline, to find out which functions take the most time.
     """
@@ -38,7 +38,7 @@ numpy.random.seed(0)
     version=("Model version", "option", "V", str),
     meta_path=("Optional path to meta.json. All relevant properties will be "
                "overwritten.", "option", "m", Path))
-def train(cmd, lang, output_dir, train_data, dev_data, n_iter=30, n_sents=0,
+def train(lang, output_dir, train_data, dev_data, n_iter=30, n_sents=0,
           use_gpu=-1, vectors=None, no_tagger=False,
           no_parser=False, no_entities=False, gold_preproc=False,
           version="0.0.0", meta_path=None):
@@ -11,7 +11,7 @@ from ..util import prints, get_data_path, read_json
 from .. import about


-def validate(cmd):
+def validate():
     """Validate that the currently installed version of spaCy is compatible
     with the installed models. Should be run after `pip install -U spacy`.
     """
@@ -21,8 +21,7 @@ from ..util import prints, ensure_path
     prune_vectors=("optional: number of vectors to prune to.",
                    "option", "V", int)
 )
-def make_vocab(cmd, lang, output_dir, lexemes_loc,
-               vectors_loc=None, prune_vectors=-1):
+def make_vocab(lang, output_dir, lexemes_loc, vectors_loc=None, prune_vectors=-1):
     """Compile a vocabulary from a lexicon jsonl file and word vectors."""
     if not lexemes_loc.exists():
         prints(lexemes_loc, title="Can't find lexical data", exits=1)
@@ -213,7 +213,8 @@ for verb_data in [
     {ORTH: "could", NORM: "could", TAG: "MD"},
     {ORTH: "might", NORM: "might", TAG: "MD"},
     {ORTH: "must", NORM: "must", TAG: "MD"},
-    {ORTH: "should", NORM: "should", TAG: "MD"}]:
+    {ORTH: "should", NORM: "should", TAG: "MD"},
+    {ORTH: "would", NORM: "would", TAG: "MD"}]:
     verb_data_tc = dict(verb_data)
     verb_data_tc[ORTH] = verb_data_tc[ORTH].title()
     for data in [verb_data, verb_data_tc]:
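The hunk above adds "would" to the modal-verb table, which is why "would've" starts tokenizing correctly. A minimal sketch (data shapes assumed, not spaCy's actual exception-building code) of how each entry expands into tokenizer exceptions, including the title-cased variant, so that both "would've" and "Would've" split into two tokens:

```python
EXC = {}
for verb_data in [{"ORTH": "would", "NORM": "would", "TAG": "MD"}]:
    # build a title-cased copy of each entry, mirroring the loop body above
    verb_data_tc = dict(verb_data)
    verb_data_tc["ORTH"] = verb_data_tc["ORTH"].title()
    for data in [verb_data, verb_data_tc]:
        # each contraction maps to two tokens: the modal verb plus a
        # "'ve" token normalised to "have"
        EXC[data["ORTH"] + "'ve"] = [
            dict(data),
            {"ORTH": "'ve", "NORM": "have"},
        ]

print(sorted(EXC))  # ["Would've", "would've"]
```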
@@ -9,7 +9,6 @@ from ...cli.train import train

 @pytest.mark.xfail
 def test_cli_trained_model_can_be_saved(tmpdir):
-    cmd = None
     lang = 'nl'
     output_dir = str(tmpdir)
     train_file = NamedTemporaryFile('wb', dir=output_dir, delete=False)
@@ -86,6 +85,6 @@ def test_cli_trained_model_can_be_saved(tmpdir):

     # spacy train -n 1 -g -1 nl output_nl training_corpus.json training \
     # corpus.json
-    train(cmd, lang, output_dir, train_data, dev_data, n_iter=1)
+    train(lang, output_dir, train_data, dev_data, n_iter=1)

     assert True
spacy/tests/regression/test_issue1758.py (new file, +13 lines)

@@ -0,0 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals

import pytest


@pytest.mark.parametrize('text', ["would've"])
def test_issue1758(en_tokenizer, text):
    """Test that "would've" is handled by the English tokenizer exceptions."""
    tokens = en_tokenizer(text)
    assert len(tokens) == 2
    assert tokens[0].tag_ == "MD"
    assert tokens[1].lemma_ == "have"
@@ -51,7 +51,9 @@ p
 p
     | Import and load a #[code Language] class. Allows lazy-loading
     | #[+a("/usage/adding-languages") language data] and importing
-    | languages using the two-letter language code.
+    | languages using the two-letter language code. To add a language code
+    | for a custom language class, you can use the
+    | #[+api("top-level#util.set_lang_class") #[code set_lang_class]] helper.

 +aside-code("Example").
     for lang_id in ['en', 'de']:
@@ -70,6 +72,33 @@ p
     +cell #[code Language]
     +cell Language class.

++h(3, "util.set_lang_class") util.set_lang_class
++tag function
+
+p
+    | Set a custom #[code Language] class name that can be loaded via
+    | #[+api("top-level#util.get_lang_class") #[code get_lang_class]]. If
+    | your model uses a custom language, this is required so that spaCy can
+    | load the correct class from the two-letter language code.
+
++aside-code("Example").
+    from spacy.lang.xy import CustomLanguage
+
+    util.set_lang_class('xy', CustomLanguage)
+    lang_class = util.get_lang_class('xy')
+    nlp = lang_class()
+
++table(["Name", "Type", "Description"])
+    +row
+        +cell #[code name]
+        +cell unicode
+        +cell Two-letter language code, e.g. #[code 'en'].
+
+    +row
+        +cell #[code cls]
+        +cell #[code Language]
+        +cell The language class, e.g. #[code English].
+
 +h(3, "util.load_model") util.load_model
 +tag function
 +tag-new(2)
@@ -136,7 +136,7 @@ p
 +aside-code("Example").
     from spacy.gold import biluo_tags_from_offsets

-    doc = nlp('I like London.')
+    doc = nlp(u'I like London.')
     entities = [(7, 13, 'LOC')]
     tags = biluo_tags_from_offsets(doc, entities)
     assert tags == ['O', 'O', 'U-LOC', 'O']