Mirror of https://github.com/explosion/spaCy.git (synced 2025-02-05 22:20:34 +03:00)

Merge remote-tracking branch 'origin/develop' into rliaw-develop

Commit 8eb2484504
.github/contributors/hertelm.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                         | Entry           |
| ----------------------------- | --------------- |
| Name                          | Matthias Hertel |
| Company name (if applicable)  |                 |
| Title or role (if applicable) |                 |
| Date                          | June 29, 2020   |
| GitHub username               | hertelm         |
| Website (optional)            |                 |
@@ -1,6 +1,8 @@
 redirects = [
   # Netlify
   {from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
+  # Subdomain for branches
+  {from = "https://nightly.spacy.io/*", to="https://spacy-io-develop.spacy.io/:splat", force = true, status = 200},
   # Old subdomains
   {from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
   {from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},
setup.py (4 lines changed)
@@ -81,7 +81,7 @@ def is_new_osx():
         return False
     mac_ver = platform.mac_ver()[0]
     if mac_ver.startswith("10"):
-        minor_version = int(mac_ver.split('.')[1])
+        minor_version = int(mac_ver.split(".")[1])
         if minor_version >= 7:
             return True
     else:
@@ -158,7 +158,7 @@ def setup_package():
     ext_modules = cythonize(ext_modules, compiler_directives=COMPILER_DIRECTIVES)

     setup(
-        name="spacy",
+        name="spacy-nightly",
         packages=PACKAGES,
         version=about["__version__"],
         ext_modules=ext_modules,
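The changed line in `is_new_osx` only swaps quote style, but the surrounding check is easy to misread. A standalone sketch of what it parses (the `"10.15.7"` value is illustrative, not spaCy's exact function):

```python
import platform

# platform.mac_ver()[0] is e.g. "10.15.7" on macOS and "" elsewhere
mac_ver = platform.mac_ver()[0]
is_new = False
if mac_ver.startswith("10"):
    # the second dotted component is the minor version: "10.15.7" -> 15
    minor_version = int(mac_ver.split(".")[1])
    is_new = minor_version >= 7
```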
@@ -1,6 +1,6 @@
 # fmt: off
-__title__ = "spacy"
-__version__ = "3.0.0.dev12"
+__title__ = "spacy-nightly"
+__version__ = "3.0.0a0"
 __release__ = True
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
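The version bump moves from a dev release to an alpha prerelease; under PEP 440 both sort before the final release. A quick check (assumes the third-party `packaging` library is installed):

```python
from packaging.version import Version

# dev releases < alpha prereleases < the final release under PEP 440
assert Version("3.0.0.dev12") < Version("3.0.0a0") < Version("3.0.0")
```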
@@ -242,12 +242,16 @@ def project_clone(
     try:
         run_command(cmd)
     except SystemExit:
-        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'"
+        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'."
        msg.fail(err)
     with (tmp_dir / ".git" / "info" / "sparse-checkout").open("w") as f:
         f.write(name)
-    run_command(["git", "-C", str(tmp_dir), "fetch"])
-    run_command(["git", "-C", str(tmp_dir), "checkout"])
+    try:
+        run_command(["git", "-C", str(tmp_dir), "fetch"])
+        run_command(["git", "-C", str(tmp_dir), "checkout"])
+    except SystemExit:
+        err = f"Could not clone '{name}' in the repo '{repo}'."
+        msg.fail(err)
     shutil.move(str(tmp_dir / Path(name).name), str(project_dir))
     msg.good(f"Cloned project '{name}' from {repo} into {project_dir}")
     for sub_dir in DIRS:
@@ -525,9 +529,9 @@ def update_dvc_config(
         outputs_no_cache = command.get("outputs_no_cache", [])
         if not deps and not outputs and not outputs_no_cache:
             continue
-        # Default to "." as the project path since dvc.yaml is auto-generated
+        # Default to the working dir as the project path since dvc.yaml is auto-generated
         # and we don't want arbitrary paths in there
-        project_cmd = ["python", "-m", NAME, "project", ".", "exec", name]
+        project_cmd = ["python", "-m", NAME, "project", "exec", name]
         deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
         outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]
         outputs_nc_cmd = [c for cl in [["-O", p] for p in outputs_no_cache] for c in cl]
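The nested comprehensions in `update_dvc_config` interleave a CLI flag with each path; the same idiom in isolation (paths here are hypothetical):

```python
deps = ["assets/data.json", "configs/config.cfg"]  # hypothetical inputs
# Pair each dependency with a "-d" flag, then flatten the pairs
deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
assert deps_cmd == ["-d", "assets/data.json", "-d", "configs/config.cfg"]
```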
@@ -339,6 +339,7 @@ def create_train_batches(nlp, corpus, cfg, randomization_index):
                 yield epoch, batch
         if max_epochs >= 1 and epoch >= max_epochs:
             break
+        random.shuffle(train_examples)


 def create_evaluation_callback(nlp, optimizer, corpus, cfg):
@@ -350,13 +351,14 @@ def create_evaluation_callback(nlp, optimizer, corpus, cfg):
         )

         n_words = sum(len(ex.predicted) for ex in dev_examples)
+        batch_size = cfg.get("evaluation_batch_size", 128)
         start_time = timer()

         if optimizer.averages:
             with nlp.use_params(optimizer.averages):
-                scorer = nlp.evaluate(dev_examples, batch_size=32)
+                scorer = nlp.evaluate(dev_examples, batch_size=batch_size)
         else:
-            scorer = nlp.evaluate(dev_examples, batch_size=32)
+            scorer = nlp.evaluate(dev_examples, batch_size=batch_size)
         end_time = timer()
         wps = n_words / (end_time - start_time)
         scores = scorer.scores
@@ -479,7 +481,7 @@ def train_while_improving(
         if patience and (step - best_step) >= patience:
             break
         # Stop if we've exhausted our max steps (if specified)
-        if max_steps and (step * accumulate_gradient) >= max_steps:
+        if max_steps and step >= max_steps:
             break
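The added `random.shuffle(train_examples)` above re-randomizes example order between epochs rather than only once up front. A minimal sketch of that generator pattern (standalone, not spaCy's actual `create_train_batches`):

```python
import random

def epoch_batches(examples, batch_size, max_epochs):
    """Yield (epoch, batch) pairs, reshuffling after every full pass."""
    epoch = 0
    while True:
        for i in range(0, len(examples), batch_size):
            yield epoch, examples[i:i + batch_size]
        epoch += 1
        if max_epochs >= 1 and epoch >= max_epochs:
            break
        random.shuffle(examples)  # new order for the next pass
```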
@@ -45,18 +45,22 @@ class Corpus:

     def make_examples(self, nlp, reference_docs, max_length=0):
         for reference in reference_docs:
-            if len(reference) >= max_length >= 1:
-                if reference.is_sentenced:
-                    for ref_sent in reference.sents:
-                        yield Example(
-                            nlp.make_doc(ref_sent.text),
-                            ref_sent.as_doc()
-                        )
-            else:
+            if len(reference) == 0:
+                continue
+            elif max_length == 0 or len(reference) < max_length:
                 yield Example(
                     nlp.make_doc(reference.text),
                     reference
                 )
+            elif reference.is_sentenced:
+                for ref_sent in reference.sents:
+                    if len(ref_sent) == 0:
+                        continue
+                    elif max_length == 0 or len(ref_sent) < max_length:
+                        yield Example(
+                            nlp.make_doc(ref_sent.text),
+                            ref_sent.as_doc()
+                        )

     def make_examples_gold_preproc(self, nlp, reference_docs):
         for reference in reference_docs:
@@ -65,7 +69,7 @@ class Corpus:
             else:
                 ref_sents = [reference]
             for ref_sent in ref_sents:
-                yield Example(
+                eg = Example(
                     Doc(
                         nlp.vocab,
                         words=[w.text for w in ref_sent],
@@ -73,6 +77,8 @@ class Corpus:
                     ),
                     ref_sent
                 )
+                if len(eg.x):
+                    yield eg

     def read_docbin(self, vocab, locs):
         """ Yield training examples as example dicts """
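The rewritten `make_examples` changes the control flow to: skip empty docs, keep docs under `max_length` whole, and otherwise fall back to per-sentence examples with the same length filter. The same logic on plain lists (a sketch; the real code works on `Doc` objects):

```python
def split_for_training(docs, max_length=0):
    """docs: list of docs, each a list of sentences (lists of tokens)."""
    for sentences in docs:
        n_tokens = sum(len(sent) for sent in sentences)
        if n_tokens == 0:
            continue  # drop empty docs entirely
        elif max_length == 0 or n_tokens < max_length:
            yield [tok for sent in sentences for tok in sent]  # whole doc
        else:
            for sent in sentences:  # too long: fall back to sentences
                if len(sent) and len(sent) < max_length:
                    yield sent
```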
@@ -110,6 +110,7 @@ def init(model, X=None, Y=None):

     ops = model.ops
     W = normal_init(ops, W.shape, mean=float(ops.xp.sqrt(1.0 / nF * nI)))
+    pad = normal_init(ops, pad.shape, mean=1.0)
     model.set_param("W", W)
     model.set_param("b", b)
     model.set_param("pad", pad)
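The added line draws the `pad` parameter from a normal distribution centered on 1.0 instead of leaving it at its previous initialization. Roughly, in NumPy terms (the shape and standard deviation below are placeholders, not thinc's exact values):

```python
import numpy as np

rng = np.random.default_rng(0)
pad_shape = (1, 4, 64, 3)  # placeholder shape for the padding vector
pad = rng.normal(loc=1.0, scale=1.0, size=pad_shape)  # centered on 1.0
```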
@@ -339,6 +339,7 @@ cdef class precompute_hiddens:
     cdef readonly int nF, nO, nP
     cdef bint _is_synchronized
     cdef public object ops
+    cdef public object numpy_ops
     cdef np.ndarray _features
     cdef np.ndarray _cached
     cdef np.ndarray bias
@@ -368,6 +369,7 @@ cdef class precompute_hiddens:
         self.nP = 1
         self.nO = cached.shape[2]
         self.ops = lower_model.ops
+        self.numpy_ops = NumpyOps()
         assert activation in (None, "relu", "maxout")
         self.activation = activation
         self._is_synchronized = False
@@ -446,44 +448,32 @@ cdef class precompute_hiddens:
         return state_vector, backward

     def _nonlinearity(self, state_vector):
-        if isinstance(state_vector, numpy.ndarray):
-            ops = NumpyOps()
-        else:
-            ops = CupyOps()
-
         if self.activation == "maxout":
-            state_vector, mask = ops.maxout(state_vector)
+            return self._maxout_nonlinearity(state_vector)
         else:
-            state_vector = state_vector.reshape(state_vector.shape[:-1])
-            if self.activation == "relu":
-                mask = state_vector >= 0.
-                state_vector *= mask
-            else:
-                mask = None
+            return self._relu_nonlinearity(state_vector)

-        def backprop_nonlinearity(d_best):
-            if isinstance(d_best, numpy.ndarray):
-                ops = NumpyOps()
-            else:
-                ops = CupyOps()
-            if mask is not None:
-                mask_ = ops.asarray(mask)
-            # This will usually be on GPU
-            d_best = ops.asarray(d_best)
-            # Fix nans (which can occur from unseen classes.)
-            try:
-                d_best[ops.xp.isnan(d_best)] = 0.
-            except:
-                print(ops.xp.isnan(d_best))
-                raise
-            if self.activation == "maxout":
-                mask_ = ops.asarray(mask)
-                return ops.backprop_maxout(d_best, mask_, self.nP)
-            elif self.activation == "relu":
-                mask_ = ops.asarray(mask)
-                d_best *= mask_
-                d_best = d_best.reshape((d_best.shape + (1,)))
-                return d_best
-            else:
-                return d_best.reshape((d_best.shape + (1,)))
-        return state_vector, backprop_nonlinearity
+    def _maxout_nonlinearity(self, state_vector):
+        state_vector, mask = self.numpy_ops.maxout(state_vector)
+        # We're outputting to CPU, but we need this variable on GPU for the
+        # backward pass.
+        mask = self.ops.asarray(mask)
+
+        def backprop_maxout(d_best):
+            return self.ops.backprop_maxout(d_best, mask, self.nP)
+
+        return state_vector, backprop_maxout
+
+    def _relu_nonlinearity(self, state_vector):
+        state_vector = state_vector.reshape((state_vector.shape[0], -1))
+        mask = state_vector >= 0.
+        state_vector *= mask
+        # We're outputting to CPU, but we need this variable on GPU for the
+        # backward pass.
+        mask = self.ops.asarray(mask)
+
+        def backprop_relu(d_best):
+            d_best *= mask
+            return d_best.reshape((d_best.shape + (1,)))
+
+        return state_vector, backprop_relu
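Both new helpers follow the thinc-style convention of returning the activation output together with a backprop callback that closes over the mask. A NumPy-only sketch of the ReLU case (standalone; spaCy's version also moves the mask between CPU and GPU):

```python
import numpy as np

def relu_with_backprop(x):
    """Forward ReLU plus a gradient callback that reuses the mask."""
    mask = x >= 0.0
    y = x * mask

    def backprop_relu(dy):
        return dy * mask  # gradient only flows where the unit was active

    return y, backprop_relu

y, backprop = relu_with_backprop(np.array([-1.0, 2.0, -3.0, 4.0]))
print(backprop(np.ones(4)))  # [0. 1. 0. 1.]
```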
@@ -742,21 +742,14 @@ cdef class ArcEager(TransitionSystem):
             if n_gold < 1:
                 raise ValueError

-    def get_oracle_sequence(self, Example example):
-        cdef StateClass state
-        cdef ArcEagerGold gold
-        states, golds, n_steps = self.init_gold_batch([example])
-        if not golds:
-            return []
+    def get_oracle_sequence_from_state(self, StateClass state, ArcEagerGold gold, _debug=None):
+        cdef int i

         cdef Pool mem = Pool()
         # n_moves should not be zero at this point, but make sure to avoid zero-length mem alloc
         assert self.n_moves > 0
         costs = <float*>mem.alloc(self.n_moves, sizeof(float))
         is_valid = <int*>mem.alloc(self.n_moves, sizeof(int))

-        state = states[0]
-        gold = golds[0]
         history = []
         debug_log = []
         failed = False
@@ -772,18 +765,21 @@ cdef class ArcEager(TransitionSystem):
                     history.append(i)
                     s0 = state.S(0)
                     b0 = state.B(0)
-                    debug_log.append(" ".join((
-                        self.get_class_name(i),
-                        "S0=", (example.x[s0].text if s0 >= 0 else "__"),
-                        "B0=", (example.x[b0].text if b0 >= 0 else "__"),
-                        "S0 head?", str(state.has_head(state.S(0))),
-                    )))
+                    if _debug:
+                        example = _debug
+                        debug_log.append(" ".join((
+                            self.get_class_name(i),
+                            "S0=", (example.x[s0].text if s0 >= 0 else "__"),
+                            "B0=", (example.x[b0].text if b0 >= 0 else "__"),
+                            "S0 head?", str(state.has_head(state.S(0))),
+                        )))
                     action.do(state.c, action.label)
                     break
             else:
                 failed = False
                 break
         if failed:
+            example = _debug
             print("Actions")
             for i in range(self.n_moves):
                 print(self.get_class_name(i))
@@ -63,7 +63,9 @@ cdef class Parser:
         self.model = model
         if self.moves.n_moves != 0:
             self.set_output(self.moves.n_moves)
-        self.cfg = cfg
+        self.cfg = dict(cfg)
+        self.cfg.setdefault("update_with_oracle_cut_size", 100)
+        self.cfg.setdefault("normalize_gradients_with_batch_size", True)
         self._multitasks = []
         for multitask in cfg.get("multitasks", []):
             self.add_multitask_objective(multitask)
@@ -263,22 +265,32 @@ cdef class Parser:
         free(is_valid)

     def update(self, examples, drop=0., set_annotations=False, sgd=None, losses=None):
+        cdef StateClass state
         if losses is None:
             losses = {}
         losses.setdefault(self.name, 0.)
         for multitask in self._multitasks:
             multitask.update(examples, drop=drop, sgd=sgd)
+        n_examples = len([eg for eg in examples if self.moves.has_gold(eg)])
+        if n_examples == 0:
+            return losses
         set_dropout_rate(self.model, drop)
         # Prepare the stepwise model, and get the callback for finishing the batch
         model, backprop_tok2vec = self.model.begin_update(
             [eg.predicted for eg in examples])
-        # Chop sequences into lengths of this many transitions, to make the
-        # batch uniform length. We randomize this to overfit less.
-        cut_gold = numpy.random.choice(range(20, 100))
-        states, golds, max_steps = self._init_gold_batch(
-            examples,
-            max_length=cut_gold
-        )
+        if self.cfg["update_with_oracle_cut_size"] >= 1:
+            # Chop sequences into lengths of this many transitions, to make the
+            # batch uniform length. We randomize this to overfit less.
+            cut_size = self.cfg["update_with_oracle_cut_size"]
+            states, golds, max_steps = self._init_gold_batch(
+                examples,
+                max_length=numpy.random.choice(range(5, cut_size))
+            )
+        else:
+            states, golds, _ = self.moves.init_gold_batch(examples)
+            max_steps = max([len(eg.x) for eg in examples])
+        if not states:
+            return losses
         all_states = list(states)
         states_golds = zip(states, golds)
         for _ in range(max_steps):
@@ -287,6 +299,17 @@ cdef class Parser:
             states, golds = zip(*states_golds)
             scores, backprop = model.begin_update(states)
             d_scores = self.get_batch_loss(states, golds, scores, losses)
+            if self.cfg["normalize_gradients_with_batch_size"]:
+                # We have to be very careful how we do this, because of the way we
+                # cut up the batch. We subdivide long sequences. If we normalize
+                # naively, we end up normalizing by sequence length, which
+                # is bad: that would mean that states in long sequences
+                # consistently get smaller gradients. Imagine if we have two
+                # sequences, one length 1000, one length 20. If we cut up
+                # the 1k sequence so that we have a "batch" of 50 subsequences,
+                # we don't want the gradients to get 50 times smaller!
+                d_scores /= n_examples
+
             backprop(d_scores)
             # Follow the predicted action
             self.transition_states(states, scores)
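The comment above is the heart of this change, and a small numeric sketch makes it concrete (invented sizes; `d_scores` stands in for the per-state gradients):

```python
import numpy as np

# A batch of two sequences: one needs ~1000 transitions, one needs ~20.
# Chopping the long one into 50 subsequences yields 51 states overall.
n_examples = 2
n_states = 51
d_scores = np.ones((n_states, 4))

# Normalizing by states scales everything down 51x, so the long sequence's
# per-example contribution shrinks just because it was subdivided.
by_states = d_scores / n_states

# Normalizing by examples keeps each original sequence's weight independent
# of how many pieces it was cut into.
by_examples = d_scores / n_examples
```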
@@ -384,8 +407,6 @@ cdef class Parser:
             cpu_log_loss(c_d_scores,
                 costs, is_valid, &scores[i, 0], d_scores.shape[1])
             c_d_scores += d_scores.shape[1]
-        if len(states):
-            d_scores /= len(states)
         if losses is not None:
             losses.setdefault(self.name, 0.)
             losses[self.name] += (d_scores**2).sum()
@@ -428,7 +449,7 @@ cdef class Parser:
             if component is self:
                 break
             if hasattr(component, "pipe"):
-                doc_sample = list(component.pipe(doc_sample))
+                doc_sample = list(component.pipe(doc_sample, batch_size=8))
             else:
                 doc_sample = [component(doc) for doc in doc_sample]
         if doc_sample:
@@ -498,40 +519,49 @@ cdef class Parser:
         return self

     def _init_gold_batch(self, examples, min_length=5, max_length=500):
-        """Make a square batch, of length equal to the shortest doc. A long
+        """Make a square batch, of length equal to the shortest transition
+        sequence or a cap. A long
         doc will get multiple states. Let's say we have a doc of length 2*N,
         where N is the shortest doc. We'll make two states, one representing
         long_doc[:N], and another representing long_doc[N:]."""
         cdef:
+            StateClass start_state
             StateClass state
             Transition action
         all_states = self.moves.init_batch([eg.predicted for eg in examples])
         kept = []
+        max_length_seen = 0
         for state, eg in zip(all_states, examples):
             if self.moves.has_gold(eg) and not state.is_final():
                 gold = self.moves.init_gold(state, eg)
-                kept.append((eg, state, gold))
-        max_length = max(min_length, min(max_length, min([len(eg.x) for eg in examples])))
-        max_moves = 0
+                oracle_actions = self.moves.get_oracle_sequence_from_state(
+                    state.copy(), gold)
+                kept.append((eg, state, gold, oracle_actions))
+                min_length = min(min_length, len(oracle_actions))
+                max_length_seen = max(max_length, len(oracle_actions))
+        if not kept:
+            return [], [], 0
+        max_length = max(min_length, min(max_length, max_length_seen))
         states = []
         golds = []
-        for eg, state, gold in kept:
-            oracle_actions = self.moves.get_oracle_sequence(eg)
-            start = 0
-            while start < len(eg.predicted):
-                state = state.copy()
+        cdef int clas
+        max_moves = 0
+        for eg, state, gold, oracle_actions in kept:
+            for i in range(0, len(oracle_actions), max_length):
+                start_state = state.copy()
                 n_moves = 0
-                while state.B(0) < start and not state.is_final():
-                    action = self.moves.c[oracle_actions.pop(0)]
+                for clas in oracle_actions[i:i+max_length]:
+                    action = self.moves.c[clas]
                     action.do(state.c, action.label)
                     state.c.push_hist(action.clas)
                     n_moves += 1
-                has_gold = self.moves.has_gold(eg, start=start,
-                                               end=start+max_length)
-                if not state.is_final() and has_gold:
-                    states.append(state)
+                    if state.is_final():
+                        break
+                max_moves = max(max_moves, n_moves)
+                if self.moves.has_gold(eg, start_state.B(0), state.B(0)):
+                    states.append(start_state)
                     golds.append(gold)
                     max_moves = max(max_moves, n_moves)
-                start += min(max_length, len(eg.x)-start)
-            max_moves = max(max_moves, len(oracle_actions))
+                if state.is_final():
+                    break
         return states, golds, max_moves
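The new `_init_gold_batch` slices each oracle action sequence into fixed-size windows with `range(0, len(actions), max_length)`; the windowing idiom on its own (plain Python):

```python
def windows(actions, max_length):
    """Split an action sequence into consecutive chunks of at most max_length."""
    return [actions[i:i + max_length] for i in range(0, len(actions), max_length)]

assert windows(list(range(7)), 3) == [[0, 1, 2], [3, 4, 5], [6]]
```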
@@ -60,20 +60,25 @@ cdef class TransitionSystem:
             states.append(state)
             offset += len(doc)
         return states

     def get_oracle_sequence(self, Example example, _debug=False):
+        states, golds, _ = self.init_gold_batch([example])
+        if not states:
+            return []
+        state = states[0]
+        gold = golds[0]
+        if _debug:
+            return self.get_oracle_sequence_from_state(state, gold, _debug=example)
+        else:
+            return self.get_oracle_sequence_from_state(state, gold)
+
+    def get_oracle_sequence_from_state(self, StateClass state, gold, _debug=None):
         cdef Pool mem = Pool()
         # n_moves should not be zero at this point, but make sure to avoid zero-length mem alloc
         assert self.n_moves > 0
         costs = <float*>mem.alloc(self.n_moves, sizeof(float))
         is_valid = <int*>mem.alloc(self.n_moves, sizeof(int))

-        cdef StateClass state
-        states, golds, n_steps = self.init_gold_batch([example])
-        if not states:
-            return []
-        state = states[0]
-        gold = golds[0]
         history = []
         debug_log = []
         while not state.is_final():
@@ -82,9 +87,10 @@ cdef class TransitionSystem:
                 if is_valid[i] and costs[i] <= 0:
                     action = self.c[i]
                     history.append(i)
-                    s0 = state.S(0)
-                    b0 = state.B(0)
                     if _debug:
+                        s0 = state.S(0)
+                        b0 = state.B(0)
+                        example = _debug
                         debug_log.append(" ".join((
                             self.get_class_name(i),
                             "S0=", (example.x[s0].text if s0 >= 0 else "__"),
@@ -95,6 +101,7 @@ cdef class TransitionSystem:
                     break
             else:
                 if _debug:
+                    example = _debug
                     print("Actions")
                     for i in range(self.n_moves):
                         print(self.get_class_name(i))
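Conceptually, `get_oracle_sequence_from_state` greedily follows any valid, zero-cost action until the state is final. An abstract sketch of that loop (hypothetical callables, not the Cython implementation):

```python
def oracle_sequence(state, actions, is_valid, cost):
    """Return indices of gold-consistent actions from state to completion."""
    history = []
    while not state.is_final():
        for i, action in enumerate(actions):
            if is_valid(state, action) and cost(state, action) <= 0:
                history.append(i)
                state = action(state)  # apply the transition
                break
        else:
            break  # no gold-consistent action found; give up, like `failed`
    return history
```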
@@ -91,7 +91,7 @@ Match a stream of documents, yielding them in turn.

 > ```python
 > from spacy.matcher import PhraseMatcher
 > matcher = PhraseMatcher(nlp.vocab)
-> for doc in matcher.pipe(texts, batch_size=50):
+> for doc in matcher.pipe(docs, batch_size=50):
 >     pass
 > ```
@@ -46,19 +46,19 @@ Update the evaluation scores from a single [`Doc`](/api/doc) /

 ## Properties

 | Name | Type | Description |
-| --------------------------------------------------- | ----- | ---------------------------------------------------------------------------------------------------------- |
+| --------------------------------------------------- | ----- | -------------------------------------------------------------------------------------- |
 | `token_acc` | float | Tokenization accuracy. |
 | `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
 | `uas` | float | Unlabelled dependency score. |
 | `las` | float | Labelled dependency score. |
 | `ents_p` | float | Named entity accuracy (precision). |
 | `ents_r` | float | Named entity accuracy (recall). |
 | `ents_f` | float | Named entity accuracy (F-score). |
 | `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
 | `textcat_f` <Tag variant="new">3.0</Tag> | float | F-score on positive label for binary classification, macro-averaged F-score otherwise. |
-| `textcat_auc` <Tag variant="new"3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
+| `textcat_auc` <Tag variant="new">3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
 | `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
 | `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
 | `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |
 | `scores` | dict | All scores, keyed by type. |
@@ -122,7 +122,7 @@ for match_id, start, end in matches:
 ```

 The matcher returns a list of `(match_id, start, end)` tuples – in this case,
-`[('15578876784678163569', 0, 2)]`, which maps to the span `doc[0:2]` of our
+`[('15578876784678163569', 0, 3)]`, which maps to the span `doc[0:3]` of our
 original document. The `match_id` is the [hash value](/usage/spacy-101#vocab) of
 the string ID "HelloWorld". To get the string value, you can look up the ID in
 the [`StringStore`](/api/stringstore).
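The corrected offsets come from a three-token pattern; a runnable sketch along the lines of the usage guide's example (v3-style `Matcher.add` signature):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

doc = nlp("Hello, world!")
for match_id, start, end in matcher(doc):
    # look the hash up in the StringStore to recover "HelloWorld"
    print(nlp.vocab.strings[match_id], start, end, doc[start:end].text)
# -> HelloWorld 0 3 Hello, world
```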
@@ -161,10 +161,18 @@ debugging your tokenizer configuration.

 spaCy's custom warnings have been replaced with native Python
 [`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the
-[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the [`warnings`
+filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
 to manage warnings.

+```diff
+import spacy
++ import warnings
+
+- spacy.errors.SPACY_WARNING_IGNORE.append('W007')
++ warnings.filterwarnings("ignore", message=r"\\[W007\\]", category=UserWarning)
+```
+
 #### Normalization tables

 The normalization tables have moved from the language data in
@@ -174,6 +182,65 @@ If you're adding data for a new language, the normalization table should be
 added to `spacy-lookups-data`. See
 [adding norm exceptions](/usage/adding-languages#norm-exceptions).

+#### No preloaded vocab for models with vectors
+
+To reduce the initial loading time, the lexemes in `nlp.vocab` are no longer
+loaded on initialization for models with vectors. As you process texts, the
+lexemes will be added to the vocab automatically, just as in small models
+without vectors.
+
+To see the number of unique vectors and number of words with vectors, see
+`nlp.meta['vectors']`; for example, for `en_core_web_md` there are `20000`
+unique vectors and `684830` words with vectors:
+
+```python
+{
+    'width': 300,
+    'vectors': 20000,
+    'keys': 684830,
+    'name': 'en_core_web_md.vectors'
+}
+```
+
+If required, for instance if you are working directly with word vectors rather
+than processing texts, you can load all lexemes for words with vectors at once:
+
+```python
+for orth in nlp.vocab.vectors:
+    _ = nlp.vocab[orth]
+```
+
+If your workflow previously iterated over `nlp.vocab`, a similar alternative
+is to iterate over words with vectors instead:
+
+```diff
+- lexemes = [w for w in nlp.vocab]
++ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
+```
+
+Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
+the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
+provided lexemes but only 685K words with vectors. The vectors have been
+updated for most languages in v2.2, but the English models contain the same
+vectors for both v2.2 and v2.3.
+
+#### Lexeme.is_oov and Token.is_oov
+
+<Infobox title="Important note" variant="warning">
+
+Due to a bug, the values for `is_oov` are reversed in v2.3.0, but this will be
+fixed in the next patch release v2.3.1.
+
+</Infobox>
+
+In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
+have a word vector. This is equivalent to `token.orth not in
+nlp.vocab.vectors`.
+
+Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
+probability and cluster features. The probability and cluster features are no
+longer included in the provided medium and large models (see the next section).
+
 #### Probability and cluster features

 > #### Load and save extra prob lookups table
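Referring back to the `is_oov` change above: under the intended v2.3.1 semantics the flag is just vector membership, which can be checked directly. A sketch (assumes a model with vectors such as `en_core_web_md` is installed; the assertion will not hold on v2.3.0 because of the reversed-values bug):

```python
import spacy

nlp = spacy.load("en_core_web_md")
token = nlp("apple")[0]
# is_oov is True exactly when the lexeme has no word vector
assert token.is_oov == (token.orth not in nlp.vocab.vectors)
```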
@@ -201,6 +268,28 @@ model vocab, which will take a few seconds on initial loading. When you save
 this model after loading the `prob` table, the full `prob` table will be saved
 as part of the model vocab.

+To load the probability table into a provided model, first make sure you have
+`spacy-lookups-data` installed. To load the table, remove the empty provided
+`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
+table from `spacy-lookups-data`:
+
+```diff
++ # prerequisite: pip install spacy-lookups-data
+import spacy
+
+nlp = spacy.load("en_core_web_md")
+
+# remove the empty placeholder prob table
++ if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
++     nlp.vocab.lookups_extra.remove_table("lexeme_prob")
+
+# access any `.prob` to load the full table into the model
+assert nlp.vocab["a"].prob == -3.9297883511
+
+# if desired, save this model with the probability table included
+nlp.to_disk("/path/to/model")
+```
+
 If you'd like to include custom `cluster`, `prob`, or `sentiment` tables as part
 of a new model, add the data to
 [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) under
@@ -218,3 +307,39 @@ When you initialize a new model with [`spacy init-model`](/api/cli#init-model),
 the `prob` table from `spacy-lookups-data` may be loaded as part of the
 initialization. If you'd like to omit this extra data as in spaCy's provided
 v2.3 models, use the new flag `--omit-extra-lookups`.
+
+#### Tag maps in provided models vs. blank models
+
+The tag maps in the provided models may differ from the tag maps in the spaCy
+library. You can access the tag map in a loaded model under
+`nlp.vocab.morphology.tag_map`.
+
+The tag map from `spacy.lang.lg.tag_map` is still used when a blank model is
+initialized. If you want to provide an alternate tag map, update
+`nlp.vocab.morphology.tag_map` after initializing the model or, if you're using
+the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
+provide the tag map as a JSON dict.
+
+If you want to export a tag map from a provided model for use with the train
+CLI, you can save it as a JSON dict. To only use string keys as required by
+JSON and to make it easier to read and edit, any internal integer IDs need to
+be converted back to strings:
+
+```python
+import spacy
+import srsly
+
+nlp = spacy.load("en_core_web_sm")
+tag_map = {}
+
+# convert any integer IDs to strings for JSON
+for tag, morph in nlp.vocab.morphology.tag_map.items():
+    tag_map[tag] = {}
+    for feat, val in morph.items():
+        feat = nlp.vocab.strings.as_string(feat)
+        if not isinstance(val, bool):
+            val = nlp.vocab.strings.as_string(val)
+        tag_map[tag][feat] = val
+
+srsly.write_json("tag_map.json", tag_map)
+```
website/docs/usage/v3.md (new file, 17 lines)
@@ -0,0 +1,17 @@
---
title: What's New in v3.0
teaser: New features, backwards incompatibilities and migration guide
menu:
  - ['Summary', 'summary']
  - ['New Features', 'features']
  - ['Backwards Incompatibilities', 'incompat']
  - ['Migrating from v2.x', 'migrating']
---

## Summary {#summary}

## New Features {#features}

## Backwards Incompatibilities {#incompat}

## Migrating from v2.x {#migrating}
@@ -15,6 +15,11 @@ const universe = require('./meta/universe.json')

 const DEFAULT_TEMPLATE = path.resolve('./src/templates/index.js')

+const isNightly = !!+process.env.SPACY_NIGHTLY || site.nightlyBranches.includes(process.env.BRANCH)
+const favicon = isNightly ? `src/images/icon_nightly.png` : `src/images/icon.png`
+const binderBranch = isNightly ? 'nightly' : site.binderBranch
+const siteUrl = isNightly ? site.siteUrlNightly : site.siteUrl
+
 module.exports = {
     siteMetadata: {
         ...site,
@@ -22,6 +27,9 @@ module.exports = {
         sidebars,
         ...models,
         universe,
+        nightly: isNightly,
+        binderBranch,
+        siteUrl,
     },

     plugins: [
@@ -128,7 +136,7 @@ module.exports = {
                 background_color: site.theme,
                 theme_color: site.theme,
                 display: `minimal-ui`,
-                icon: `src/images/icon.png`,
+                icon: favicon,
             },
         },
         {
@@ -140,6 +148,23 @@ module.exports = {
                 respectDNT: true,
             },
         },
+        {
+            resolve: 'gatsby-plugin-robots-txt',
+            options: {
+                host: siteUrl,
+                sitemap: `${siteUrl}/sitemap.xml`,
+                // If we're in a special state (nightly, legacy) prevent indexing
+                resolveEnv: () => (isNightly ? 'development' : 'production'),
+                env: {
+                    production: {
+                        policy: [{ userAgent: '*', allow: '/' }],
+                    },
+                    development: {
+                        policy: [{ userAgent: '*', disallow: ['/'] }],
+                    },
+                },
+            },
+        },
         `gatsby-plugin-offline`,
     ],
 }
@@ -78,11 +78,14 @@
             "name": "Japanese",
             "models": ["ja_core_news_sm", "ja_core_news_md", "ja_core_news_lg"],
             "dependencies": [
+                { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
+                { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
                 {
                     "name": "SudachiPy",
                     "url": "https://github.com/WorksApplications/SudachiPy"
                 }
             ],
+            "example": "これは文章です。",
             "has_examples": true
         },
         {
@@ -191,17 +194,6 @@
             "example": "นี่คือประโยค",
             "has_examples": true
         },
-        {
-            "code": "ja",
-            "name": "Japanese",
-            "dependencies": [
-                { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
-                { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
-                { "name": "fugashi", "url": "https://github.com/polm/fugashi" }
-            ],
-            "example": "これは文章です。",
-            "has_examples": true
-        },
         {
             "code": "ko",
             "name": "Korean",
@@ -8,11 +8,7 @@
                 { "text": "Installation", "url": "/usage" },
                 { "text": "Models & Languages", "url": "/usage/models" },
                 { "text": "Facts & Figures", "url": "/usage/facts-figures" },
-                { "text": "spaCy 101", "url": "/usage/spacy-101" },
-                { "text": "New in v2.3", "url": "/usage/v2-3" },
-                { "text": "New in v2.2", "url": "/usage/v2-2" },
-                { "text": "New in v2.1", "url": "/usage/v2-1" },
-                { "text": "New in v2.0", "url": "/usage/v2" }
+                { "text": "New in v3.0", "url": "/usage/v3" }
             ]
         },
         {
@@ -3,6 +3,8 @@
     "description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",
     "slogan": "Industrial-strength Natural Language Processing in Python",
     "siteUrl": "https://spacy.io",
+    "siteUrlNightly": "https://nightly.spacy.io",
+    "nightlyBranches": ["spacy.io-develop"],
     "email": "contact@explosion.ai",
     "company": "Explosion AI",
     "companyUrl": "https://explosion.ai",
website/package-lock.json (generated, 13584 lines)
File diff suppressed because it is too large.
@@ -16,7 +16,7 @@
         "autoprefixer": "^9.4.7",
         "classnames": "^2.2.6",
         "codemirror": "^5.43.0",
-        "gatsby": "^2.1.18",
+        "gatsby": "^2.11.1",
         "gatsby-image": "^2.0.29",
         "gatsby-mdx": "^0.3.6",
         "gatsby-plugin-catch-links": "^2.0.11",
@@ -25,6 +25,7 @@
         "gatsby-plugin-offline": "^2.0.24",
         "gatsby-plugin-react-helmet": "^3.0.6",
         "gatsby-plugin-react-svg": "^2.0.0",
+        "gatsby-plugin-robots-txt": "^1.5.1",
         "gatsby-plugin-sass": "^2.0.10",
         "gatsby-plugin-sharp": "^2.0.20",
         "gatsby-plugin-sitemap": "^2.0.5",
@@ -52,6 +53,7 @@
     "scripts": {
         "build": "gatsby build",
         "dev": "gatsby develop",
+        "dev:nightly": "BRANCH=spacy.io-develop npm run dev",
         "lint": "eslint **",
         "clear": "rm -rf .cache",
         "test": "echo \"Write tests! -> https://gatsby.app/unit-testing\""
@@ -27,7 +27,7 @@ Button.defaultProps = {
 }

 Button.propTypes = {
-    to: PropTypes.string.isRequired,
+    to: PropTypes.string,
     variant: PropTypes.oneOf(['primary', 'secondary', 'tertiary']),
     large: PropTypes.bool,
     icon: PropTypes.string,
@@ -19,6 +19,7 @@ import { ReactComponent as NoIcon } from '../images/icons/no.svg'
 import { ReactComponent as NeutralIcon } from '../images/icons/neutral.svg'
 import { ReactComponent as OfflineIcon } from '../images/icons/offline.svg'
 import { ReactComponent as SearchIcon } from '../images/icons/search.svg'
+import { ReactComponent as MoonIcon } from '../images/icons/moon.svg'

 import classes from '../styles/icon.module.sass'

@@ -41,6 +42,7 @@ const icons = {
     neutral: NeutralIcon,
     offline: OfflineIcon,
     search: SearchIcon,
+    moon: MoonIcon,
 }

 const Icon = ({ name, width, height, inline, variant, className }) => {
@@ -2,7 +2,9 @@ import React, { Fragment } from 'react'
 import classNames from 'classnames'

 import pattern from '../images/pattern_blue.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import patternOverlay from '../images/pattern_landing.jpg'
+import patternOverlayNightly from '../images/pattern_landing_nightly.jpg'
 import logoSvgs from '../images/logos'

 import Grid from './grid'
@@ -14,9 +16,10 @@ import Link from './link'
 import { chunkArray } from './util'
 import classes from '../styles/landing.module.sass'

-export const LandingHeader = ({ style = {}, children }) => {
-    const wrapperStyle = { backgroundImage: `url(${pattern})` }
-    const contentStyle = { backgroundImage: `url(${patternOverlay})`, ...style }
+export const LandingHeader = ({ nightly, style = {}, children }) => {
+    const overlay = nightly ? patternOverlayNightly : patternOverlay
+    const wrapperStyle = { backgroundImage: `url(${nightly ? patternNightly : pattern})` }
+    const contentStyle = { backgroundImage: `url(${overlay})`, ...style }
     return (
         <header className={classes.header}>
             <div className={classes.headerWrapper} style={wrapperStyle}>
```diff
@@ -5,15 +5,22 @@ import classNames from 'classnames'
 import patternBlue from '../images/pattern_blue.jpg'
 import patternGreen from '../images/pattern_green.jpg'
 import patternPurple from '../images/pattern_purple.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import classes from '../styles/main.module.sass'
 
-const patterns = { blue: patternBlue, green: patternGreen, purple: patternPurple }
+const patterns = {
+    blue: patternBlue,
+    green: patternGreen,
+    purple: patternPurple,
+    nightly: patternNightly,
+}
 
 export const Content = ({ Component = 'div', className, children }) => (
     <Component className={classNames(classes.content, className)}>{children}</Component>
 )
 
 const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
+    const pattern = patterns[theme]
     const mainClassNames = classNames(classes.root, {
         [classes.withSidebar]: sidebar,
         [classes.withAsides]: asides,
@@ -23,10 +30,7 @@ const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
         <main className={mainClassNames}>
             {wrapContent ? <Content Component="article">{children}</Content> : children}
             {asides && (
-                <div
-                    className={classes.asides}
-                    style={{ backgroundImage: `url(${patterns[theme]}` }}
-                />
+                <div className={classes.asides} style={{ backgroundImage: `url(${pattern}` }} />
             )}
             {footer}
         </main>
```
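One behavior note on the hunk above: hoisting `patterns[theme]` into `pattern` is purely a readability change, and an unrecognized theme name still resolves to `undefined`, which then lands inside the inline `url(...)`. A defensive fallback, shown here as a suggestion rather than anything the commit adds, would be:

```js
// Hypothetical guard: fall back to the blue pattern for unknown theme names
// instead of emitting "url(undefined" in the inline style.
const pattern = patterns[theme] || patterns.blue
```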
```diff
@@ -6,6 +6,7 @@ import { StaticQuery, graphql } from 'gatsby'
 import socialImageDefault from '../images/social_default.jpg'
 import socialImageApi from '../images/social_api.jpg'
 import socialImageUniverse from '../images/social_universe.jpg'
+import socialImageNightly from '../images/social_nightly.jpg'
 
 function getPageTitle(title, sitename, slogan, sectionTitle) {
     if (sectionTitle && title) {
@@ -17,13 +18,14 @@ function getPageTitle(title, sitename, slogan, sectionTitle) {
     return `${sitename} · ${slogan}`
 }
 
-function getImage(section) {
+function getImage(section, nightly) {
+    if (nightly) return socialImageNightly
     if (section === 'api') return socialImageApi
     if (section === 'universe') return socialImageUniverse
     return socialImageDefault
 }
 
-const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) => (
+const SEO = ({ description, lang, title, section, sectionTitle, bodyClass, nightly }) => (
     <StaticQuery
         query={query}
         render={data => {
@@ -35,7 +37,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
                 siteMetadata.slogan,
                 sectionTitle
             )
-            const socialImage = siteMetadata.siteUrl + getImage(section)
+            const socialImage = siteMetadata.siteUrl + getImage(section, nightly)
             const meta = [
                 {
                     name: 'description',
```
```diff
@@ -11,6 +11,9 @@ const Tag = ({ spaced, variant, tooltip, children }) => {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
+    // TODO: we probably want to handle this more elegantly, but the idea is
+    // that we can hide tags referring to old versions
+    // const hideTag = version.startsWith('2')
     return (
         <TagTemplate spaced={spaced} tooltip={tooltipText}>
             v{version}
```
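Since the hunk above only hints at the hiding logic, here is the commented-out check made concrete. This is an illustration of the TODO, not code from the commit; `version` is the string produced by `Number(children).toFixed(1)`:

```js
// Hypothetical predicate matching the commented-out line: hide tags that
// refer to 2.x features once the docs target the 3.x pre-release.
const shouldHideTag = version => String(version).startsWith('2')

shouldHideTag('2.3') // true: the tag marks an old feature
shouldHideTag('3.0') // false: keep the "new in v3.0" tag
```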
BIN website/src/images/icon_nightly.png (new binary file, 18 KiB, not shown)

website/src/images/icons/moon.svg (new file, 3 lines, 322 B):

```diff
@@ -0,0 +1,3 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" viewBox="0 0 32 32">
+    <path d="M10.895 7.574c0 7.55 5.179 13.67 11.567 13.67 1.588 0 3.101-0.38 4.479-1.063-1.695 4.46-5.996 7.636-11.051 7.636-6.533 0-11.83-5.297-11.83-11.83 0-4.82 2.888-8.959 7.023-10.803-0.116 0.778-0.188 1.573-0.188 2.39z"></path>
+</svg>
```

BIN website/src/images/pattern_landing_nightly.jpg (new binary file, 126 KiB, not shown)

BIN website/src/images/pattern_nightly.jpg (new binary file, 157 KiB, not shown)

BIN website/src/images/social_nightly.jpg (new binary file, 354 KiB, not shown)
website/src/pages/404.js (new file, 47 lines):

```diff
@@ -0,0 +1,47 @@
+import React from 'react'
+import { window } from 'browser-monads'
+import { graphql } from 'gatsby'
+
+import Template from '../templates/index'
+import { LandingHeader, LandingTitle } from '../components/landing'
+import Button from '../components/button'
+
+export default ({ data, location }) => {
+    const { nightly } = data.site.siteMetadata
+    const pageContext = { title: '404 Error', searchExclude: true, isIndex: false }
+    return (
+        <Template data={data} pageContext={pageContext} location={location}>
+            <LandingHeader style={{ minHeight: 400 }} nightly={nightly}>
+                <LandingTitle>
+                    Ooops, this page
+                    <br />
+                    does not exist!
+                </LandingTitle>
+                <br />
+                <Button onClick={() => window.history.go(-1)} variant="tertiary">
+                    Click here to go back
+                </Button>
+            </LandingHeader>
+        </Template>
+    )
+}
+
+export const pageQuery = graphql`
+    query {
+        site {
+            siteMetadata {
+                nightly
+                title
+                description
+                navigation {
+                    text
+                    url
+                }
+                docSearch {
+                    apiKey
+                    indexName
+                }
+            }
+        }
+    }
+`
```
Deleted file (7 lines), the Markdown page that wrapped the old 404 widget:

```diff
@@ -1,7 +0,0 @@
----
-title: 404 Error
----
-
-import Error from 'widgets/404.js'
-
-<Error />
```
```diff
@@ -3,11 +3,14 @@
     bottom: 0
     left: 0
     width: 100%
-    background: var(--color-subtle-light)
+    background: var(--color-back)
     z-index: 100
     font: var(--font-size-sm)/var(--line-height-md) var(--font-primary)
     text-align: center
     padding: 1rem
+    box-shadow: var(--box-shadow)
+    border-top: 2px solid
+    color: var(--color-theme)
 
 .warning
     --alert-bg: var(--color-yellow-light)
```
```diff
@@ -47,6 +47,11 @@
     --color-theme-purple-light: hsla(255, 61%, 54%, 0.06)
     --color-theme-purple-opaque: hsla(255, 61%, 54%, 0.11)
 
+    --color-theme-nightly: hsl(257, 99%, 67%)
+    --color-theme-nightly-dark: hsl(257, 99%, 57%)
+    --color-theme-nightly-light: hsla(257, 99%, 67%, 0.06)
+    --color-theme-nightly-opaque: hsla(257, 99%, 67%, 0.11)
+
     // Regular colors
     --color-back: hsl(0, 0%, 100%)
     --color-front: hsl(213, 15%, 12%)
@@ -106,6 +111,12 @@
     --color-theme-light: var(--color-theme-purple-light)
     --color-theme-opaque: var(--color-theme-purple-opaque)
 
+.theme-nightly
+    --color-theme: var(--color-theme-nightly)
+    --color-theme-dark: var(--color-theme-nightly-dark)
+    --color-theme-light: var(--color-theme-nightly-light)
+    --color-theme-opaque: var(--color-theme-nightly-opaque)
+
 
 /* Fonts */
 
```
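These four indirected variables are the entire theming mechanism: components only ever read `var(--color-theme)` and friends, so switching palettes is just a matter of putting `theme-nightly` on the body, which the layout change further down does. A browser-console sketch, for illustration only:

```js
// Illustration only: with the stylesheet above loaded, swapping the body
// class remaps every var(--color-theme*) lookup in one step.
document.body.classList.add('theme-nightly')
const theme = getComputedStyle(document.body).getPropertyValue('--color-theme')
console.log(theme.trim()) // "hsl(257, 99%, 67%)"
```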
```diff
@@ -22,6 +22,9 @@ $crumb-bar: 2px
     & > *
         padding: 0 2rem 0.35rem
 
+        &:last-child
+            margin-bottom: 5rem
+
 .label
     color: var(--color-dark)
     font: bold var(--font-size-lg)/var(--line-height-md) var(--font-secondary)
```
```diff
@@ -31,7 +31,7 @@ const Docs = ({ pageContext, children }) => (
         theme,
         version,
     } = pageContext
-    const { sidebars = [], modelsRepo, languages } = site.siteMetadata
+    const { sidebars = [], modelsRepo, languages, nightly } = site.siteMetadata
     const isModels = section === 'models'
     const sidebar = pageContext.sidebar
         ? { items: pageContext.sidebar }
@@ -83,7 +83,7 @@ const Docs = ({ pageContext, children }) => (
             {sidebar && <Sidebar items={sidebar.items} pageMenu={pageMenu} slug={slug} />}
             <Main
                 section={section}
-                theme={theme}
+                theme={nightly ? 'nightly' : theme}
                 sidebar
                 asides
                 wrapContent
@@ -146,6 +146,7 @@ const query = graphql`
                 models
                 starters
             }
+            nightly
             sidebars {
                 section
                 items {
```
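The `nightly ? 'nightly' : theme` override in this template recurs almost verbatim in the layout and universe templates below. A small helper would keep the call sites in sync; this is a suggestion, not something the commit introduces:

```js
// Hypothetical helper: one place to decide whether the nightly palette
// overrides a page's own theme.
const resolveTheme = (nightly, theme = 'blue') => (nightly ? 'nightly' : theme)

resolveTheme(true, 'purple')  // "nightly"
resolveTheme(false, 'purple') // "purple"
```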
```diff
@@ -75,10 +75,23 @@ const scopeComponents = {
     InlineCode,
 }
 
-const AlertSpace = () => {
+const AlertSpace = ({ nightly }) => {
     const isOnline = useOnlineStatus()
     return (
         <>
+            {nightly && (
+                <Alert
+                    title="You're viewing the pre-release docs."
+                    icon="moon"
+                    closeOnClick={false}
+                >
+                    The page reflects{' '}
+                    <Link to="https://pypi.org/project/spacy-nightly/">
+                        <InlineCode>spacy-nightly</InlineCode>
+                    </Link>
+                    , not the latest <Link to="https://spacy.io">stable version</Link>.
+                </Alert>
+            )}
             {!isOnline && (
                 <Alert title="Looks like you're offline." icon="offline" variant="warning">
                     But don't worry, your visited pages should be saved for you.
@@ -130,9 +143,10 @@ class Layout extends React.Component {
         const { data, pageContext, location, children } = this.props
         const { file, site = {} } = data || {}
         const mdx = file ? file.childMdx : null
-        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
-        const bodyClass = classNames(`theme-${theme}`, { 'search-exclude': !!searchExclude })
         const meta = site.siteMetadata || {}
+        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
+        const uiTheme = meta.nightly ? 'nightly' : theme
+        const bodyClass = classNames(`theme-${uiTheme}`, { 'search-exclude': !!searchExclude })
         const isDocs = ['usage', 'models', 'api', 'styleguide'].includes(section)
         const content = !mdx ? null : (
             <MDXProvider components={mdxComponents}>
@@ -148,8 +162,9 @@ class Layout extends React.Component {
                     section={section}
                     sectionTitle={sectionTitle}
                     bodyClass={bodyClass}
+                    nightly={meta.nightly}
                 />
-                <AlertSpace />
+                <AlertSpace nightly={meta.nightly} />
                 <Navigation
                     title={meta.title}
                     items={meta.navigation}
@@ -167,11 +182,11 @@ class Layout extends React.Component {
                         mdxComponents={mdxComponents}
                     />
                 ) : (
-                    <>
+                    <div>
                         {children}
                         {content}
                         <Footer wide />
-                    </>
+                    </div>
                 )}
             </>
         )
@@ -184,6 +199,7 @@ export const pageQuery = graphql`
     query($slug: String!) {
         site {
             siteMetadata {
+                nightly
                 title
                 description
                 navigation {
```
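Taken together, the layout hunks show a single GraphQL field fanning out to every nightly-aware surface. The function below is a condensed restatement of that flow for reference, not code from the commit; the names mirror the diff:

```js
// Condensed restatement of the layout hunks: one metadata flag drives the
// body class (palette), the social card and the pre-release banner.
function nightlyProps(meta, theme = 'blue') {
    const uiTheme = meta.nightly ? 'nightly' : theme
    return {
        bodyClass: `theme-${uiTheme}`, // picks up .theme-nightly from base.sass
        seo: { nightly: meta.nightly }, // getImage() returns the nightly card
        showBanner: Boolean(meta.nightly), // AlertSpace pre-release alert
    }
}
```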
```diff
@@ -30,8 +30,8 @@ function filterResources(resources, data) {
     return sorted.filter(res => (res.category || []).includes(data.id))
 }
 
-const UniverseContent = ({ content = [], categories, pageContext, location, mdxComponents }) => {
-    const { theme, data = {} } = pageContext
+const UniverseContent = ({ content = [], categories, theme, pageContext, mdxComponents }) => {
+    const { data = {} } = pageContext
     const filteredResources = filterResources(content, data)
     const activeData = data ? content.find(({ id }) => id === data.id) : null
     const markdownComponents = { ...mdxComponents, code: InlineCode }
@@ -302,15 +302,16 @@ const Universe = ({ pageContext, location, mdxComponents }) => (
     <StaticQuery
         query={query}
         render={data => {
-            const content = data.site.siteMetadata.universe.resources
-            const categories = data.site.siteMetadata.universe.categories
+            const { universe, nightly } = data.site.siteMetadata
+            const theme = nightly ? 'nightly' : pageContext.theme
             return (
                 <UniverseContent
-                    content={content}
-                    categories={categories}
+                    content={universe.resources}
+                    categories={universe.categories}
                     pageContext={pageContext}
                     location={location}
                     mdxComponents={mdxComponents}
+                    theme={theme}
                 />
             )
         }}
@@ -323,6 +324,7 @@ const query = graphql`
     query UniverseQuery {
         site {
             siteMetadata {
+                nightly
                 universe {
                     resources {
                         type
```
Deleted file (19 lines); its contents move into the new website/src/pages/404.js above:

```diff
@@ -1,19 +0,0 @@
-import React from 'react'
-import { window } from 'browser-monads'
-
-import { LandingHeader, LandingTitle } from '../components/landing'
-import Button from '../components/button'
-
-export default () => (
-    <LandingHeader style={{ minHeight: 400 }}>
-        <LandingTitle>
-            Ooops, this page
-            <br />
-            does not exist!
-        </LandingTitle>
-        <br />
-        <Button onClick={() => window.history.go(-1)} variant="tertiary">
-            Click here to go back
-        </Button>
-    </LandingHeader>
-)
```
```diff
@@ -68,7 +68,7 @@ const Landing = ({ data }) => {
     const counts = getCounts(data.languages)
     return (
         <>
-            <LandingHeader>
+            <LandingHeader nightly={data.nightly}>
                 <LandingTitle>
                     Industrial-Strength
                     <br />
@@ -268,6 +268,7 @@ const landingQuery = graphql`
     query LandingQuery {
         site {
             siteMetadata {
+                nightly
                 repo
                 languages {
                     models
```