mirror of https://github.com/explosion/spaCy.git
synced 2025-02-05 22:20:34 +03:00

commit 8eb2484504
Merge remote-tracking branch 'origin/develop' into rliaw-develop

.github/contributors/hertelm.md (vendored, new file, 106 lines)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term **"you"** shall mean
the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry           |
| ------------------------------ | --------------- |
| Name                           | Matthias Hertel |
| Company name (if applicable)   |                 |
| Title or role (if applicable)  |                 |
| Date                           | June 29, 2020   |
| GitHub username                | hertelm         |
| Website (optional)             |                 |

@@ -1,6 +1,8 @@
 redirects = [
     # Netlify
     {from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
+    # Subdomain for branches
+    {from = "https://nightly.spacy.io/*", to="https://spacy-io-develop.spacy.io/:splat", force = true, status = 200},
     # Old subdomains
     {from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
     {from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},

setup.py (4 changed lines)

@@ -81,7 +81,7 @@ def is_new_osx():
         return False
     mac_ver = platform.mac_ver()[0]
     if mac_ver.startswith("10"):
-        minor_version = int(mac_ver.split('.')[1])
+        minor_version = int(mac_ver.split(".")[1])
         if minor_version >= 7:
             return True
         else:

@@ -158,7 +158,7 @@ def setup_package():
     ext_modules = cythonize(ext_modules, compiler_directives=COMPILER_DIRECTIVES)

     setup(
-        name="spacy",
+        name="spacy-nightly",
         packages=PACKAGES,
         version=about["__version__"],
         ext_modules=ext_modules,

@@ -1,6 +1,6 @@
 # fmt: off
-__title__ = "spacy"
-__version__ = "3.0.0.dev12"
+__title__ = "spacy-nightly"
+__version__ = "3.0.0a0"
 __release__ = True
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"

@@ -242,12 +242,16 @@ def project_clone(
     try:
         run_command(cmd)
     except SystemExit:
-        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'"
+        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'."
         msg.fail(err)
     with (tmp_dir / ".git" / "info" / "sparse-checkout").open("w") as f:
         f.write(name)
+    try:
         run_command(["git", "-C", str(tmp_dir), "fetch"])
         run_command(["git", "-C", str(tmp_dir), "checkout"])
+    except SystemExit:
+        err = f"Could not clone '{name}' in the repo '{repo}'."
+        msg.fail(err)
     shutil.move(str(tmp_dir / Path(name).name), str(project_dir))
     msg.good(f"Cloned project '{name}' from {repo} into {project_dir}")
     for sub_dir in DIRS:

@@ -525,9 +529,9 @@ def update_dvc_config(
         outputs_no_cache = command.get("outputs_no_cache", [])
         if not deps and not outputs and not outputs_no_cache:
             continue
-        # Default to "." as the project path since dvc.yaml is auto-generated
+        # Default to the working dir as the project path since dvc.yaml is auto-generated
         # and we don't want arbitrary paths in there
-        project_cmd = ["python", "-m", NAME, "project", ".", "exec", name]
+        project_cmd = ["python", "-m", NAME, "project", "exec", name]
         deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
         outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]
        outputs_nc_cmd = [c for cl in [["-O", p] for p in outputs_no_cache] for c in cl]
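
The `deps_cmd` / `outputs_cmd` / `outputs_nc_cmd` comprehensions above flatten `(flag, path)` pairs into a single argument list for the generated DVC command. A minimal standalone sketch of that flattening, with illustrative paths that aren't part of this diff:

```python
deps = ["corpus/train.json", "corpus/dev.json"]

# Build ["-d", path] pairs, then flatten them into one flag/value list,
# mirroring the comprehension used in update_dvc_config above.
deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
assert deps_cmd == ["-d", "corpus/train.json", "-d", "corpus/dev.json"]
```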

@@ -339,6 +339,7 @@ def create_train_batches(nlp, corpus, cfg, randomization_index):
             yield epoch, batch
             if max_epochs >= 1 and epoch >= max_epochs:
                 break
         random.shuffle(train_examples)


 def create_evaluation_callback(nlp, optimizer, corpus, cfg):

@@ -350,13 +351,14 @@ def create_evaluation_callback(nlp, optimizer, corpus, cfg):
     )

     n_words = sum(len(ex.predicted) for ex in dev_examples)
+    batch_size = cfg.get("evaluation_batch_size", 128)
     start_time = timer()

     if optimizer.averages:
         with nlp.use_params(optimizer.averages):
-            scorer = nlp.evaluate(dev_examples, batch_size=32)
+            scorer = nlp.evaluate(dev_examples, batch_size=batch_size)
     else:
-        scorer = nlp.evaluate(dev_examples, batch_size=32)
+        scorer = nlp.evaluate(dev_examples, batch_size=batch_size)
     end_time = timer()
     wps = n_words / (end_time - start_time)
     scores = scorer.scores

@@ -479,7 +481,7 @@ def train_while_improving(
         if patience and (step - best_step) >= patience:
             break
         # Stop if we've exhausted our max steps (if specified)
-        if max_steps and (step * accumulate_gradient) >= max_steps:
+        if max_steps and step >= max_steps:
             break

@@ -45,18 +45,22 @@ class Corpus:

     def make_examples(self, nlp, reference_docs, max_length=0):
         for reference in reference_docs:
-            if len(reference) >= max_length >= 1:
-                if reference.is_sentenced:
-                    for ref_sent in reference.sents:
-                        yield Example(
-                            nlp.make_doc(ref_sent.text),
-                            ref_sent.as_doc()
-                        )
-            else:
+            if len(reference) == 0:
+                continue
+            elif max_length == 0 or len(reference) < max_length:
                 yield Example(
                     nlp.make_doc(reference.text),
                     reference
                 )
+            elif reference.is_sentenced:
+                for ref_sent in reference.sents:
+                    if len(ref_sent) == 0:
+                        continue
+                    elif max_length == 0 or len(ref_sent) < max_length:
+                        yield Example(
+                            nlp.make_doc(ref_sent.text),
+                            ref_sent.as_doc()
+                        )

     def make_examples_gold_preproc(self, nlp, reference_docs):
         for reference in reference_docs:

@@ -65,7 +69,7 @@ class Corpus:
             else:
                 ref_sents = [reference]
             for ref_sent in ref_sents:
-                yield Example(
+                eg = Example(
                     Doc(
                         nlp.vocab,
                         words=[w.text for w in ref_sent],

@@ -73,6 +77,8 @@ class Corpus:
                     ),
                     ref_sent
                 )
+                if len(eg.x):
+                    yield eg

     def read_docbin(self, vocab, locs):
         """ Yield training examples as example dicts """

@@ -110,6 +110,7 @@ def init(model, X=None, Y=None):

     ops = model.ops
     W = normal_init(ops, W.shape, mean=float(ops.xp.sqrt(1.0 / nF * nI)))
+    pad = normal_init(ops, pad.shape, mean=1.0)
     model.set_param("W", W)
     model.set_param("b", b)
     model.set_param("pad", pad)

@@ -339,6 +339,7 @@ cdef class precompute_hiddens:
     cdef readonly int nF, nO, nP
     cdef bint _is_synchronized
     cdef public object ops
+    cdef public object numpy_ops
     cdef np.ndarray _features
     cdef np.ndarray _cached
     cdef np.ndarray bias

@@ -368,6 +369,7 @@ cdef class precompute_hiddens:
         self.nP = 1
         self.nO = cached.shape[2]
         self.ops = lower_model.ops
+        self.numpy_ops = NumpyOps()
         assert activation in (None, "relu", "maxout")
         self.activation = activation
         self._is_synchronized = False

@@ -446,44 +448,32 @@ cdef class precompute_hiddens:
         return state_vector, backward

     def _nonlinearity(self, state_vector):
-        if isinstance(state_vector, numpy.ndarray):
-            ops = NumpyOps()
-        else:
-            ops = CupyOps()
         if self.activation == "maxout":
-            state_vector, mask = ops.maxout(state_vector)
+            return self._maxout_nonlinearity(state_vector)
         else:
-            state_vector = state_vector.reshape(state_vector.shape[:-1])
-            if self.activation == "relu":
-                mask = state_vector >= 0.
-                state_vector *= mask
-            else:
-                mask = None
-        # We're outputting to CPU, but we need this variable on GPU for the
-        # backward pass.
-        mask = self.ops.asarray(mask)
-
-        def backprop_nonlinearity(d_best):
-            if isinstance(d_best, numpy.ndarray):
-                ops = NumpyOps()
-            else:
-                ops = CupyOps()
-            if mask is not None:
-                mask_ = ops.asarray(mask)
-            # This will usually be on GPU
-            d_best = ops.asarray(d_best)
-            # Fix nans (which can occur from unseen classes.)
-            try:
-                d_best[ops.xp.isnan(d_best)] = 0.
-            except:
-                print(ops.xp.isnan(d_best))
-                raise
-            if self.activation == "maxout":
-                mask_ = ops.asarray(mask)
-                return ops.backprop_maxout(d_best, mask_, self.nP)
-            elif self.activation == "relu":
-                mask_ = ops.asarray(mask)
-                d_best *= mask_
-                d_best = d_best.reshape((d_best.shape + (1,)))
-                return d_best
-            else:
-                return d_best
-
-        return state_vector, backprop_nonlinearity
+            return self._relu_nonlinearity(state_vector)
+
+    def _maxout_nonlinearity(self, state_vector):
+        state_vector, mask = self.numpy_ops.maxout(state_vector)
+        # We're outputting to CPU, but we need this variable on GPU for the
+        # backward pass.
+        mask = self.ops.asarray(mask)
+
+        def backprop_maxout(d_best):
+            return self.ops.backprop_maxout(d_best, mask, self.nP)
+
+        return state_vector, backprop_maxout
+
+    def _relu_nonlinearity(self, state_vector):
+        state_vector = state_vector.reshape((state_vector.shape[0], -1))
+        mask = state_vector >= 0.
+        state_vector *= mask
+        # We're outputting to CPU, but we need this variable on GPU for the
+        # backward pass.
+        mask = self.ops.asarray(mask)
+
+        def backprop_relu(d_best):
+            d_best *= mask
+            return d_best.reshape((d_best.shape + (1,)))
+
+        return state_vector, backprop_relu

@@ -742,21 +742,14 @@ cdef class ArcEager(TransitionSystem):
         if n_gold < 1:
             raise ValueError

-    def get_oracle_sequence(self, Example example):
-        cdef StateClass state
-        cdef ArcEagerGold gold
-        states, golds, n_steps = self.init_gold_batch([example])
-        if not golds:
-            return []
+    def get_oracle_sequence_from_state(self, StateClass state, ArcEagerGold gold, _debug=None):
         cdef int i
         cdef Pool mem = Pool()
         # n_moves should not be zero at this point, but make sure to avoid zero-length mem alloc
         assert self.n_moves > 0
         costs = <float*>mem.alloc(self.n_moves, sizeof(float))
         is_valid = <int*>mem.alloc(self.n_moves, sizeof(int))

-        state = states[0]
-        gold = golds[0]
         history = []
         debug_log = []
         failed = False

@@ -772,6 +765,8 @@ cdef class ArcEager(TransitionSystem):
                     history.append(i)
+                    s0 = state.S(0)
+                    b0 = state.B(0)
                     if _debug:
+                        example = _debug
                         debug_log.append(" ".join((
                             self.get_class_name(i),
                             "S0=", (example.x[s0].text if s0 >= 0 else "__"),

@@ -784,6 +779,7 @@ cdef class ArcEager(TransitionSystem):
                     failed = False
                     break
             if failed:
+                example = _debug
                 print("Actions")
                 for i in range(self.n_moves):
                     print(self.get_class_name(i))

@@ -63,7 +63,9 @@ cdef class Parser:
         self.model = model
         if self.moves.n_moves != 0:
             self.set_output(self.moves.n_moves)
-        self.cfg = cfg
+        self.cfg = dict(cfg)
+        self.cfg.setdefault("update_with_oracle_cut_size", 100)
+        self.cfg.setdefault("normalize_gradients_with_batch_size", True)
         self._multitasks = []
         for multitask in cfg.get("multitasks", []):
             self.add_multitask_objective(multitask)

@@ -263,22 +265,32 @@ cdef class Parser:
         free(is_valid)

     def update(self, examples, drop=0., set_annotations=False, sgd=None, losses=None):
         cdef StateClass state
         if losses is None:
             losses = {}
         losses.setdefault(self.name, 0.)
         for multitask in self._multitasks:
             multitask.update(examples, drop=drop, sgd=sgd)
+        n_examples = len([eg for eg in examples if self.moves.has_gold(eg)])
+        if n_examples == 0:
+            return losses
         set_dropout_rate(self.model, drop)
         # Prepare the stepwise model, and get the callback for finishing the batch
         model, backprop_tok2vec = self.model.begin_update(
             [eg.predicted for eg in examples])
+        if self.cfg["update_with_oracle_cut_size"] >= 1:
             # Chop sequences into lengths of this many transitions, to make the
             # batch uniform length. We randomize this to overfit less.
-            cut_gold = numpy.random.choice(range(20, 100))
+            cut_size = self.cfg["update_with_oracle_cut_size"]
             states, golds, max_steps = self._init_gold_batch(
                 examples,
-                max_length=cut_gold
+                max_length=numpy.random.choice(range(5, cut_size))
             )
+        else:
+            states, golds, _ = self.moves.init_gold_batch(examples)
+            max_steps = max([len(eg.x) for eg in examples])
         if not states:
             return losses
         all_states = list(states)
         states_golds = zip(states, golds)
         for _ in range(max_steps):

@@ -287,6 +299,17 @@ cdef class Parser:
             states, golds = zip(*states_golds)
             scores, backprop = model.begin_update(states)
             d_scores = self.get_batch_loss(states, golds, scores, losses)
+            if self.cfg["normalize_gradients_with_batch_size"]:
+                # We have to be very careful how we do this, because of the way we
+                # cut up the batch. We subdivide long sequences. If we normalize
+                # naively, we end up normalizing by sequence length, which
+                # is bad: that would mean that states in long sequences
+                # consistently get smaller gradients. Imagine if we have two
+                # sequences, one length 1000, one length 20. If we cut up
+                # the 1k sequence so that we have a "batch" of 50 subsequences,
+                # we don't want the gradients to get 50 times smaller!
+                d_scores /= n_examples
+
             backprop(d_scores)
             # Follow the predicted action
             self.transition_states(states, scores)

@@ -384,8 +407,6 @@ cdef class Parser:
         cpu_log_loss(c_d_scores,
             costs, is_valid, &scores[i, 0], d_scores.shape[1])
         c_d_scores += d_scores.shape[1]
-        if len(states):
-            d_scores /= len(states)
         if losses is not None:
             losses.setdefault(self.name, 0.)
             losses[self.name] += (d_scores**2).sum()
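
The long comment in the first hunk is the heart of this change: gradients are now normalized by the number of gold examples (`n_examples`) rather than by the number of possibly subdivided states, and the old `d_scores /= len(states)` in `get_batch_loss` is dropped. A toy sketch of why that matters, using made-up numbers rather than the parser's real tensors:

```python
import numpy as np

# Pretend two docs produced 51 states after cutting: the long doc was
# subdivided into 50 subsequences and the short doc contributed one state.
n_examples = 2
n_states = 51
d_scores = np.ones((n_states, 4))  # dummy per-state gradients

# Old behaviour: dividing by len(states) shrinks every gradient 51x, so
# cutting a long doc into more pieces silently scales learning down.
old = d_scores / n_states

# New behaviour: divide by the number of examples, so the gradient scale
# is independent of how the sequences were chopped up.
new = d_scores / n_examples
assert new.mean() == 0.5 and old.mean() < 0.02
```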

@@ -428,7 +449,7 @@ cdef class Parser:
             if component is self:
                 break
             if hasattr(component, "pipe"):
-                doc_sample = list(component.pipe(doc_sample))
+                doc_sample = list(component.pipe(doc_sample, batch_size=8))
             else:
                 doc_sample = [component(doc) for doc in doc_sample]
             if doc_sample:

@@ -498,40 +519,49 @@ cdef class Parser:
         return self

     def _init_gold_batch(self, examples, min_length=5, max_length=500):
-        """Make a square batch, of length equal to the shortest doc. A long
+        """Make a square batch, of length equal to the shortest transition
+        sequence or a cap. A long
         doc will get multiple states. Let's say we have a doc of length 2*N,
         where N is the shortest doc. We'll make two states, one representing
         long_doc[:N], and another representing long_doc[N:]."""
         cdef:
             StateClass start_state
             StateClass state
             Transition action
         all_states = self.moves.init_batch([eg.predicted for eg in examples])
         kept = []
+        max_length_seen = 0
         for state, eg in zip(all_states, examples):
             if self.moves.has_gold(eg) and not state.is_final():
                 gold = self.moves.init_gold(state, eg)
-                kept.append((eg, state, gold))
-        max_length = max(min_length, min(max_length, min([len(eg.x) for eg in examples])))
+                oracle_actions = self.moves.get_oracle_sequence_from_state(
+                    state.copy(), gold)
+                kept.append((eg, state, gold, oracle_actions))
+                min_length = min(min_length, len(oracle_actions))
+                max_length_seen = max(max_length, len(oracle_actions))
+        if not kept:
+            return [], [], 0
+        max_length = max(min_length, min(max_length, max_length_seen))
         states = []
         golds = []
+        cdef int clas
         max_moves = 0
-        for eg, state, gold in kept:
-            oracle_actions = self.moves.get_oracle_sequence(eg)
-            start = 0
-            while start < len(eg.predicted):
-                state = state.copy()
+        for eg, state, gold, oracle_actions in kept:
+            for i in range(0, len(oracle_actions), max_length):
+                start_state = state.copy()
                 n_moves = 0
-                while state.B(0) < start and not state.is_final():
-                    action = self.moves.c[oracle_actions.pop(0)]
+                for clas in oracle_actions[i:i+max_length]:
+                    action = self.moves.c[clas]
                     action.do(state.c, action.label)
                     state.c.push_hist(action.clas)
                     n_moves += 1
-                has_gold = self.moves.has_gold(eg, start=start,
-                                               end=start+max_length)
-                if not state.is_final() and has_gold:
-                    states.append(state)
+                    if state.is_final():
+                        break
+                if self.moves.has_gold(eg, start_state.B(0), state.B(0)):
+                    states.append(start_state)
                     golds.append(gold)
                     max_moves = max(max_moves, n_moves)
-                start += min(max_length, len(eg.x)-start)
-            max_moves = max(max_moves, len(oracle_actions))
+                if state.is_final():
+                    break
         return states, golds, max_moves
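
The rewritten `_init_gold_batch` chunks each gold-standard action sequence into windows of at most `max_length` transitions, so every state in the batch advances through a similar number of moves. A rough illustration of just the windowing step, with a toy action list instead of the parser's real transition codes:

```python
oracle_actions = list(range(11))  # pretend these are 11 gold transitions
max_length = 4

# Mirrors: for i in range(0, len(oracle_actions), max_length)
chunks = [oracle_actions[i:i + max_length]
          for i in range(0, len(oracle_actions), max_length)]
assert chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
```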

@@ -62,18 +62,23 @@ cdef class TransitionSystem:
         return states

     def get_oracle_sequence(self, Example example, _debug=False):
+        states, golds, _ = self.init_gold_batch([example])
+        if not states:
+            return []
+        state = states[0]
+        gold = golds[0]
+        if _debug:
+            return self.get_oracle_sequence_from_state(state, gold, _debug=example)
+        else:
+            return self.get_oracle_sequence_from_state(state, gold)
+
+    def get_oracle_sequence_from_state(self, StateClass state, gold, _debug=None):
         cdef Pool mem = Pool()
         # n_moves should not be zero at this point, but make sure to avoid zero-length mem alloc
         assert self.n_moves > 0
         costs = <float*>mem.alloc(self.n_moves, sizeof(float))
         is_valid = <int*>mem.alloc(self.n_moves, sizeof(int))

-        cdef StateClass state
-        states, golds, n_steps = self.init_gold_batch([example])
-        if not states:
-            return []
-        state = states[0]
-        gold = golds[0]
         history = []
         debug_log = []
         while not state.is_final():

@@ -82,9 +87,10 @@ cdef class TransitionSystem:
             if is_valid[i] and costs[i] <= 0:
                 action = self.c[i]
                 history.append(i)
-                if _debug:
                 s0 = state.S(0)
                 b0 = state.B(0)
+                if _debug:
+                    example = _debug
                     debug_log.append(" ".join((
                         self.get_class_name(i),
                         "S0=", (example.x[s0].text if s0 >= 0 else "__"),

@@ -95,6 +101,7 @@ cdef class TransitionSystem:
                 break
         else:
             if _debug:
+                example = _debug
                 print("Actions")
                 for i in range(self.n_moves):
                     print(self.get_class_name(i))

@@ -91,7 +91,7 @@ Match a stream of documents, yielding them in turn.

> ```python
> from spacy.matcher import PhraseMatcher
> matcher = PhraseMatcher(nlp.vocab)
-> for doc in matcher.pipe(texts, batch_size=50):
+> for doc in matcher.pipe(docs, batch_size=50):
>     pass
> ```

@@ -47,7 +47,7 @@ Update the evaluation scores from a single [`Doc`](/api/doc) /

## Properties

| Name | Type | Description |
| ---- | ---- | ----------- |
| `token_acc` | float | Tokenization accuracy. |
| `tags_acc` | float | Part-of-speech tag accuracy (fine-grained tags, i.e. `Token.tag`). |
| `uas` | float | Unlabelled dependency score. |

@@ -57,7 +57,7 @@ Update the evaluation scores from a single [`Doc`](/api/doc) /

| `ents_f` | float | Named entity accuracy (F-score). |
| `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
| `textcat_f` <Tag variant="new">3.0</Tag> | float | F-score on positive label for binary classification, macro-averaged F-score otherwise. |
| `textcat_auc` <Tag variant="new">3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
| `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
| `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
| `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |

@@ -122,7 +122,7 @@ for match_id, start, end in matches:
```

The matcher returns a list of `(match_id, start, end)` tuples – in this case,
-`[('15578876784678163569', 0, 2)]`, which maps to the span `doc[0:2]` of our
+`[('15578876784678163569', 0, 3)]`, which maps to the span `doc[0:3]` of our
original document. The `match_id` is the [hash value](/usage/spacy-101#vocab) of
the string ID "HelloWorld". To get the string value, you can look up the ID in
the [`StringStore`](/api/stringstore).
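
That lookup amounts to one line. A short sketch, assuming the `nlp`, `doc` and `matches` objects from the snippet above:

```python
for match_id, start, end in matches:
    # Look up the hash in the StringStore to recover "HelloWorld"
    string_id = nlp.vocab.strings[match_id]
    print(string_id, doc[start:end].text)
```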

@@ -161,10 +161,18 @@ debugging your tokenizer configuration.

spaCy's custom warnings have been replaced with native Python
[`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
setting `SPACY_WARNING_IGNORE`, use the
[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
to manage warnings.

```diff
import spacy
+ import warnings

- spacy.errors.SPACY_WARNING_IGNORE.append('W007')
+ warnings.filterwarnings("ignore", message=r"\\[W007\\]", category=UserWarning)
```

#### Normalization tables

The normalization tables have moved from the language data in

@@ -174,6 +182,65 @@ If you're adding data for a new language, the normalization table should be
added to `spacy-lookups-data`. See
[adding norm exceptions](/usage/adding-languages#norm-exceptions).

#### No preloaded vocab for models with vectors

To reduce the initial loading time, the lexemes in `nlp.vocab` are no longer
loaded on initialization for models with vectors. As you process texts, the
lexemes will be added to the vocab automatically, just as in small models
without vectors.

To see the number of unique vectors and number of words with vectors, see
`nlp.meta['vectors']`. For example, for `en_core_web_md` there are `20000`
unique vectors and `684830` words with vectors:

```python
{
    'width': 300,
    'vectors': 20000,
    'keys': 684830,
    'name': 'en_core_web_md.vectors'
}
```

If required, for instance if you are working directly with word vectors rather
than processing texts, you can load all lexemes for words with vectors at once:

```python
for orth in nlp.vocab.vectors:
    _ = nlp.vocab[orth]
```

If your workflow previously iterated over `nlp.vocab`, a similar alternative
is to iterate over words with vectors instead:

```diff
- lexemes = [w for w in nlp.vocab]
+ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
```

Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
provided lexemes but only 685K words with vectors. The vectors have been
updated for most languages in v2.2, but the English models contain the same
vectors for both v2.2 and v2.3.

#### Lexeme.is_oov and Token.is_oov

<Infobox title="Important note" variant="warning">

Due to a bug, the values for `is_oov` are reversed in v2.3.0, but this will be
fixed in the next patch release v2.3.1.

</Infobox>

In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
have a word vector. This is equivalent to `token.orth not in
nlp.vocab.vectors`.
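
As a quick check of that equivalence (a sketch that assumes a model with vectors such as `en_core_web_md`, and the corrected behavior from v2.3.1 onwards):

```python
import spacy

nlp = spacy.load("en_core_web_md")
token = nlp("avocado")[0]
# is_oov mirrors membership in the vectors table
assert token.is_oov == (token.orth not in nlp.vocab.vectors)
```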

Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
probability and cluster features. The probability and cluster features are no
longer included in the provided medium and large models (see the next section).

#### Probability and cluster features

> #### Load and save extra prob lookups table

@@ -201,6 +268,28 @@ model vocab, which will take a few seconds on initial loading. When you save
this model after loading the `prob` table, the full `prob` table will be saved
as part of the model vocab.

To load the probability table into a provided model, first make sure you have
`spacy-lookups-data` installed. To load the table, remove the empty provided
`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
table from `spacy-lookups-data`:

```diff
+ # prerequisite: pip install spacy-lookups-data
import spacy

nlp = spacy.load("en_core_web_md")

# remove the empty placeholder prob table
+ if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
+     nlp.vocab.lookups_extra.remove_table("lexeme_prob")

# access any `.prob` to load the full table into the model
assert nlp.vocab["a"].prob == -3.9297883511

# if desired, save this model with the probability table included
nlp.to_disk("/path/to/model")
```

If you'd like to include custom `cluster`, `prob`, or `sentiment` tables as part
of a new model, add the data to
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) under

@@ -218,3 +307,39 @@ When you initialize a new model with [`spacy init-model`](/api/cli#init-model),
the `prob` table from `spacy-lookups-data` may be loaded as part of the
initialization. If you'd like to omit this extra data as in spaCy's provided
v2.3 models, use the new flag `--omit-extra-lookups`.

#### Tag maps in provided models vs. blank models

The tag maps in the provided models may differ from the tag maps in the spaCy
library. You can access the tag map in a loaded model under
`nlp.vocab.morphology.tag_map`.

The tag map from `spacy.lang.lg.tag_map` is still used when a blank model is
initialized. If you want to provide an alternate tag map, update
`nlp.vocab.morphology.tag_map` after initializing the model, or if you're using
the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
provide the tag map as a JSON dict.

If you want to export a tag map from a provided model for use with the train
CLI, you can save it as a JSON dict. To only use string keys as required by
JSON and to make it easier to read and edit, any internal integer IDs need to
be converted back to strings:

```python
import spacy
import srsly

nlp = spacy.load("en_core_web_sm")
tag_map = {}

# convert any integer IDs to strings for JSON
for tag, morph in nlp.vocab.morphology.tag_map.items():
    tag_map[tag] = {}
    for feat, val in morph.items():
        feat = nlp.vocab.strings.as_string(feat)
        if not isinstance(val, bool):
            val = nlp.vocab.strings.as_string(val)
        tag_map[tag][feat] = val

srsly.write_json("tag_map.json", tag_map)
```

website/docs/usage/v3.md (new file, 17 lines)

@@ -0,0 +1,17 @@
---
title: What's New in v3.0
teaser: New features, backwards incompatibilities and migration guide
menu:
  - ['Summary', 'summary']
  - ['New Features', 'features']
  - ['Backwards Incompatibilities', 'incompat']
  - ['Migrating from v2.x', 'migrating']
---

## Summary {#summary}

## New Features {#features}

## Backwards Incompatibilities {#incompat}

## Migrating from v2.x {#migrating}

@@ -15,6 +15,11 @@ const universe = require('./meta/universe.json')

 const DEFAULT_TEMPLATE = path.resolve('./src/templates/index.js')

+const isNightly = !!+process.env.SPACY_NIGHTLY || site.nightlyBranches.includes(process.env.BRANCH)
+const favicon = isNightly ? `src/images/icon_nightly.png` : `src/images/icon.png`
+const binderBranch = isNightly ? 'nightly' : site.binderBranch
+const siteUrl = isNightly ? site.siteUrlNightly : site.siteUrl
+
 module.exports = {
     siteMetadata: {
         ...site,

@@ -22,6 +27,9 @@ module.exports = {
         sidebars,
         ...models,
         universe,
+        nightly: isNightly,
+        binderBranch,
+        siteUrl,
     },

     plugins: [

@@ -128,7 +136,7 @@ module.exports = {
                 background_color: site.theme,
                 theme_color: site.theme,
                 display: `minimal-ui`,
-                icon: `src/images/icon.png`,
+                icon: favicon,
             },
         },
         {

@@ -140,6 +148,23 @@ module.exports = {
                 respectDNT: true,
             },
         },
+        {
+            resolve: 'gatsby-plugin-robots-txt',
+            options: {
+                host: siteUrl,
+                sitemap: `${siteUrl}/sitemap.xml`,
+                // If we're in a special state (nightly, legacy) prevent indexing
+                resolveEnv: () => (isNightly ? 'development' : 'production'),
+                env: {
+                    production: {
+                        policy: [{ userAgent: '*', allow: '/' }],
+                    },
+                    development: {
+                        policy: [{ userAgent: '*', disallow: ['/'] }],
+                    },
+                },
+            },
+        },
         `gatsby-plugin-offline`,
     ],
 }

@@ -78,11 +78,14 @@
     "name": "Japanese",
     "models": ["ja_core_news_sm", "ja_core_news_md", "ja_core_news_lg"],
     "dependencies": [
         { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
         { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
+        {
+            "name": "SudachiPy",
+            "url": "https://github.com/WorksApplications/SudachiPy"
+        }
     ],
     "example": "これは文章です。",
     "has_examples": true
 },
 {

@@ -191,17 +194,6 @@
     "example": "นี่คือประโยค",
     "has_examples": true
 },
-{
-    "code": "ja",
-    "name": "Japanese",
-    "dependencies": [
-        { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
-        { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
-        { "name": "fugashi", "url": "https://github.com/polm/fugashi" }
-    ],
-    "example": "これは文章です。",
-    "has_examples": true
-},
 {
     "code": "ko",
     "name": "Korean",

@@ -8,11 +8,7 @@
     { "text": "Installation", "url": "/usage" },
     { "text": "Models & Languages", "url": "/usage/models" },
     { "text": "Facts & Figures", "url": "/usage/facts-figures" },
     { "text": "spaCy 101", "url": "/usage/spacy-101" },
-    { "text": "New in v2.3", "url": "/usage/v2-3" },
-    { "text": "New in v2.2", "url": "/usage/v2-2" },
-    { "text": "New in v2.1", "url": "/usage/v2-1" },
-    { "text": "New in v2.0", "url": "/usage/v2" }
+    { "text": "New in v3.0", "url": "/usage/v3" }
 ]
},
{

@@ -3,6 +3,8 @@
 "description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",
 "slogan": "Industrial-strength Natural Language Processing in Python",
 "siteUrl": "https://spacy.io",
+"siteUrlNightly": "https://nightly.spacy.io",
+"nightlyBranches": ["spacy.io-develop"],
 "email": "contact@explosion.ai",
 "company": "Explosion AI",
 "companyUrl": "https://explosion.ai",

website/package-lock.json (generated, 13524 changed lines). File diff suppressed because it is too large.

@@ -16,7 +16,7 @@
     "autoprefixer": "^9.4.7",
     "classnames": "^2.2.6",
     "codemirror": "^5.43.0",
-    "gatsby": "^2.1.18",
+    "gatsby": "^2.11.1",
     "gatsby-image": "^2.0.29",
     "gatsby-mdx": "^0.3.6",
     "gatsby-plugin-catch-links": "^2.0.11",

@@ -25,6 +25,7 @@
     "gatsby-plugin-offline": "^2.0.24",
     "gatsby-plugin-react-helmet": "^3.0.6",
     "gatsby-plugin-react-svg": "^2.0.0",
+    "gatsby-plugin-robots-txt": "^1.5.1",
     "gatsby-plugin-sass": "^2.0.10",
     "gatsby-plugin-sharp": "^2.0.20",
     "gatsby-plugin-sitemap": "^2.0.5",

@@ -52,6 +53,7 @@
 "scripts": {
     "build": "gatsby build",
     "dev": "gatsby develop",
+    "dev:nightly": "BRANCH=spacy.io-develop npm run dev",
     "lint": "eslint **",
     "clear": "rm -rf .cache",
     "test": "echo \"Write tests! -> https://gatsby.app/unit-testing\""

@@ -27,7 +27,7 @@ Button.defaultProps = {
 }

 Button.propTypes = {
-    to: PropTypes.string.isRequired,
+    to: PropTypes.string,
     variant: PropTypes.oneOf(['primary', 'secondary', 'tertiary']),
     large: PropTypes.bool,
     icon: PropTypes.string,

@@ -19,6 +19,7 @@ import { ReactComponent as NoIcon } from '../images/icons/no.svg'
 import { ReactComponent as NeutralIcon } from '../images/icons/neutral.svg'
 import { ReactComponent as OfflineIcon } from '../images/icons/offline.svg'
 import { ReactComponent as SearchIcon } from '../images/icons/search.svg'
+import { ReactComponent as MoonIcon } from '../images/icons/moon.svg'

 import classes from '../styles/icon.module.sass'

@@ -41,6 +42,7 @@ const icons = {
     neutral: NeutralIcon,
     offline: OfflineIcon,
     search: SearchIcon,
+    moon: MoonIcon,
 }

 const Icon = ({ name, width, height, inline, variant, className }) => {

@@ -2,7 +2,9 @@ import React, { Fragment } from 'react'
 import classNames from 'classnames'

 import pattern from '../images/pattern_blue.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import patternOverlay from '../images/pattern_landing.jpg'
+import patternOverlayNightly from '../images/pattern_landing_nightly.jpg'
 import logoSvgs from '../images/logos'

 import Grid from './grid'

@@ -14,9 +16,10 @@ import Link from './link'
 import { chunkArray } from './util'
 import classes from '../styles/landing.module.sass'

-export const LandingHeader = ({ style = {}, children }) => {
-    const wrapperStyle = { backgroundImage: `url(${pattern})` }
-    const contentStyle = { backgroundImage: `url(${patternOverlay})`, ...style }
+export const LandingHeader = ({ nightly, style = {}, children }) => {
+    const overlay = nightly ? patternOverlayNightly : patternOverlay
+    const wrapperStyle = { backgroundImage: `url(${nightly ? patternNightly : pattern})` }
+    const contentStyle = { backgroundImage: `url(${overlay})`, ...style }
     return (
         <header className={classes.header}>
             <div className={classes.headerWrapper} style={wrapperStyle}>

@@ -5,15 +5,22 @@ import classNames from 'classnames'
 import patternBlue from '../images/pattern_blue.jpg'
 import patternGreen from '../images/pattern_green.jpg'
 import patternPurple from '../images/pattern_purple.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import classes from '../styles/main.module.sass'

-const patterns = { blue: patternBlue, green: patternGreen, purple: patternPurple }
+const patterns = {
+    blue: patternBlue,
+    green: patternGreen,
+    purple: patternPurple,
+    nightly: patternNightly,
+}

 export const Content = ({ Component = 'div', className, children }) => (
     <Component className={classNames(classes.content, className)}>{children}</Component>
 )

 const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
+    const pattern = patterns[theme]
     const mainClassNames = classNames(classes.root, {
         [classes.withSidebar]: sidebar,
         [classes.withAsides]: asides,

@@ -23,10 +30,7 @@ const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
         <main className={mainClassNames}>
             {wrapContent ? <Content Component="article">{children}</Content> : children}
             {asides && (
-                <div
-                    className={classes.asides}
-                    style={{ backgroundImage: `url(${patterns[theme]}` }}
-                />
+                <div className={classes.asides} style={{ backgroundImage: `url(${pattern}` }} />
             )}
             {footer}
         </main>

@@ -6,6 +6,7 @@ import { StaticQuery, graphql } from 'gatsby'
 import socialImageDefault from '../images/social_default.jpg'
 import socialImageApi from '../images/social_api.jpg'
 import socialImageUniverse from '../images/social_universe.jpg'
+import socialImageNightly from '../images/social_nightly.jpg'

 function getPageTitle(title, sitename, slogan, sectionTitle) {
     if (sectionTitle && title) {

@@ -17,13 +18,14 @@ function getPageTitle(title, sitename, slogan, sectionTitle) {
     return `${sitename} · ${slogan}`
 }

-function getImage(section) {
+function getImage(section, nightly) {
+    if (nightly) return socialImageNightly
     if (section === 'api') return socialImageApi
     if (section === 'universe') return socialImageUniverse
     return socialImageDefault
 }

-const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) => (
+const SEO = ({ description, lang, title, section, sectionTitle, bodyClass, nightly }) => (
     <StaticQuery
         query={query}
         render={data => {

@@ -35,7 +37,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
             siteMetadata.slogan,
             sectionTitle
         )
-        const socialImage = siteMetadata.siteUrl + getImage(section)
+        const socialImage = siteMetadata.siteUrl + getImage(section, nightly)
         const meta = [
             {
                 name: 'description',

@@ -11,6 +11,9 @@ const Tag = ({ spaced, variant, tooltip, children }) => {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
+    // TODO: we probably want to handle this more elegantly, but the idea is
+    // that we can hide tags referring to old versions
+    // const hideTag = version.startsWith('2')
     return (
         <TagTemplate spaced={spaced} tooltip={tooltipText}>
             v{version}

BIN  website/src/images/icon_nightly.png (new file, 18 KiB; binary file not shown)

website/src/images/icons/moon.svg (new file, 3 lines)

@@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" viewBox="0 0 32 32">
    <path d="M10.895 7.574c0 7.55 5.179 13.67 11.567 13.67 1.588 0 3.101-0.38 4.479-1.063-1.695 4.46-5.996 7.636-11.051 7.636-6.533 0-11.83-5.297-11.83-11.83 0-4.82 2.888-8.959 7.023-10.803-0.116 0.778-0.188 1.573-0.188 2.39z"></path>
</svg>

BIN  website/src/images/pattern_landing_nightly.jpg (new file, 126 KiB; binary file not shown)
BIN  website/src/images/pattern_nightly.jpg (new file, 157 KiB; binary file not shown)
BIN  website/src/images/social_nightly.jpg (new file, 354 KiB; binary file not shown)

website/src/pages/404.js (new file, 47 lines)

@@ -0,0 +1,47 @@
import React from 'react'
import { window } from 'browser-monads'
import { graphql } from 'gatsby'

import Template from '../templates/index'
import { LandingHeader, LandingTitle } from '../components/landing'
import Button from '../components/button'

export default ({ data, location }) => {
    const { nightly } = data.site.siteMetadata
    const pageContext = { title: '404 Error', searchExclude: true, isIndex: false }
    return (
        <Template data={data} pageContext={pageContext} location={location}>
            <LandingHeader style={{ minHeight: 400 }} nightly={nightly}>
                <LandingTitle>
                    Ooops, this page
                    <br />
                    does not exist!
                </LandingTitle>
                <br />
                <Button onClick={() => window.history.go(-1)} variant="tertiary">
                    Click here to go back
                </Button>
            </LandingHeader>
        </Template>
    )
}

export const pageQuery = graphql`
    query {
        site {
            siteMetadata {
                nightly
                title
                description
                navigation {
                    text
                    url
                }
                docSearch {
                    apiKey
                    indexName
                }
            }
        }
    }
`

@@ -1,7 +0,0 @@ (deleted file)
----
-title: 404 Error
----
-
-import Error from 'widgets/404.js'
-
-<Error />

@@ -3,11 +3,14 @@
     bottom: 0
     left: 0
     width: 100%
-    background: var(--color-subtle-light)
+    background: var(--color-back)
     z-index: 100
     font: var(--font-size-sm)/var(--line-height-md) var(--font-primary)
     text-align: center
     padding: 1rem
+    box-shadow: var(--box-shadow)
+    border-top: 2px solid
+    color: var(--color-theme)

 .warning
     --alert-bg: var(--color-yellow-light)

@@ -47,6 +47,11 @@
     --color-theme-purple-light: hsla(255, 61%, 54%, 0.06)
     --color-theme-purple-opaque: hsla(255, 61%, 54%, 0.11)

+    --color-theme-nightly: hsl(257, 99%, 67%)
+    --color-theme-nightly-dark: hsl(257, 99%, 57%)
+    --color-theme-nightly-light: hsla(257, 99%, 67%, 0.06)
+    --color-theme-nightly-opaque: hsla(257, 99%, 67%, 0.11)
+
     // Regular colors
     --color-back: hsl(0, 0%, 100%)
     --color-front: hsl(213, 15%, 12%)

@@ -106,6 +111,12 @@
     --color-theme-light: var(--color-theme-purple-light)
     --color-theme-opaque: var(--color-theme-purple-opaque)

+.theme-nightly
+    --color-theme: var(--color-theme-nightly)
+    --color-theme-dark: var(--color-theme-nightly-dark)
+    --color-theme-light: var(--color-theme-nightly-light)
+    --color-theme-opaque: var(--color-theme-nightly-opaque)
+

 /* Fonts */

@@ -22,6 +22,9 @@ $crumb-bar: 2px
     & > *
         padding: 0 2rem 0.35rem

+    &:last-child
+        margin-bottom: 5rem
+
     .label
         color: var(--color-dark)
         font: bold var(--font-size-lg)/var(--line-height-md) var(--font-secondary)

@@ -31,7 +31,7 @@ const Docs = ({ pageContext, children }) => (
         theme,
         version,
     } = pageContext
-    const { sidebars = [], modelsRepo, languages } = site.siteMetadata
+    const { sidebars = [], modelsRepo, languages, nightly } = site.siteMetadata
     const isModels = section === 'models'
     const sidebar = pageContext.sidebar
         ? { items: pageContext.sidebar }

@@ -83,7 +83,7 @@ const Docs = ({ pageContext, children }) => (
     {sidebar && <Sidebar items={sidebar.items} pageMenu={pageMenu} slug={slug} />}
     <Main
         section={section}
-        theme={theme}
+        theme={nightly ? 'nightly' : theme}
         sidebar
         asides
         wrapContent

@@ -146,6 +146,7 @@ const query = graphql`
         models
         starters
     }
+    nightly
     sidebars {
         section
         items {

@@ -75,10 +75,23 @@ const scopeComponents = {
     InlineCode,
 }

-const AlertSpace = () => {
+const AlertSpace = ({ nightly }) => {
     const isOnline = useOnlineStatus()
     return (
         <>
+            {nightly && (
+                <Alert
+                    title="You're viewing the pre-release docs."
+                    icon="moon"
+                    closeOnClick={false}
+                >
+                    The page reflects{' '}
+                    <Link to="https://pypi.org/project/spacy-nightly/">
+                        <InlineCode>spacy-nightly</InlineCode>
+                    </Link>
+                    , not the latest <Link to="https://spacy.io">stable version</Link>.
+                </Alert>
+            )}
             {!isOnline && (
                 <Alert title="Looks like you're offline." icon="offline" variant="warning">
                     But don't worry, your visited pages should be saved for you.

@@ -130,9 +143,10 @@ class Layout extends React.Component {
         const { data, pageContext, location, children } = this.props
         const { file, site = {} } = data || {}
         const mdx = file ? file.childMdx : null
-        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
-        const bodyClass = classNames(`theme-${theme}`, { 'search-exclude': !!searchExclude })
         const meta = site.siteMetadata || {}
+        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
+        const uiTheme = meta.nightly ? 'nightly' : theme
+        const bodyClass = classNames(`theme-${uiTheme}`, { 'search-exclude': !!searchExclude })
         const isDocs = ['usage', 'models', 'api', 'styleguide'].includes(section)
         const content = !mdx ? null : (
             <MDXProvider components={mdxComponents}>

@@ -148,8 +162,9 @@ class Layout extends React.Component {
                     section={section}
                     sectionTitle={sectionTitle}
                     bodyClass={bodyClass}
+                    nightly={meta.nightly}
                 />
-                <AlertSpace />
+                <AlertSpace nightly={meta.nightly} />
                 <Navigation
                     title={meta.title}
                     items={meta.navigation}

@@ -167,11 +182,11 @@ class Layout extends React.Component {
                     mdxComponents={mdxComponents}
                 />
             ) : (
-                <>
+                <div>
                     {children}
                     {content}
                     <Footer wide />
-                </>
+                </div>
             )}
         </>
     )

@@ -184,6 +199,7 @@ export const pageQuery = graphql`
     query($slug: String!) {
         site {
             siteMetadata {
+                nightly
                 title
                 description
                 navigation {

@@ -30,8 +30,8 @@ function filterResources(resources, data) {
     return sorted.filter(res => (res.category || []).includes(data.id))
 }

-const UniverseContent = ({ content = [], categories, pageContext, location, mdxComponents }) => {
-    const { theme, data = {} } = pageContext
+const UniverseContent = ({ content = [], categories, theme, pageContext, mdxComponents }) => {
+    const { data = {} } = pageContext
     const filteredResources = filterResources(content, data)
     const activeData = data ? content.find(({ id }) => id === data.id) : null
     const markdownComponents = { ...mdxComponents, code: InlineCode }

@@ -302,15 +302,16 @@ const Universe = ({ pageContext, location, mdxComponents }) => (
     <StaticQuery
         query={query}
         render={data => {
-            const content = data.site.siteMetadata.universe.resources
-            const categories = data.site.siteMetadata.universe.categories
+            const { universe, nightly } = data.site.siteMetadata
+            const theme = nightly ? 'nightly' : pageContext.theme
             return (
                 <UniverseContent
-                    content={content}
-                    categories={categories}
+                    content={universe.resources}
+                    categories={universe.categories}
                     pageContext={pageContext}
                     location={location}
                     mdxComponents={mdxComponents}
+                    theme={theme}
                 />
             )
         }}

@@ -323,6 +324,7 @@ const query = graphql`
     query UniverseQuery {
         site {
             siteMetadata {
+                nightly
                 universe {
                     resources {
                         type

@@ -1,19 +0,0 @@ (deleted file)
-import React from 'react'
-import { window } from 'browser-monads'
-
-import { LandingHeader, LandingTitle } from '../components/landing'
-import Button from '../components/button'
-
-export default () => (
-    <LandingHeader style={{ minHeight: 400 }}>
-        <LandingTitle>
-            Ooops, this page
-            <br />
-            does not exist!
-        </LandingTitle>
-        <br />
-        <Button onClick={() => window.history.go(-1)} variant="tertiary">
-            Click here to go back
-        </Button>
-    </LandingHeader>
-)

@@ -68,7 +68,7 @@ const Landing = ({ data }) => {
     const counts = getCounts(data.languages)
     return (
         <>
-            <LandingHeader>
+            <LandingHeader nightly={data.nightly}>
                 <LandingTitle>
                     Industrial-Strength
                     <br />

@@ -268,6 +268,7 @@ const landingQuery = graphql`
     query LandingQuery {
         site {
             siteMetadata {
+                nightly
                 repo
                 languages {
                     models