mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 02:06:31 +03:00
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
commit
cb51bb637b
106
.github/contributors/hertelm.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Matthias Hertel |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | June 29, 2020 |
|
||||
| GitHub username | hertelm |
|
||||
| Website (optional) | |
|
|
@ -1,6 +1,8 @@
|
|||
redirects = [
|
||||
# Netlify
|
||||
{from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
|
||||
# Subdomain for branches
|
||||
{from = "https://nightly.spacy.io/*", to="https://spacy-io-develop.spacy.io/:splat", force = true, status = 200},
|
||||
# Old subdomains
|
||||
{from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
|
||||
{from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},
|
||||
|
|
|
@ -242,12 +242,16 @@ def project_clone(
|
|||
try:
|
||||
run_command(cmd)
|
||||
except SystemExit:
|
||||
err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'"
|
||||
err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'."
|
||||
msg.fail(err)
|
||||
with (tmp_dir / ".git" / "info" / "sparse-checkout").open("w") as f:
|
||||
f.write(name)
|
||||
run_command(["git", "-C", str(tmp_dir), "fetch"])
|
||||
run_command(["git", "-C", str(tmp_dir), "checkout"])
|
||||
try:
|
||||
run_command(["git", "-C", str(tmp_dir), "fetch"])
|
||||
run_command(["git", "-C", str(tmp_dir), "checkout"])
|
||||
except SystemExit:
|
||||
err = f"Could not clone '{name}' in the repo '{repo}'."
|
||||
msg.fail(err)
|
||||
shutil.move(str(tmp_dir / Path(name).name), str(project_dir))
|
||||
msg.good(f"Cloned project '{name}' from {repo} into {project_dir}")
|
||||
for sub_dir in DIRS:
|
||||
|
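The pattern in this hunk, wrapping the `git fetch` and `git checkout` calls so that a failing command produces a readable message, can be sketched in isolation. This is a minimal sketch and not spaCy's implementation: `run_command` and `try_commands` below are hypothetical stand-ins, assuming a runner that calls `sys.exit` when a command returns a non-zero status.

```python
import subprocess
import sys


def run_command(cmd):
    """Hypothetical stand-in: run a command and exit on a non-zero status."""
    status = subprocess.call(cmd)
    if status != 0:
        sys.exit(status)


def try_commands(commands, err):
    """Run commands in order; return the error message if one of them exits."""
    try:
        for cmd in commands:
            run_command(cmd)
    except SystemExit:
        # sys.exit raises SystemExit, so it can be caught like any exception
        return err
    return None


# Two throwaway commands: one succeeds, one exits with status 3
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
result_ok = try_commands([ok], "unused")
result_bad = try_commands([ok, bad], "Could not run the second command.")
```

Grouping both commands in one `try` block means a single message covers either failure, which matches how the hunk reports one error for the fetch/checkout pair.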
@ -525,9 +529,9 @@ def update_dvc_config(
|
|||
outputs_no_cache = command.get("outputs_no_cache", [])
|
||||
if not deps and not outputs and not outputs_no_cache:
|
||||
continue
|
||||
# Default to "." as the project path since dvc.yaml is auto-generated
|
||||
# Default to the working dir as the project path since dvc.yaml is auto-generated
|
||||
# and we don't want arbitrary paths in there
|
||||
project_cmd = ["python", "-m", NAME, "project", ".", "exec", name]
|
||||
project_cmd = ["python", "-m", NAME, "project", "exec", name]
|
||||
deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
|
||||
outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]
|
||||
outputs_nc_cmd = [c for cl in [["-O", p] for p in outputs_no_cache] for c in cl]
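The nested comprehensions in this hunk pair each path with a flag and then flatten the pairs into one argument list. A small sketch of what that produces, using made-up paths (the `deps` and `outputs` values here are assumptions for illustration only):

```python
deps = ["corpus/train.json", "corpus/dev.json"]  # hypothetical paths
outputs = ["training/model-best"]  # hypothetical path

# Build [flag, path] pairs, then flatten them into a single list
deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]

# An equivalent spelling without the intermediate list of pairs
deps_cmd_alt = [c for p in deps for c in ("-d", p)]
```

With these inputs, `deps_cmd` is `["-d", "corpus/train.json", "-d", "corpus/dev.json"]`, ready to be spliced into a longer command list.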
|
||||
|
|
|
@ -91,7 +91,7 @@ Match a stream of documents, yielding them in turn.
|
|||
> ```python
|
||||
> from spacy.matcher import PhraseMatcher
|
||||
> matcher = PhraseMatcher(nlp.vocab)
|
||||
> for doc in matcher.pipe(texts, batch_size=50):
|
||||
> for doc in matcher.pipe(docs, batch_size=50):
|
||||
> pass
|
||||
> ```
|
||||
|
||||
|
|
|
@ -46,19 +46,19 @@ Update the evaluation scores from a single [`Doc`](/api/doc) /
|
|||
|
||||
## Properties
|
||||
|
||||
| Name | Type | Description |
|
||||
| --------------------------------------------------- | ----- | ---------------------------------------------------------------------------------------------------------- |
|
||||
| `token_acc` | float | Tokenization accuracy. |
|
||||
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
|
||||
| `uas` | float | Unlabelled dependency score. |
|
||||
| `las` | float | Labelled dependency score. |
|
||||
| `ents_p` | float | Named entity accuracy (precision). |
|
||||
| `ents_r` | float | Named entity accuracy (recall). |
|
||||
| `ents_f` | float | Named entity accuracy (F-score). |
|
||||
| `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
|
||||
| Name | Type | Description |
|
||||
| --------------------------------------------------- | ----- | -------------------------------------------------------------------------------------- |
|
||||
| `token_acc` | float | Tokenization accuracy. |
|
||||
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
|
||||
| `uas` | float | Unlabelled dependency score. |
|
||||
| `las` | float | Labelled dependency score. |
|
||||
| `ents_p` | float | Named entity accuracy (precision). |
|
||||
| `ents_r` | float | Named entity accuracy (recall). |
|
||||
| `ents_f` | float | Named entity accuracy (F-score). |
|
||||
| `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
|
||||
| `textcat_f` <Tag variant="new">3.0</Tag> | float | F-score on positive label for binary classification, macro-averaged F-score otherwise. |
|
||||
| `textcat_auc` <Tag variant="new"3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
|
||||
| `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
|
||||
| `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
|
||||
| `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |
|
||||
| `scores` | dict | All scores, keyed by type. |
|
||||
| `textcat_auc` <Tag variant="new">3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
|
||||
| `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
|
||||
| `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
|
||||
| `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |
|
||||
| `scores` | dict | All scores, keyed by type. |
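For the `textcat_f` row, "macro-averaged" means the F-score is computed per label and the per-label scores are then averaged with equal weight. The sketch below only illustrates that definition. It is not spaCy's `Scorer` code, and the `f_score` and `macro_f` helpers and the counts are invented for the example.

```python
def f_score(tp, fp, fn):
    """F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f(counts_per_label):
    """Average the per-label F-scores, giving each label equal weight."""
    scores = [f_score(*counts) for counts in counts_per_label.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two labels
counts = {"POSITIVE": (8, 2, 2), "NEGATIVE": (1, 1, 3)}
per_label = {label: f_score(*c) for label, c in counts.items()}
macro = macro_f(counts)
```

Because each label counts equally, a rare label with a poor score pulls the macro average down as much as a frequent one would.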
|
||||
|
|
|
@ -122,7 +122,7 @@ for match_id, start, end in matches:
|
|||
```
|
||||
|
||||
The matcher returns a list of `(match_id, start, end)` tuples – in this case,
|
||||
`[('15578876784678163569', 0, 2)]`, which maps to the span `doc[0:2]` of our
|
||||
`[('15578876784678163569', 0, 3)]`, which maps to the span `doc[0:3]` of our
|
||||
original document. The `match_id` is the [hash value](/usage/spacy-101#vocab) of
|
||||
the string ID "HelloWorld". To get the string value, you can look up the ID in
|
||||
the [`StringStore`](/api/stringstore).
|
||||
|
|
|
@ -161,10 +161,18 @@ debugging your tokenizer configuration.
|
|||
|
||||
spaCy's custom warnings have been replaced with native Python
|
||||
[`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
|
||||
setting `SPACY_WARNING_IGNORE`, use the
|
||||
[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
|
||||
setting `SPACY_WARNING_IGNORE`, use the [`warnings`
|
||||
filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
|
||||
to manage warnings.
|
||||
|
||||
```diff
|
||||
import spacy
|
||||
+ import warnings
|
||||
|
||||
- spacy.errors.SPACY_WARNING_IGNORE.append('W007')
|
||||
+ warnings.filterwarnings("ignore", message=r"\[W007\]", category=UserWarning)
|
||||
```
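The `message` argument of `warnings.filterwarnings` is a regular expression matched against the start of the warning text, so the brackets around the code need escaping. A minimal sketch using only the standard library, assuming the warning text begins with its code in square brackets (the two messages below are invented stand-ins, not real spaCy warnings):

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    # Record everything by default so the effect of the filter is visible
    warnings.simplefilter("always")
    # Ignore only warnings whose message starts with the literal text "[W007]"
    warnings.filterwarnings("ignore", message=r"\[W007\]", category=UserWarning)
    warnings.warn("[W007] hypothetical message text", UserWarning)
    warnings.warn("[W008] another hypothetical message", UserWarning)

# Only the warning that did not match the filter was recorded
messages = [str(w.message) for w in caught]
```

In a raw string, one backslash per bracket is enough to match the literal `[` and `]`.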
|
||||
|
||||
#### Normalization tables
|
||||
|
||||
The normalization tables have moved from the language data in
|
||||
|
@ -174,6 +182,65 @@ If you're adding data for a new language, the normalization table should be
|
|||
added to `spacy-lookups-data`. See
|
||||
[adding norm exceptions](/usage/adding-languages#norm-exceptions).
|
||||
|
||||
#### No preloaded vocab for models with vectors
|
||||
|
||||
To reduce the initial loading time, the lexemes in `nlp.vocab` are no longer
|
||||
loaded on initialization for models with vectors. As you process texts, the
|
||||
lexemes will be added to the vocab automatically, just as in small models
|
||||
without vectors.
|
||||
|
||||
To see the number of unique vectors and number of words with vectors, see
|
||||
`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000`
|
||||
unique vectors and `684830` words with vectors:
|
||||
|
||||
```python
|
||||
{
|
||||
'width': 300,
|
||||
'vectors': 20000,
|
||||
'keys': 684830,
|
||||
'name': 'en_core_web_md.vectors'
|
||||
}
|
||||
```
|
||||
|
||||
If required, for instance if you are working directly with word vectors rather
|
||||
than processing texts, you can load all lexemes for words with vectors at once:
|
||||
|
||||
```python
|
||||
for orth in nlp.vocab.vectors:
|
||||
_ = nlp.vocab[orth]
|
||||
```
|
||||
|
||||
If your workflow previously iterated over `nlp.vocab`, a similar alternative
|
||||
is to iterate over words with vectors instead:
|
||||
|
||||
```diff
|
||||
- lexemes = [w for w in nlp.vocab]
|
||||
+ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
|
||||
```
|
||||
|
||||
Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
|
||||
the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
|
||||
provided lexemes but only 685K words with vectors. The vectors have been
|
||||
updated for most languages in v2.2, but the English models contain the same
|
||||
vectors for both v2.2 and v2.3.
|
||||
|
||||
#### Lexeme.is_oov and Token.is_oov
|
||||
|
||||
<Infobox title="Important note" variant="warning">
|
||||
|
||||
Due to a bug, the values for `is_oov` are reversed in v2.3.0, but this will be
|
||||
fixed in the next patch release v2.3.1.
|
||||
|
||||
</Infobox>
|
||||
|
||||
In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
|
||||
have a word vector. This is equivalent to `token.orth not in
|
||||
nlp.vocab.vectors`.
|
||||
|
||||
Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
|
||||
probability and cluster features. The probability and cluster features are no
|
||||
longer included in the provided medium and large models (see the next section).
|
||||
|
||||
#### Probability and cluster features
|
||||
|
||||
> #### Load and save extra prob lookups table
|
||||
|
@ -201,6 +268,28 @@ model vocab, which will take a few seconds on initial loading. When you save
|
|||
this model after loading the `prob` table, the full `prob` table will be saved
|
||||
as part of the model vocab.
|
||||
|
||||
To load the probability table into a provided model, first make sure you have
|
||||
`spacy-lookups-data` installed. To load the table, remove the empty provided
|
||||
`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
|
||||
table from `spacy-lookups-data`:
|
||||
|
||||
```diff
|
||||
+ # prerequisite: pip install spacy-lookups-data
|
||||
import spacy
|
||||
|
||||
nlp = spacy.load("en_core_web_md")
|
||||
|
||||
# remove the empty placeholder prob table
|
||||
+ if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
|
||||
+ nlp.vocab.lookups_extra.remove_table("lexeme_prob")
|
||||
|
||||
# access any `.prob` to load the full table into the model
|
||||
assert nlp.vocab["a"].prob == -3.9297883511
|
||||
|
||||
# if desired, save this model with the probability table included
|
||||
nlp.to_disk("/path/to/model")
|
||||
```
|
||||
|
||||
If you'd like to include custom `cluster`, `prob`, or `sentiment` tables as part
|
||||
of a new model, add the data to
|
||||
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) under
|
||||
|
@ -218,3 +307,39 @@ When you initialize a new model with [`spacy init-model`](/api/cli#init-model),
|
|||
the `prob` table from `spacy-lookups-data` may be loaded as part of the
|
||||
initialization. If you'd like to omit this extra data as in spaCy's provided
|
||||
v2.3 models, use the new flag `--omit-extra-lookups`.
|
||||
|
||||
#### Tag maps in provided models vs. blank models
|
||||
|
||||
The tag maps in the provided models may differ from the tag maps in the spaCy
|
||||
library. You can access the tag map in a loaded model under
|
||||
`nlp.vocab.morphology.tag_map`.
|
||||
|
||||
The tag map from `spacy.lang.lg.tag_map` is still used when a blank model is
|
||||
initialized. If you want to provide an alternate tag map, update
|
||||
`nlp.vocab.morphology.tag_map` after initializing the model or if you're using
|
||||
the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
|
||||
provide the tag map as a JSON dict.
|
||||
|
||||
If you want to export a tag map from a provided model for use with the train
|
||||
CLI, you can save it as a JSON dict. To only use string keys as required by
|
||||
JSON and to make it easier to read and edit, any internal integer IDs need to
|
||||
be converted back to strings:
|
||||
|
||||
```python
|
||||
import spacy
|
||||
import srsly
|
||||
|
||||
nlp = spacy.load("en_core_web_sm")
|
||||
tag_map = {}
|
||||
|
||||
# convert any integer IDs to strings for JSON
|
||||
for tag, morph in nlp.vocab.morphology.tag_map.items():
|
||||
tag_map[tag] = {}
|
||||
for feat, val in morph.items():
|
||||
feat = nlp.vocab.strings.as_string(feat)
|
||||
if not isinstance(val, bool):
|
||||
val = nlp.vocab.strings.as_string(val)
|
||||
tag_map[tag][feat] = val
|
||||
|
||||
srsly.write_json("tag_map.json", tag_map)
|
||||
```
|
||||
|
|
17
website/docs/usage/v3.md
Normal file
|
@ -0,0 +1,17 @@
|
|||
---
|
||||
title: What's New in v3.0
|
||||
teaser: New features, backwards incompatibilities and migration guide
|
||||
menu:
|
||||
- ['Summary', 'summary']
|
||||
- ['New Features', 'features']
|
||||
- ['Backwards Incompatibilities', 'incompat']
|
||||
- ['Migrating from v2.x', 'migrating']
|
||||
---
|
||||
|
||||
## Summary {#summary}
|
||||
|
||||
## New Features {#features}
|
||||
|
||||
## Backwards Incompatibilities {#incompat}
|
||||
|
||||
## Migrating from v2.x {#migrating}
|
|
@ -15,6 +15,11 @@ const universe = require('./meta/universe.json')
|
|||
|
||||
const DEFAULT_TEMPLATE = path.resolve('./src/templates/index.js')
|
||||
|
||||
const isNightly = !!+process.env.SPACY_NIGHTLY || site.nightlyBranches.includes(process.env.BRANCH)
|
||||
const favicon = isNightly ? `src/images/icon_nightly.png` : `src/images/icon.png`
|
||||
const binderBranch = isNightly ? 'nightly' : site.binderBranch
|
||||
const siteUrl = isNightly ? site.siteUrlNightly : site.siteUrl
|
||||
|
||||
module.exports = {
|
||||
siteMetadata: {
|
||||
...site,
|
||||
|
@ -22,6 +27,9 @@ module.exports = {
|
|||
sidebars,
|
||||
...models,
|
||||
universe,
|
||||
nightly: isNightly,
|
||||
binderBranch,
|
||||
siteUrl,
|
||||
},
|
||||
|
||||
plugins: [
|
||||
|
@ -128,7 +136,7 @@ module.exports = {
|
|||
background_color: site.theme,
|
||||
theme_color: site.theme,
|
||||
display: `minimal-ui`,
|
||||
icon: `src/images/icon.png`,
|
||||
icon: favicon,
|
||||
},
|
||||
},
|
||||
{
|
||||
|
@ -140,6 +148,23 @@ module.exports = {
|
|||
respectDNT: true,
|
||||
},
|
||||
},
|
||||
{
|
||||
resolve: 'gatsby-plugin-robots-txt',
|
||||
options: {
|
||||
host: siteUrl,
|
||||
sitemap: `${siteUrl}/sitemap.xml`,
|
||||
// If we're in a special state (nightly, legacy) prevent indexing
|
||||
resolveEnv: () => (isNightly ? 'development' : 'production'),
|
||||
env: {
|
||||
production: {
|
||||
policy: [{ userAgent: '*', allow: '/' }],
|
||||
},
|
||||
development: {
|
||||
policy: [{ userAgent: '*', disallow: ['/'] }],
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
`gatsby-plugin-offline`,
|
||||
],
|
||||
}
|
||||
|
|
|
@ -78,11 +78,14 @@
|
|||
"name": "Japanese",
|
||||
"models": ["ja_core_news_sm", "ja_core_news_md", "ja_core_news_lg"],
|
||||
"dependencies": [
|
||||
{ "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
|
||||
{ "name": "Mecab", "url": "https://github.com/taku910/mecab" },
|
||||
{
|
||||
"name": "SudachiPy",
|
||||
"url": "https://github.com/WorksApplications/SudachiPy"
|
||||
}
|
||||
],
|
||||
"example": "これは文章です。",
|
||||
"has_examples": true
|
||||
},
|
||||
{
|
||||
|
@ -191,17 +194,6 @@
|
|||
"example": "นี่คือประโยค",
|
||||
"has_examples": true
|
||||
},
|
||||
{
|
||||
"code": "ja",
|
||||
"name": "Japanese",
|
||||
"dependencies": [
|
||||
{ "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
|
||||
{ "name": "Mecab", "url": "https://github.com/taku910/mecab" },
|
||||
{ "name": "fugashi", "url": "https://github.com/polm/fugashi" }
|
||||
],
|
||||
"example": "これは文章です。",
|
||||
"has_examples": true
|
||||
},
|
||||
{
|
||||
"code": "ko",
|
||||
"name": "Korean",
|
||||
|
|
|
@ -8,11 +8,7 @@
|
|||
{ "text": "Installation", "url": "/usage" },
|
||||
{ "text": "Models & Languages", "url": "/usage/models" },
|
||||
{ "text": "Facts & Figures", "url": "/usage/facts-figures" },
|
||||
{ "text": "spaCy 101", "url": "/usage/spacy-101" },
|
||||
{ "text": "New in v2.3", "url": "/usage/v2-3" },
|
||||
{ "text": "New in v2.2", "url": "/usage/v2-2" },
|
||||
{ "text": "New in v2.1", "url": "/usage/v2-1" },
|
||||
{ "text": "New in v2.0", "url": "/usage/v2" }
|
||||
{ "text": "New in v3.0", "url": "/usage/v3" }
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -3,6 +3,8 @@
|
|||
"description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",
|
||||
"slogan": "Industrial-strength Natural Language Processing in Python",
|
||||
"siteUrl": "https://spacy.io",
|
||||
"siteUrlNightly": "https://nightly.spacy.io",
|
||||
"nightlyBranches": ["spacy.io-develop"],
|
||||
"email": "contact@explosion.ai",
|
||||
"company": "Explosion AI",
|
||||
"companyUrl": "https://explosion.ai",
|
||||
|
|
13584
website/package-lock.json
generated
File diff suppressed because it is too large
|
@ -16,7 +16,7 @@
|
|||
"autoprefixer": "^9.4.7",
|
||||
"classnames": "^2.2.6",
|
||||
"codemirror": "^5.43.0",
|
||||
"gatsby": "^2.1.18",
|
||||
"gatsby": "^2.11.1",
|
||||
"gatsby-image": "^2.0.29",
|
||||
"gatsby-mdx": "^0.3.6",
|
||||
"gatsby-plugin-catch-links": "^2.0.11",
|
||||
|
@ -25,6 +25,7 @@
|
|||
"gatsby-plugin-offline": "^2.0.24",
|
||||
"gatsby-plugin-react-helmet": "^3.0.6",
|
||||
"gatsby-plugin-react-svg": "^2.0.0",
|
||||
"gatsby-plugin-robots-txt": "^1.5.1",
|
||||
"gatsby-plugin-sass": "^2.0.10",
|
||||
"gatsby-plugin-sharp": "^2.0.20",
|
||||
"gatsby-plugin-sitemap": "^2.0.5",
|
||||
|
@ -52,6 +53,7 @@
|
|||
"scripts": {
|
||||
"build": "gatsby build",
|
||||
"dev": "gatsby develop",
|
||||
"dev:nightly": "BRANCH=spacy.io-develop npm run dev",
|
||||
"lint": "eslint **",
|
||||
"clear": "rm -rf .cache",
|
||||
"test": "echo \"Write tests! -> https://gatsby.app/unit-testing\""
|
||||
|
|
|
@ -27,7 +27,7 @@ Button.defaultProps = {
|
|||
}
|
||||
|
||||
Button.propTypes = {
|
||||
to: PropTypes.string.isRequired,
|
||||
to: PropTypes.string,
|
||||
variant: PropTypes.oneOf(['primary', 'secondary', 'tertiary']),
|
||||
large: PropTypes.bool,
|
||||
icon: PropTypes.string,
|
||||
|
|
|
@ -19,6 +19,7 @@ import { ReactComponent as NoIcon } from '../images/icons/no.svg'
|
|||
import { ReactComponent as NeutralIcon } from '../images/icons/neutral.svg'
|
||||
import { ReactComponent as OfflineIcon } from '../images/icons/offline.svg'
|
||||
import { ReactComponent as SearchIcon } from '../images/icons/search.svg'
|
||||
import { ReactComponent as MoonIcon } from '../images/icons/moon.svg'
|
||||
|
||||
import classes from '../styles/icon.module.sass'
|
||||
|
||||
|
@ -41,6 +42,7 @@ const icons = {
|
|||
neutral: NeutralIcon,
|
||||
offline: OfflineIcon,
|
||||
search: SearchIcon,
|
||||
moon: MoonIcon,
|
||||
}
|
||||
|
||||
const Icon = ({ name, width, height, inline, variant, className }) => {
|
||||
|
|
|
@ -2,7 +2,9 @@ import React, { Fragment } from 'react'
|
|||
import classNames from 'classnames'
|
||||
|
||||
import pattern from '../images/pattern_blue.jpg'
|
||||
import patternNightly from '../images/pattern_nightly.jpg'
|
||||
import patternOverlay from '../images/pattern_landing.jpg'
|
||||
import patternOverlayNightly from '../images/pattern_landing_nightly.jpg'
|
||||
import logoSvgs from '../images/logos'
|
||||
|
||||
import Grid from './grid'
|
||||
|
@ -14,9 +16,10 @@ import Link from './link'
|
|||
import { chunkArray } from './util'
|
||||
import classes from '../styles/landing.module.sass'
|
||||
|
||||
export const LandingHeader = ({ style = {}, children }) => {
|
||||
const wrapperStyle = { backgroundImage: `url(${pattern})` }
|
||||
const contentStyle = { backgroundImage: `url(${patternOverlay})`, ...style }
|
||||
export const LandingHeader = ({ nightly, style = {}, children }) => {
|
||||
const overlay = nightly ? patternOverlayNightly : patternOverlay
|
||||
const wrapperStyle = { backgroundImage: `url(${nightly ? patternNightly : pattern})` }
|
||||
const contentStyle = { backgroundImage: `url(${overlay})`, ...style }
|
||||
return (
|
||||
<header className={classes.header}>
|
||||
<div className={classes.headerWrapper} style={wrapperStyle}>
|
||||
|
|
|
@ -5,15 +5,22 @@ import classNames from 'classnames'
|
|||
import patternBlue from '../images/pattern_blue.jpg'
|
||||
import patternGreen from '../images/pattern_green.jpg'
|
||||
import patternPurple from '../images/pattern_purple.jpg'
|
||||
import patternNightly from '../images/pattern_nightly.jpg'
|
||||
import classes from '../styles/main.module.sass'
|
||||
|
||||
const patterns = { blue: patternBlue, green: patternGreen, purple: patternPurple }
|
||||
const patterns = {
|
||||
blue: patternBlue,
|
||||
green: patternGreen,
|
||||
purple: patternPurple,
|
||||
nightly: patternNightly,
|
||||
}
|
||||
|
||||
export const Content = ({ Component = 'div', className, children }) => (
|
||||
<Component className={classNames(classes.content, className)}>{children}</Component>
|
||||
)
|
||||
|
||||
const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
|
||||
const pattern = patterns[theme]
|
||||
const mainClassNames = classNames(classes.root, {
|
||||
[classes.withSidebar]: sidebar,
|
||||
[classes.withAsides]: asides,
|
||||
|
@ -23,10 +30,7 @@ const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
|
|||
<main className={mainClassNames}>
|
||||
{wrapContent ? <Content Component="article">{children}</Content> : children}
|
||||
{asides && (
|
||||
<div
|
||||
className={classes.asides}
|
||||
style={{ backgroundImage: `url(${patterns[theme]}` }}
|
||||
/>
|
||||
<div className={classes.asides} style={{ backgroundImage: `url(${pattern}` }} />
|
||||
)}
|
||||
{footer}
|
||||
</main>
|
||||
|
|
|
@ -6,6 +6,7 @@ import { StaticQuery, graphql } from 'gatsby'
|
|||
import socialImageDefault from '../images/social_default.jpg'
|
||||
import socialImageApi from '../images/social_api.jpg'
|
||||
import socialImageUniverse from '../images/social_universe.jpg'
|
||||
import socialImageNightly from '../images/social_nightly.jpg'
|
||||
|
||||
function getPageTitle(title, sitename, slogan, sectionTitle) {
|
||||
if (sectionTitle && title) {
|
||||
|
@ -17,13 +18,14 @@ function getPageTitle(title, sitename, slogan, sectionTitle) {
|
|||
return `${sitename} · ${slogan}`
|
||||
}
|
||||
|
||||
function getImage(section) {
|
||||
function getImage(section, nightly) {
|
||||
if (nightly) return socialImageNightly
|
||||
if (section === 'api') return socialImageApi
|
||||
if (section === 'universe') return socialImageUniverse
|
||||
return socialImageDefault
|
||||
}
|
||||
|
||||
const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) => (
|
||||
const SEO = ({ description, lang, title, section, sectionTitle, bodyClass, nightly }) => (
|
||||
<StaticQuery
|
||||
query={query}
|
||||
render={data => {
|
||||
|
@ -35,7 +37,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
|
|||
siteMetadata.slogan,
|
||||
sectionTitle
|
||||
)
|
||||
const socialImage = siteMetadata.siteUrl + getImage(section)
|
||||
const socialImage = siteMetadata.siteUrl + getImage(section, nightly)
|
||||
const meta = [
|
||||
{
|
||||
name: 'description',
|
||||
|
|
|
@ -11,6 +11,9 @@ const Tag = ({ spaced, variant, tooltip, children }) => {
|
|||
const isValid = isString(children) && !isNaN(children)
|
||||
const version = isValid ? Number(children).toFixed(1) : children
|
||||
const tooltipText = `This feature is new and was introduced in spaCy v${version}`
|
||||
// TODO: we probably want to handle this more elegantly, but the idea is
|
||||
// that we can hide tags referring to old versions
|
||||
// const hideTag = version.startsWith('2')
|
||||
return (
|
||||
<TagTemplate spaced={spaced} tooltip={tooltipText}>
|
||||
v{version}
|
||||
|
|
BIN
website/src/images/icon_nightly.png
Normal file
Binary file not shown.
Size: 29 KiB
3
website/src/images/icons/moon.svg
Normal file
|
@ -0,0 +1,3 @@
|
|||
<svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" viewBox="0 0 32 32">
|
||||
<path d="M10.895 7.574c0 7.55 5.179 13.67 11.567 13.67 1.588 0 3.101-0.38 4.479-1.063-1.695 4.46-5.996 7.636-11.051 7.636-6.533 0-11.83-5.297-11.83-11.83 0-4.82 2.888-8.959 7.023-10.803-0.116 0.778-0.188 1.573-0.188 2.39z"></path>
|
||||
</svg>
|
Size: 322 B
BIN
website/src/images/pattern_landing_nightly.jpg
Normal file
Binary file not shown.
Size: 134 KiB
BIN
website/src/images/pattern_nightly.jpg
Normal file
Binary file not shown.
Size: 170 KiB
BIN
website/src/images/social_nightly.jpg
Normal file
Binary file not shown.
Size: 384 KiB
47
website/src/pages/404.js
Normal file
@@ -0,0 +1,47 @@
+import React from 'react'
+import { window } from 'browser-monads'
+import { graphql } from 'gatsby'
+
+import Template from '../templates/index'
+import { LandingHeader, LandingTitle } from '../components/landing'
+import Button from '../components/button'
+
+export default ({ data, location }) => {
+    const { nightly } = data.site.siteMetadata
+    const pageContext = { title: '404 Error', searchExclude: true, isIndex: false }
+    return (
+        <Template data={data} pageContext={pageContext} location={location}>
+            <LandingHeader style={{ minHeight: 400 }} nightly={nightly}>
+                <LandingTitle>
+                    Ooops, this page
+                    <br />
+                    does not exist!
+                </LandingTitle>
+                <br />
+                <Button onClick={() => window.history.go(-1)} variant="tertiary">
+                    Click here to go back
+                </Button>
+            </LandingHeader>
+        </Template>
+    )
+}
+
+export const pageQuery = graphql`
+    query {
+        site {
+            siteMetadata {
+                nightly
+                title
+                description
+                navigation {
+                    text
+                    url
+                }
+                docSearch {
+                    apiKey
+                    indexName
+                }
+            }
+        }
+    }
+`

@@ -1,7 +0,0 @@
----
-title: 404 Error
----
-
-import Error from 'widgets/404.js'
-
-<Error />

@@ -3,11 +3,14 @@
     bottom: 0
     left: 0
     width: 100%
-    background: var(--color-subtle-light)
+    background: var(--color-back)
     z-index: 100
     font: var(--font-size-sm)/var(--line-height-md) var(--font-primary)
     text-align: center
     padding: 1rem
+    box-shadow: var(--box-shadow)
+    border-top: 2px solid
+    color: var(--color-theme)
 
 .warning
     --alert-bg: var(--color-yellow-light)

@@ -47,6 +47,11 @@
     --color-theme-purple-light: hsla(255, 61%, 54%, 0.06)
     --color-theme-purple-opaque: hsla(255, 61%, 54%, 0.11)
 
+    --color-theme-nightly: hsl(257, 99%, 67%)
+    --color-theme-nightly-dark: hsl(257, 99%, 57%)
+    --color-theme-nightly-light: hsla(257, 99%, 67%, 0.06)
+    --color-theme-nightly-opaque: hsla(257, 99%, 67%, 0.11)
+
     // Regular colors
     --color-back: hsl(0, 0%, 100%)
     --color-front: hsl(213, 15%, 12%)
@@ -106,6 +111,12 @@
     --color-theme-light: var(--color-theme-purple-light)
     --color-theme-opaque: var(--color-theme-purple-opaque)
 
+.theme-nightly
+    --color-theme: var(--color-theme-nightly)
+    --color-theme-dark: var(--color-theme-nightly-dark)
+    --color-theme-light: var(--color-theme-nightly-light)
+    --color-theme-opaque: var(--color-theme-nightly-opaque)
+
 
 /* Fonts */
 

@@ -22,6 +22,9 @@ $crumb-bar: 2px
     & > *
         padding: 0 2rem 0.35rem
 
+    &:last-child
+        margin-bottom: 5rem
+
 .label
     color: var(--color-dark)
     font: bold var(--font-size-lg)/var(--line-height-md) var(--font-secondary)

@@ -31,7 +31,7 @@ const Docs = ({ pageContext, children }) => (
                 theme,
                 version,
             } = pageContext
-            const { sidebars = [], modelsRepo, languages } = site.siteMetadata
+            const { sidebars = [], modelsRepo, languages, nightly } = site.siteMetadata
             const isModels = section === 'models'
             const sidebar = pageContext.sidebar
                 ? { items: pageContext.sidebar }
@@ -83,7 +83,7 @@ const Docs = ({ pageContext, children }) => (
                     {sidebar && <Sidebar items={sidebar.items} pageMenu={pageMenu} slug={slug} />}
                     <Main
                         section={section}
-                        theme={theme}
+                        theme={nightly ? 'nightly' : theme}
                         sidebar
                         asides
                         wrapContent
@@ -146,6 +146,7 @@ const query = graphql`
                     models
                     starters
                 }
+                nightly
                 sidebars {
                     section
                     items {

@@ -75,10 +75,23 @@ const scopeComponents = {
     InlineCode,
 }
 
-const AlertSpace = () => {
+const AlertSpace = ({ nightly }) => {
     const isOnline = useOnlineStatus()
     return (
         <>
+            {nightly && (
+                <Alert
+                    title="You're viewing the pre-release docs."
+                    icon="moon"
+                    closeOnClick={false}
+                >
+                    The page reflects{' '}
+                    <Link to="https://pypi.org/project/spacy-nightly/">
+                        <InlineCode>spacy-nightly</InlineCode>
+                    </Link>
+                    , not the latest <Link to="https://spacy.io">stable version</Link>.
+                </Alert>
+            )}
             {!isOnline && (
                 <Alert title="Looks like you're offline." icon="offline" variant="warning">
                     But don't worry, your visited pages should be saved for you.
@@ -130,9 +143,10 @@ class Layout extends React.Component {
         const { data, pageContext, location, children } = this.props
         const { file, site = {} } = data || {}
         const mdx = file ? file.childMdx : null
-        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
-        const bodyClass = classNames(`theme-${theme}`, { 'search-exclude': !!searchExclude })
         const meta = site.siteMetadata || {}
+        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
+        const uiTheme = meta.nightly ? 'nightly' : theme
+        const bodyClass = classNames(`theme-${uiTheme}`, { 'search-exclude': !!searchExclude })
         const isDocs = ['usage', 'models', 'api', 'styleguide'].includes(section)
         const content = !mdx ? null : (
             <MDXProvider components={mdxComponents}>
@@ -148,8 +162,9 @@ class Layout extends React.Component {
                     section={section}
                     sectionTitle={sectionTitle}
                     bodyClass={bodyClass}
+                    nightly={meta.nightly}
                 />
-                <AlertSpace />
+                <AlertSpace nightly={meta.nightly} />
                 <Navigation
                     title={meta.title}
                     items={meta.navigation}
@@ -167,11 +182,11 @@ class Layout extends React.Component {
                         mdxComponents={mdxComponents}
                     />
                 ) : (
-                    <>
+                    <div>
                         {children}
                         {content}
                         <Footer wide />
-                    </>
+                    </div>
                 )}
             </>
         )
@@ -184,6 +199,7 @@ export const pageQuery = graphql`
     query($slug: String!) {
         site {
             siteMetadata {
+                nightly
                 title
                 description
                 navigation {

@@ -30,8 +30,8 @@ function filterResources(resources, data) {
     return sorted.filter(res => (res.category || []).includes(data.id))
 }
 
-const UniverseContent = ({ content = [], categories, pageContext, location, mdxComponents }) => {
-    const { theme, data = {} } = pageContext
+const UniverseContent = ({ content = [], categories, theme, pageContext, mdxComponents }) => {
+    const { data = {} } = pageContext
     const filteredResources = filterResources(content, data)
     const activeData = data ? content.find(({ id }) => id === data.id) : null
     const markdownComponents = { ...mdxComponents, code: InlineCode }
@@ -302,15 +302,16 @@ const Universe = ({ pageContext, location, mdxComponents }) => (
     <StaticQuery
         query={query}
         render={data => {
-            const content = data.site.siteMetadata.universe.resources
-            const categories = data.site.siteMetadata.universe.categories
+            const { universe, nightly } = data.site.siteMetadata
+            const theme = nightly ? 'nightly' : pageContext.theme
             return (
                 <UniverseContent
-                    content={content}
-                    categories={categories}
+                    content={universe.resources}
+                    categories={universe.categories}
                     pageContext={pageContext}
                     location={location}
                     mdxComponents={mdxComponents}
+                    theme={theme}
                 />
             )
         }}
@@ -323,6 +324,7 @@ const query = graphql`
     query UniverseQuery {
         site {
             siteMetadata {
+                nightly
                 universe {
                     resources {
                         type

@@ -1,19 +0,0 @@
-import React from 'react'
-import { window } from 'browser-monads'
-
-import { LandingHeader, LandingTitle } from '../components/landing'
-import Button from '../components/button'
-
-export default () => (
-    <LandingHeader style={{ minHeight: 400 }}>
-        <LandingTitle>
-            Ooops, this page
-            <br />
-            does not exist!
-        </LandingTitle>
-        <br />
-        <Button onClick={() => window.history.go(-1)} variant="tertiary">
-            Click here to go back
-        </Button>
-    </LandingHeader>
-)

@@ -68,7 +68,7 @@ const Landing = ({ data }) => {
     const counts = getCounts(data.languages)
     return (
         <>
-            <LandingHeader>
+            <LandingHeader nightly={data.nightly}>
                 <LandingTitle>
                     Industrial-Strength
                     <br />
@@ -268,6 +268,7 @@ const landingQuery = graphql`
     query LandingQuery {
         site {
             siteMetadata {
+                nightly
                 repo
                 languages {
                     models
|
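Note on the theme override: the Docs, Layout and Universe hunks each repeat the same rule, namely that a truthy `siteMetadata.nightly` replaces the page theme with 'nightly'. A minimal sketch of that rule as one shared function, assuming a hypothetical `resolveTheme` helper (and `themeClass` wrapper) that is not part of this commit. The 'blue' fallback mirrors the destructuring default in the Layout hunk only; the Docs and Universe hunks pass the page theme through without a default.

```javascript
// Hypothetical helper, not part of the commit: centralizes the
// `nightly ? 'nightly' : theme` expression repeated across the templates.
function resolveTheme(siteMetadata = {}, pageContext = {}) {
    // Same default as the Layout template's destructuring (`theme = 'blue'`).
    const { theme = 'blue' } = pageContext
    return siteMetadata.nightly ? 'nightly' : theme
}

// Builds the theme class name the way the Layout hunk does (`theme-${uiTheme}`).
function themeClass(siteMetadata, pageContext) {
    return `theme-${resolveTheme(siteMetadata, pageContext)}`
}
```

With a helper like this, each template would call `resolveTheme(site.siteMetadata, pageContext)` instead of repeating the ternary, so the `.theme-nightly` class defined in the Sass hunk is applied from a single place.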