mirror of https://github.com/explosion/spaCy.git
synced 2025-01-11 17:56:30 +03:00

Merge branch 'develop' of https://github.com/explosion/spaCy into develop

This commit is contained in:
commit cb51bb637b

106  .github/contributors/hertelm.md  (vendored)  Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry           |
| ------------------------------ | --------------- |
| Name                           | Matthias Hertel |
| Company name (if applicable)   |                 |
| Title or role (if applicable)  |                 |
| Date                           | June 29, 2020   |
| GitHub username                | hertelm         |
| Website (optional)             |                 |

@@ -1,6 +1,8 @@
 redirects = [
   # Netlify
   {from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
+  # Subdomain for branches
+  {from = "https://nightly.spacy.io/*", to="https://spacy-io-develop.spacy.io/:splat", force = true, status = 200},
   # Old subdomains
   {from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
   {from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},

@@ -242,12 +242,16 @@ def project_clone(
     try:
         run_command(cmd)
     except SystemExit:
-        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'"
+        err = f"Could not clone the repo '{repo}' into the temp dir '{tmp_dir}'."
         msg.fail(err)
     with (tmp_dir / ".git" / "info" / "sparse-checkout").open("w") as f:
         f.write(name)
-    run_command(["git", "-C", str(tmp_dir), "fetch"])
-    run_command(["git", "-C", str(tmp_dir), "checkout"])
+    try:
+        run_command(["git", "-C", str(tmp_dir), "fetch"])
+        run_command(["git", "-C", str(tmp_dir), "checkout"])
+    except SystemExit:
+        err = f"Could not clone '{name}' in the repo '{repo}'."
+        msg.fail(err)
     shutil.move(str(tmp_dir / Path(name).name), str(project_dir))
     msg.good(f"Cloned project '{name}' from {repo} into {project_dir}")
     for sub_dir in DIRS:

@@ -525,9 +529,9 @@ def update_dvc_config(
         outputs_no_cache = command.get("outputs_no_cache", [])
         if not deps and not outputs and not outputs_no_cache:
             continue
-        # Default to "." as the project path since dvc.yaml is auto-generated
+        # Default to the working dir as the project path since dvc.yaml is auto-generated
         # and we don't want arbitrary paths in there
-        project_cmd = ["python", "-m", NAME, "project", ".", "exec", name]
+        project_cmd = ["python", "-m", NAME, "project", "exec", name]
         deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
         outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]
         outputs_nc_cmd = [c for cl in [["-O", p] for p in outputs_no_cache] for c in cl]

@@ -91,7 +91,7 @@ Match a stream of documents, yielding them in turn.

 > ```python
 > from spacy.matcher import PhraseMatcher
 > matcher = PhraseMatcher(nlp.vocab)
-> for doc in matcher.pipe(texts, batch_size=50):
+> for doc in matcher.pipe(docs, batch_size=50):
 >     pass
 > ```

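
For context on the change above: `PhraseMatcher.pipe` operates on a stream of `Doc` objects rather than raw texts, so documents are typically created first, e.g. with `nlp.pipe`. A minimal sketch (the `en_core_web_sm` model, the `"OBAMA"` key and the example texts are illustrative assumptions, not part of this diff):

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("OBAMA", [nlp("Barack Obama")])

texts = ["Barack Obama was the 44th president.", "The weather is nice today."]
docs = nlp.pipe(texts)  # turn raw texts into Doc objects first
for doc in matcher.pipe(docs, batch_size=50):
    # the stream yields the same Doc objects; match each one as usual
    print([doc[start:end].text for _, start, end in matcher(doc)])
```
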
@@ -46,19 +46,19 @@ Update the evaluation scores from a single [`Doc`](/api/doc) /

## Properties

| Name | Type | Description |
| --------------------------------------------------- | ----- | ---------------------------------------------------------------------------------------------------------- |
| `token_acc` | float | Tokenization accuracy. |
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
| `uas` | float | Unlabelled dependency score. |
| `las` | float | Labelled dependency score. |
| `ents_p` | float | Named entity accuracy (precision). |
| `ents_r` | float | Named entity accuracy (recall). |
| `ents_f` | float | Named entity accuracy (F-score). |
| `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
| Name | Type | Description |
| --------------------------------------------------- | ----- | -------------------------------------------------------------------------------------- |
| `token_acc` | float | Tokenization accuracy. |
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
| `uas` | float | Unlabelled dependency score. |
| `las` | float | Labelled dependency score. |
| `ents_p` | float | Named entity accuracy (precision). |
| `ents_r` | float | Named entity accuracy (recall). |
| `ents_f` | float | Named entity accuracy (F-score). |
| `ents_per_type` <Tag variant="new">2.1.5</Tag> | dict | Scores per entity label. Keyed by label, mapped to a dict of `p`, `r` and `f` scores. |
| `textcat_f` <Tag variant="new">3.0</Tag> | float | F-score on positive label for binary classification, macro-averaged F-score otherwise. |
| `textcat_auc` <Tag variant="new"3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
| `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
| `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
| `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |
| `scores` | dict | All scores, keyed by type. |
| `textcat_auc` <Tag variant="new">3.0</Tag> | float | Macro-averaged AUC ROC score for multilabel classification (`-1` if undefined). |
| `textcats_f_per_cat` <Tag variant="new">3.0</Tag> | dict | F-scores per textcat label, keyed by label. |
| `textcats_auc_per_cat` <Tag variant="new">3.0</Tag> | dict | ROC AUC scores per textcat label, keyed by label. |
| `las_per_type` <Tag variant="new">2.2.3</Tag> | dict | Labelled dependency scores, keyed by label. |
| `scores` | dict | All scores, keyed by type. |

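
For orientation, these properties are read off a `Scorer` after scoring predicted `Doc`s against gold annotations. A rough sketch using the v2.x-style API (the loaded pipeline, example text and entity offsets are assumptions for illustration and are not part of this diff):

```python
import spacy
from spacy.scorer import Scorer
from spacy.gold import GoldParse

nlp = spacy.load("en_core_web_sm")
scorer = Scorer()

examples = [
    ("Apple is looking at buying U.K. startup", {"entities": [(0, 5, "ORG"), (27, 31, "GPE")]}),
]
for text, annot in examples:
    gold = GoldParse(nlp.make_doc(text), **annot)
    scorer.score(nlp(text), gold)  # update the running scores with one Doc

print(scorer.ents_p, scorer.ents_r, scorer.ents_f)
print(scorer.ents_per_type)  # per-label p/r/f
print(scorer.scores)         # all scores, keyed by type
```
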
@@ -122,7 +122,7 @@ for match_id, start, end in matches:
 ```

 The matcher returns a list of `(match_id, start, end)` tuples – in this case,
-`[('15578876784678163569', 0, 2)]`, which maps to the span `doc[0:2]` of our
+`[('15578876784678163569', 0, 3)]`, which maps to the span `doc[0:3]` of our
 original document. The `match_id` is the [hash value](/usage/spacy-101#vocab) of
 the string ID "HelloWorld". To get the string value, you can look up the ID in
 the [`StringStore`](/api/stringstore).

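
As a quick illustration of that last sentence, the string ID can be recovered from the `match_id` hash via the vocab's string store. A small self-contained sketch (the pattern and example text mirror the surrounding "HelloWorld" example):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("HelloWorld", [[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]])

doc = nlp("Hello, world! Hello world!")
for match_id, start, end in matcher(doc):
    string_id = nlp.vocab.strings[match_id]  # look the hash up in the StringStore
    span = doc[start:end]
    print(string_id, start, end, span.text)  # e.g. HelloWorld 0 3 "Hello, world"
```
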
@@ -161,10 +161,18 @@ debugging your tokenizer configuration.

 spaCy's custom warnings have been replaced with native Python
 [`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the
-[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the [`warnings`
+filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
 to manage warnings.

+```diff
+import spacy
++ import warnings
+
+- spacy.errors.SPACY_WARNING_IGNORE.append('W007')
++ warnings.filterwarnings("ignore", message=r"\\[W007\\]", category=UserWarning)
+```
+
 #### Normalization tables

 The normalization tables have moved from the language data in

@@ -174,6 +182,65 @@ If you're adding data for a new language, the normalization table should be
added to `spacy-lookups-data`. See
[adding norm exceptions](/usage/adding-languages#norm-exceptions).

#### No preloaded vocab for models with vectors

To reduce the initial loading time, the lexemes in `nlp.vocab` are no longer
loaded on initialization for models with vectors. As you process texts, the
lexemes will be added to the vocab automatically, just as in small models
without vectors.
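
One way to see this lazy loading in action is to watch the vocab grow as documents are processed. A small sketch (it assumes a local `en_core_web_md` v2.3 install; exact counts will vary):

```python
import spacy

nlp = spacy.load("en_core_web_md")
print(len(nlp.vocab))  # far fewer lexemes preloaded than in v2.2

doc = nlp("The quick brown fox jumps over the lazy dog.")
print(len(nlp.vocab))  # the processed tokens' lexemes have been added lazily
```
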
To see the number of unique vectors and the number of words with vectors, see
`nlp.meta['vectors']`. For example, for `en_core_web_md` there are `20000`
unique vectors and `684830` words with vectors:

```python
{
    'width': 300,
    'vectors': 20000,
    'keys': 684830,
    'name': 'en_core_web_md.vectors'
}
```

If required, for instance if you are working directly with word vectors rather
than processing texts, you can load all lexemes for words with vectors at once:

```python
for orth in nlp.vocab.vectors:
    _ = nlp.vocab[orth]
```

If your workflow previously iterated over `nlp.vocab`, a similar alternative
is to iterate over words with vectors instead:

```diff
- lexemes = [w for w in nlp.vocab]
+ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
```

Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
provided lexemes but only 685K words with vectors. The vectors have been
updated for most languages in v2.2, but the English models contain the same
vectors for both v2.2 and v2.3.

#### Lexeme.is_oov and Token.is_oov

<Infobox title="Important note" variant="warning">

Due to a bug, the values for `is_oov` are reversed in v2.3.0, but this will be
fixed in the next patch release v2.3.1.

</Infobox>

In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
have a word vector. This is equivalent to `token.orth not in
nlp.vocab.vectors`.
Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
probability and cluster features. The probability and cluster features are no
longer included in the provided medium and large models (see the next section).

#### Probability and cluster features

> #### Load and save extra prob lookups table

@@ -201,6 +268,28 @@ model vocab, which will take a few seconds on initial loading. When you save
this model after loading the `prob` table, the full `prob` table will be saved
as part of the model vocab.

To load the probability table into a provided model, first make sure you have
`spacy-lookups-data` installed. To load the table, remove the empty provided
`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
table from `spacy-lookups-data`:

```diff
+ # prerequisite: pip install spacy-lookups-data
import spacy

nlp = spacy.load("en_core_web_md")

# remove the empty placeholder prob table
+ if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
+     nlp.vocab.lookups_extra.remove_table("lexeme_prob")

# access any `.prob` to load the full table into the model
assert nlp.vocab["a"].prob == -3.9297883511

# if desired, save this model with the probability table included
nlp.to_disk("/path/to/model")
```

If you'd like to include custom `cluster`, `prob`, or `sentiment` tables as part
of a new model, add the data to
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) under

@@ -218,3 +307,39 @@ When you initialize a new model with [`spacy init-model`](/api/cli#init-model),
the `prob` table from `spacy-lookups-data` may be loaded as part of the
initialization. If you'd like to omit this extra data as in spaCy's provided
v2.3 models, use the new flag `--omit-extra-lookups`.

#### Tag maps in provided models vs. blank models

The tag maps in the provided models may differ from the tag maps in the spaCy
library. You can access the tag map in a loaded model under
`nlp.vocab.morphology.tag_map`.

The tag map from `spacy.lang.lg.tag_map` is still used when a blank model is
initialized. If you want to provide an alternate tag map, update
`nlp.vocab.morphology.tag_map` after initializing the model, or if you're using
the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
provide the tag map as a JSON dict.
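
A minimal sketch of the first option, updating the tag map in place on a blank model before training (the `"NN"`/`"NNS"` entries are hypothetical placeholders; the exact entries depend on your tag set, and this only illustrates the update step the text describes):

```python
import spacy

nlp = spacy.blank("en")
# nlp.vocab.morphology.tag_map starts from the language defaults; override or
# add entries here if you need an alternate mapping
nlp.vocab.morphology.tag_map.update({
    "NN": {"POS": "NOUN"},
    "NNS": {"POS": "NOUN", "Number": "plur"},
})
```
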
If you want to export a tag map from a provided model for use with the train
CLI, you can save it as a JSON dict. To only use string keys as required by
JSON and to make it easier to read and edit, any internal integer IDs need to
be converted back to strings:

```python
import spacy
import srsly

nlp = spacy.load("en_core_web_sm")
tag_map = {}

# convert any integer IDs to strings for JSON
for tag, morph in nlp.vocab.morphology.tag_map.items():
    tag_map[tag] = {}
    for feat, val in morph.items():
        feat = nlp.vocab.strings.as_string(feat)
        if not isinstance(val, bool):
            val = nlp.vocab.strings.as_string(val)
        tag_map[tag][feat] = val

srsly.write_json("tag_map.json", tag_map)
```

17  website/docs/usage/v3.md  Normal file

@@ -0,0 +1,17 @@
---
title: What's New in v3.0
teaser: New features, backwards incompatibilities and migration guide
menu:
  - ['Summary', 'summary']
  - ['New Features', 'features']
  - ['Backwards Incompatibilities', 'incompat']
  - ['Migrating from v2.x', 'migrating']
---

## Summary {#summary}

## New Features {#features}

## Backwards Incompatibilities {#incompat}

## Migrating from v2.x {#migrating}

@@ -15,6 +15,11 @@ const universe = require('./meta/universe.json')

 const DEFAULT_TEMPLATE = path.resolve('./src/templates/index.js')

+const isNightly = !!+process.env.SPACY_NIGHTLY || site.nightlyBranches.includes(process.env.BRANCH)
+const favicon = isNightly ? `src/images/icon_nightly.png` : `src/images/icon.png`
+const binderBranch = isNightly ? 'nightly' : site.binderBranch
+const siteUrl = isNightly ? site.siteUrlNightly : site.siteUrl
+
 module.exports = {
     siteMetadata: {
         ...site,

@@ -22,6 +27,9 @@ module.exports = {
         sidebars,
         ...models,
         universe,
+        nightly: isNightly,
+        binderBranch,
+        siteUrl,
     },

     plugins: [

@@ -128,7 +136,7 @@ module.exports = {
                 background_color: site.theme,
                 theme_color: site.theme,
                 display: `minimal-ui`,
-                icon: `src/images/icon.png`,
+                icon: favicon,
             },
         },
         {

@@ -140,6 +148,23 @@ module.exports = {
                 respectDNT: true,
             },
         },
+        {
+            resolve: 'gatsby-plugin-robots-txt',
+            options: {
+                host: siteUrl,
+                sitemap: `${siteUrl}/sitemap.xml`,
+                // If we're in a special state (nightly, legacy) prevent indexing
+                resolveEnv: () => (isNightly ? 'development' : 'production'),
+                env: {
+                    production: {
+                        policy: [{ userAgent: '*', allow: '/' }],
+                    },
+                    development: {
+                        policy: [{ userAgent: '*', disallow: ['/'] }],
+                    },
+                },
+            },
+        },
         `gatsby-plugin-offline`,
     ],
 }

@@ -78,11 +78,14 @@
        "name": "Japanese",
        "models": ["ja_core_news_sm", "ja_core_news_md", "ja_core_news_lg"],
        "dependencies": [
            { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
            { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
            {
                "name": "SudachiPy",
                "url": "https://github.com/WorksApplications/SudachiPy"
            }
        ],
        "example": "これは文章です。",
        "has_examples": true
    },
    {

@@ -191,17 +194,6 @@
        "example": "นี่คือประโยค",
        "has_examples": true
    },
-   {
-       "code": "ja",
-       "name": "Japanese",
-       "dependencies": [
-           { "name": "Unidic", "url": "http://unidic.ninjal.ac.jp/back_number#unidic_cwj" },
-           { "name": "Mecab", "url": "https://github.com/taku910/mecab" },
-           { "name": "fugashi", "url": "https://github.com/polm/fugashi" }
-       ],
-       "example": "これは文章です。",
-       "has_examples": true
-   },
    {
        "code": "ko",
        "name": "Korean",

@@ -8,11 +8,7 @@
    { "text": "Installation", "url": "/usage" },
    { "text": "Models & Languages", "url": "/usage/models" },
    { "text": "Facts & Figures", "url": "/usage/facts-figures" },
    { "text": "spaCy 101", "url": "/usage/spacy-101" },
    { "text": "New in v2.3", "url": "/usage/v2-3" },
    { "text": "New in v2.2", "url": "/usage/v2-2" },
    { "text": "New in v2.1", "url": "/usage/v2-1" },
    { "text": "New in v2.0", "url": "/usage/v2" }
    { "text": "New in v3.0", "url": "/usage/v3" }
]
},
{

@@ -3,6 +3,8 @@
     "description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",
     "slogan": "Industrial-strength Natural Language Processing in Python",
     "siteUrl": "https://spacy.io",
+    "siteUrlNightly": "https://nightly.spacy.io",
+    "nightlyBranches": ["spacy.io-develop"],
     "email": "contact@explosion.ai",
     "company": "Explosion AI",
     "companyUrl": "https://explosion.ai",

13584  website/package-lock.json  generated
File diff suppressed because it is too large.
@@ -16,7 +16,7 @@
         "autoprefixer": "^9.4.7",
         "classnames": "^2.2.6",
         "codemirror": "^5.43.0",
-        "gatsby": "^2.1.18",
+        "gatsby": "^2.11.1",
         "gatsby-image": "^2.0.29",
         "gatsby-mdx": "^0.3.6",
         "gatsby-plugin-catch-links": "^2.0.11",

@@ -25,6 +25,7 @@
         "gatsby-plugin-offline": "^2.0.24",
         "gatsby-plugin-react-helmet": "^3.0.6",
         "gatsby-plugin-react-svg": "^2.0.0",
+        "gatsby-plugin-robots-txt": "^1.5.1",
         "gatsby-plugin-sass": "^2.0.10",
         "gatsby-plugin-sharp": "^2.0.20",
         "gatsby-plugin-sitemap": "^2.0.5",

@@ -52,6 +53,7 @@
     "scripts": {
         "build": "gatsby build",
         "dev": "gatsby develop",
+        "dev:nightly": "BRANCH=spacy.io-develop npm run dev",
         "lint": "eslint **",
         "clear": "rm -rf .cache",
         "test": "echo \"Write tests! -> https://gatsby.app/unit-testing\""

@@ -27,7 +27,7 @@ Button.defaultProps = {
 }

 Button.propTypes = {
-    to: PropTypes.string.isRequired,
+    to: PropTypes.string,
     variant: PropTypes.oneOf(['primary', 'secondary', 'tertiary']),
     large: PropTypes.bool,
     icon: PropTypes.string,

@@ -19,6 +19,7 @@ import { ReactComponent as NoIcon } from '../images/icons/no.svg'
 import { ReactComponent as NeutralIcon } from '../images/icons/neutral.svg'
 import { ReactComponent as OfflineIcon } from '../images/icons/offline.svg'
 import { ReactComponent as SearchIcon } from '../images/icons/search.svg'
+import { ReactComponent as MoonIcon } from '../images/icons/moon.svg'

 import classes from '../styles/icon.module.sass'

@@ -41,6 +42,7 @@ const icons = {
     neutral: NeutralIcon,
     offline: OfflineIcon,
     search: SearchIcon,
+    moon: MoonIcon,
 }

 const Icon = ({ name, width, height, inline, variant, className }) => {

@@ -2,7 +2,9 @@ import React, { Fragment } from 'react'
 import classNames from 'classnames'

 import pattern from '../images/pattern_blue.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import patternOverlay from '../images/pattern_landing.jpg'
+import patternOverlayNightly from '../images/pattern_landing_nightly.jpg'
 import logoSvgs from '../images/logos'

 import Grid from './grid'

@@ -14,9 +16,10 @@ import Link from './link'
 import { chunkArray } from './util'
 import classes from '../styles/landing.module.sass'

-export const LandingHeader = ({ style = {}, children }) => {
-    const wrapperStyle = { backgroundImage: `url(${pattern})` }
-    const contentStyle = { backgroundImage: `url(${patternOverlay})`, ...style }
+export const LandingHeader = ({ nightly, style = {}, children }) => {
+    const overlay = nightly ? patternOverlayNightly : patternOverlay
+    const wrapperStyle = { backgroundImage: `url(${nightly ? patternNightly : pattern})` }
+    const contentStyle = { backgroundImage: `url(${overlay})`, ...style }
     return (
         <header className={classes.header}>
             <div className={classes.headerWrapper} style={wrapperStyle}>

@@ -5,15 +5,22 @@ import classNames from 'classnames'
 import patternBlue from '../images/pattern_blue.jpg'
 import patternGreen from '../images/pattern_green.jpg'
 import patternPurple from '../images/pattern_purple.jpg'
+import patternNightly from '../images/pattern_nightly.jpg'
 import classes from '../styles/main.module.sass'

-const patterns = { blue: patternBlue, green: patternGreen, purple: patternPurple }
+const patterns = {
+    blue: patternBlue,
+    green: patternGreen,
+    purple: patternPurple,
+    nightly: patternNightly,
+}

 export const Content = ({ Component = 'div', className, children }) => (
     <Component className={classNames(classes.content, className)}>{children}</Component>
 )

 const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
+    const pattern = patterns[theme]
     const mainClassNames = classNames(classes.root, {
         [classes.withSidebar]: sidebar,
         [classes.withAsides]: asides,

@@ -23,10 +30,7 @@ const Main = ({ sidebar, asides, wrapContent, theme, footer, children }) => {
         <main className={mainClassNames}>
             {wrapContent ? <Content Component="article">{children}</Content> : children}
             {asides && (
-                <div
-                    className={classes.asides}
-                    style={{ backgroundImage: `url(${patterns[theme]}` }}
-                />
+                <div className={classes.asides} style={{ backgroundImage: `url(${pattern}` }} />
             )}
             {footer}
         </main>

@@ -6,6 +6,7 @@ import { StaticQuery, graphql } from 'gatsby'
 import socialImageDefault from '../images/social_default.jpg'
 import socialImageApi from '../images/social_api.jpg'
 import socialImageUniverse from '../images/social_universe.jpg'
+import socialImageNightly from '../images/social_nightly.jpg'

 function getPageTitle(title, sitename, slogan, sectionTitle) {
     if (sectionTitle && title) {

@@ -17,13 +18,14 @@ function getPageTitle(title, sitename, slogan, sectionTitle) {
     return `${sitename} · ${slogan}`
 }

-function getImage(section) {
+function getImage(section, nightly) {
+    if (nightly) return socialImageNightly
     if (section === 'api') return socialImageApi
     if (section === 'universe') return socialImageUniverse
     return socialImageDefault
 }

-const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) => (
+const SEO = ({ description, lang, title, section, sectionTitle, bodyClass, nightly }) => (
     <StaticQuery
         query={query}
         render={data => {

@@ -35,7 +37,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
             siteMetadata.slogan,
             sectionTitle
         )
-        const socialImage = siteMetadata.siteUrl + getImage(section)
+        const socialImage = siteMetadata.siteUrl + getImage(section, nightly)
         const meta = [
             {
                 name: 'description',

@@ -11,6 +11,9 @@ const Tag = ({ spaced, variant, tooltip, children }) => {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
+    // TODO: we probably want to handle this more elegantly, but the idea is
+    // that we can hide tags referring to old versions
+    // const hideTag = version.startsWith('2')
     return (
         <TagTemplate spaced={spaced} tooltip={tooltipText}>
             v{version}

BIN  website/src/images/icon_nightly.png  Normal file
Binary file not shown. After: 29 KiB
3  website/src/images/icons/moon.svg  Normal file

@@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" viewBox="0 0 32 32">
    <path d="M10.895 7.574c0 7.55 5.179 13.67 11.567 13.67 1.588 0 3.101-0.38 4.479-1.063-1.695 4.46-5.996 7.636-11.051 7.636-6.533 0-11.83-5.297-11.83-11.83 0-4.82 2.888-8.959 7.023-10.803-0.116 0.778-0.188 1.573-0.188 2.39z"></path>
</svg>
After: 322 B
BIN  website/src/images/pattern_landing_nightly.jpg  Normal file
Binary file not shown. After: 134 KiB

BIN  website/src/images/pattern_nightly.jpg  Normal file
Binary file not shown. After: 170 KiB

BIN  website/src/images/social_nightly.jpg  Normal file
Binary file not shown. After: 384 KiB
47  website/src/pages/404.js  Normal file

@@ -0,0 +1,47 @@
import React from 'react'
import { window } from 'browser-monads'
import { graphql } from 'gatsby'

import Template from '../templates/index'
import { LandingHeader, LandingTitle } from '../components/landing'
import Button from '../components/button'

export default ({ data, location }) => {
    const { nightly } = data.site.siteMetadata
    const pageContext = { title: '404 Error', searchExclude: true, isIndex: false }
    return (
        <Template data={data} pageContext={pageContext} location={location}>
            <LandingHeader style={{ minHeight: 400 }} nightly={nightly}>
                <LandingTitle>
                    Ooops, this page
                    <br />
                    does not exist!
                </LandingTitle>
                <br />
                <Button onClick={() => window.history.go(-1)} variant="tertiary">
                    Click here to go back
                </Button>
            </LandingHeader>
        </Template>
    )
}

export const pageQuery = graphql`
    query {
        site {
            siteMetadata {
                nightly
                title
                description
                navigation {
                    text
                    url
                }
                docSearch {
                    apiKey
                    indexName
                }
            }
        }
    }
`

@@ -1,7 +0,0 @@
---
title: 404 Error
---

import Error from 'widgets/404.js'

<Error />

@@ -3,11 +3,14 @@
     bottom: 0
     left: 0
     width: 100%
-    background: var(--color-subtle-light)
+    background: var(--color-back)
     z-index: 100
     font: var(--font-size-sm)/var(--line-height-md) var(--font-primary)
     text-align: center
     padding: 1rem
     box-shadow: var(--box-shadow)
+    border-top: 2px solid
+    color: var(--color-theme)

 .warning
     --alert-bg: var(--color-yellow-light)

@@ -47,6 +47,11 @@
     --color-theme-purple-light: hsla(255, 61%, 54%, 0.06)
     --color-theme-purple-opaque: hsla(255, 61%, 54%, 0.11)

+    --color-theme-nightly: hsl(257, 99%, 67%)
+    --color-theme-nightly-dark: hsl(257, 99%, 57%)
+    --color-theme-nightly-light: hsla(257, 99%, 67%, 0.06)
+    --color-theme-nightly-opaque: hsla(257, 99%, 67%, 0.11)
+
     // Regular colors
     --color-back: hsl(0, 0%, 100%)
     --color-front: hsl(213, 15%, 12%)

@@ -106,6 +111,12 @@
     --color-theme-light: var(--color-theme-purple-light)
     --color-theme-opaque: var(--color-theme-purple-opaque)

+.theme-nightly
+    --color-theme: var(--color-theme-nightly)
+    --color-theme-dark: var(--color-theme-nightly-dark)
+    --color-theme-light: var(--color-theme-nightly-light)
+    --color-theme-opaque: var(--color-theme-nightly-opaque)
+

 /* Fonts */

@@ -22,6 +22,9 @@ $crumb-bar: 2px
     & > *
         padding: 0 2rem 0.35rem

+        &:last-child
+            margin-bottom: 5rem
+
     .label
         color: var(--color-dark)
         font: bold var(--font-size-lg)/var(--line-height-md) var(--font-secondary)

@@ -31,7 +31,7 @@ const Docs = ({ pageContext, children }) => (
         theme,
         version,
     } = pageContext
-    const { sidebars = [], modelsRepo, languages } = site.siteMetadata
+    const { sidebars = [], modelsRepo, languages, nightly } = site.siteMetadata
     const isModels = section === 'models'
     const sidebar = pageContext.sidebar
         ? { items: pageContext.sidebar }

@@ -83,7 +83,7 @@ const Docs = ({ pageContext, children }) => (
     {sidebar && <Sidebar items={sidebar.items} pageMenu={pageMenu} slug={slug} />}
     <Main
         section={section}
-        theme={theme}
+        theme={nightly ? 'nightly' : theme}
         sidebar
         asides
         wrapContent

@@ -146,6 +146,7 @@ const query = graphql`
             models
             starters
         }
+        nightly
         sidebars {
             section
             items {

@@ -75,10 +75,23 @@ const scopeComponents = {
     InlineCode,
 }

-const AlertSpace = () => {
+const AlertSpace = ({ nightly }) => {
     const isOnline = useOnlineStatus()
     return (
         <>
+            {nightly && (
+                <Alert
+                    title="You're viewing the pre-release docs."
+                    icon="moon"
+                    closeOnClick={false}
+                >
+                    The page reflects{' '}
+                    <Link to="https://pypi.org/project/spacy-nightly/">
+                        <InlineCode>spacy-nightly</InlineCode>
+                    </Link>
+                    , not the latest <Link to="https://spacy.io">stable version</Link>.
+                </Alert>
+            )}
             {!isOnline && (
                 <Alert title="Looks like you're offline." icon="offline" variant="warning">
                     But don't worry, your visited pages should be saved for you.

@@ -130,9 +143,10 @@ class Layout extends React.Component {
         const { data, pageContext, location, children } = this.props
         const { file, site = {} } = data || {}
         const mdx = file ? file.childMdx : null
-        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
-        const bodyClass = classNames(`theme-${theme}`, { 'search-exclude': !!searchExclude })
         const meta = site.siteMetadata || {}
+        const { title, section, sectionTitle, teaser, theme = 'blue', searchExclude } = pageContext
+        const uiTheme = meta.nightly ? 'nightly' : theme
+        const bodyClass = classNames(`theme-${uiTheme}`, { 'search-exclude': !!searchExclude })
         const isDocs = ['usage', 'models', 'api', 'styleguide'].includes(section)
         const content = !mdx ? null : (
             <MDXProvider components={mdxComponents}>

@@ -148,8 +162,9 @@ class Layout extends React.Component {
                     section={section}
                     sectionTitle={sectionTitle}
                     bodyClass={bodyClass}
+                    nightly={meta.nightly}
                 />
-                <AlertSpace />
+                <AlertSpace nightly={meta.nightly} />
                 <Navigation
                     title={meta.title}
                     items={meta.navigation}

@@ -167,11 +182,11 @@ class Layout extends React.Component {
                         mdxComponents={mdxComponents}
                     />
                 ) : (
-                    <>
+                    <div>
                         {children}
                         {content}
                         <Footer wide />
-                    </>
+                    </div>
                 )}
             </>
         )

@@ -184,6 +199,7 @@ export const pageQuery = graphql`
     query($slug: String!) {
         site {
             siteMetadata {
+                nightly
                 title
                 description
                 navigation {

@@ -30,8 +30,8 @@ function filterResources(resources, data) {
     return sorted.filter(res => (res.category || []).includes(data.id))
 }

-const UniverseContent = ({ content = [], categories, pageContext, location, mdxComponents }) => {
-    const { theme, data = {} } = pageContext
+const UniverseContent = ({ content = [], categories, theme, pageContext, mdxComponents }) => {
+    const { data = {} } = pageContext
     const filteredResources = filterResources(content, data)
     const activeData = data ? content.find(({ id }) => id === data.id) : null
     const markdownComponents = { ...mdxComponents, code: InlineCode }

@@ -302,15 +302,16 @@ const Universe = ({ pageContext, location, mdxComponents }) => (
     <StaticQuery
         query={query}
         render={data => {
-            const content = data.site.siteMetadata.universe.resources
-            const categories = data.site.siteMetadata.universe.categories
+            const { universe, nightly } = data.site.siteMetadata
+            const theme = nightly ? 'nightly' : pageContext.theme
             return (
                 <UniverseContent
-                    content={content}
-                    categories={categories}
+                    content={universe.resources}
+                    categories={universe.categories}
                     pageContext={pageContext}
                     location={location}
                     mdxComponents={mdxComponents}
+                    theme={theme}
                 />
             )
         }}

@@ -323,6 +324,7 @@ const query = graphql`
     query UniverseQuery {
         site {
             siteMetadata {
+                nightly
                 universe {
                     resources {
                         type

@@ -1,19 +0,0 @@
import React from 'react'
import { window } from 'browser-monads'

import { LandingHeader, LandingTitle } from '../components/landing'
import Button from '../components/button'

export default () => (
    <LandingHeader style={{ minHeight: 400 }}>
        <LandingTitle>
            Ooops, this page
            <br />
            does not exist!
        </LandingTitle>
        <br />
        <Button onClick={() => window.history.go(-1)} variant="tertiary">
            Click here to go back
        </Button>
    </LandingHeader>
)

@@ -68,7 +68,7 @@ const Landing = ({ data }) => {
     const counts = getCounts(data.languages)
     return (
         <>
-            <LandingHeader>
+            <LandingHeader nightly={data.nightly}>
                 <LandingTitle>
                     Industrial-Strength
                     <br />

@@ -268,6 +268,7 @@ const landingQuery = graphql`
     query LandingQuery {
         site {
             siteMetadata {
+                nightly
                 repo
                 languages {
                     models