Merge branch 'develop' into spacy.io

2025-08-24 14:04:56 +03:00 · 2019-03-12 22:57:34 +01:00 · ec29e6f4c8
commit ec29e6f4c8
parent b456af305b 4cfe4aa224
15 changed files with 56 additions and 34 deletions
--- a/website/docs/api/annotation.md
+++ b/website/docs/api/annotation.md
@@ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the
 training corpus and can be defined in the respective language data's
 [`tag_map.py`](/usage/adding-languages#tag-map).

-<Accordion title="Universal Part-of-speech Tags">
+<Accordion title="Universal Part-of-speech Tags" id="pos-universal">

 spaCy also maps all language-specific part-of-speech tags to a small, fixed set
 of word type tags following the
@@ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's
 [models](/models). The individual labels are language-specific and depend on the
 training corpus.

-<Accordion title="Universal Dependency Labels">
+<Accordion title="Universal Dependency Labels" id="dependency-parsing-universal">

 The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is
 used in all languages trained on Universal Dependency Corpora.
--- a/website/docs/usage/101/_pipelines.md
+++ b/website/docs/usage/101/_pipelines.md
@@ -33,9 +33,22 @@ list containing the component names:

 import Accordion from 'components/accordion.js'

-<Accordion title="Does the order of pipeline components matter?">
+<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">

-No
+In spaCy v2.x, the statistical components like the tagger or parser are
+independent and don't share any data between themselves. For example, the named
+entity recognizer doesn't use any features set by the tagger and parser, and so
+on. This means that you can swap them, or remove single components from the
+pipeline without affecting the others.
+
+However, custom components may depend on annotations set by other components.
+For example, a custom lemmatizer may need the part-of-speech tags assigned, so
+it'll only work if it's added after the tagger. The parser will respect
+pre-defined sentence boundaries, so if a previous component in the pipeline sets
+them, its dependency predictions may be different. Similarly, it matters if you
+add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
+recognizer: if it's added before, the entity recognizer will take the existing
+entities into account when making predictions.

 </Accordion>

--- a/website/docs/usage/adding-languages.md
+++ b/website/docs/usage/adding-languages.md
@@ -39,7 +39,7 @@ and morphological analysis.

 </div>

-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">

 - [Language data 101](#101)
 - [The Language subclass](#language-subclass)
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -298,9 +298,9 @@ different languages, see the
 The best way to understand spaCy's dependency parser is interactively. To make
 this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc`
 or a list of `Doc` objects to displaCy and run
-[`displacy.serve`](top-level#displacy.serve) to run the web server, or
-[`displacy.render`](top-level#displacy.render) to generate the raw markup. If
-you want to know how to write rules that hook into some type of syntactic
+[`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or
+[`displacy.render`](/api/top-level#displacy.render) to generate the raw markup.
+If you want to know how to write rules that hook into some type of syntactic
 construction, just plug the sentence into the visualizer and see how spaCy
 annotates it.

@@ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on

 </Infobox>

-<Accordion title="Should I change the language data or add custom tokenizer rules?">
+<Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer">

 Tokenization rules that are specific to one language, but can be **generalized
 across that language** should ideally live in the language data in
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@@ -41,7 +41,7 @@ contribute to model development.
 > If a model is available for a language, you can download it using the
 > [`spacy download`](/api/cli#download) command. In order to use languages that
 > don't yet come with a model, you have to import them directly, or use
-> [`spacy.blank`](api/top-level#spacy.blank):
+> [`spacy.blank`](/api/top-level#spacy.blank):
 >
 > ```python
 > from spacy.lang.fi import Finnish
--- a/website/docs/usage/processing-pipelines.md
+++ b/website/docs/usage/processing-pipelines.md
@@ -46,7 +46,8 @@ components. spaCy then does the following:
 3. Add each pipeline component to the pipeline in order, using
   [`add_pipe`](/api/language#add_pipe).
 4. Make the **model data** available to the `Language` class by calling
-   [`from_disk`](language#from_disk) with the path to the model data directory.
+   [`from_disk`](/api/language#from_disk) with the path to the model data
+   directory.

 So when you call this...

@@ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning
 libraries. It also lets you take advantage of spaCy's data structures and the
 `Doc` object as the "single source of truth".

-<Accordion title="Why ._ and not just a top-level attribute?">
+<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">

 Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer
 separation and makes it easier to ensure backwards compatibility. For example,
@@ -437,7 +438,7 @@ immediately know what's built-in and what's custom – for example,

 </Accordion>

-<Accordion title="How is the ._ implemented?">
+<Accordion title="How is the ._ implemented?" id="dot-underscore-implementation">

 Extension definitions – the defaults, methods, getters and setters you pass in
 to `set_extension` – are stored in class attributes on the `Underscore` class.
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the
 surrounding tokens, merge spans into single tokens or add entries to the named
 entities in `doc.ents`.

-<Accordion title="Should I use rules or train a model?">
+<Accordion title="Should I use rules or train a model?" id="rules-vs-model">

 For complex tasks, it's usually better to train a statistical entity recognition
 model. However, statistical models require training data, so for many
@@ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler).

 </Accordion>

-<Accordion title="When should I use the token matcher vs. the phrase matcher?">
+<Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher">

 The `PhraseMatcher` is useful if you already have a large terminology list or
 gazetteer consisting of single or multi-token phrases that you want to find
--- a/website/docs/usage/spacy-101.md
+++ b/website/docs/usage/spacy-101.md
@@ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**.

 </div>

-<Infobox title="Table of contents">
+<Infobox title="Table of contents" id="toc">

 - [Features](#features)
 - [Linguistic annotations](#annotations)
--- a/website/docs/usage/v2.md
+++ b/website/docs/usage/v2.md
@@ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`,

 </div>

-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">

 - [Summary](#summary)
 - [New features](#features)
--- a/website/docs/usage/visualizers.md
+++ b/website/docs/usage/visualizers.md
@@ -75,7 +75,7 @@ arcs.
 | `font`    | unicode | Font name or font family for all text.                      | `"Arial"`   |

 For a list of all available options, see the
-[`displacy` API documentation](top-level#displacy_options).
+[`displacy` API documentation](/api/top-level#displacy_options).

 > #### Options example
 >
--- a/website/package.json
+++ b/website/package.json
@@ -12,7 +12,6 @@
        "@mdx-js/tag": "^0.17.5",
        "@phosphor/widgets": "^1.6.0",
        "@rehooks/online-status": "^1.0.0",
-        "@sindresorhus/slugify": "^0.8.0",
        "@svgr/webpack": "^4.1.0",
        "autoprefixer": "^9.4.7",
        "classnames": "^2.2.6",
@@ -62,7 +61,8 @@
        "md-attr-parser": "^1.2.1",
        "prettier": "^1.16.4",
        "raw-loader": "^1.0.0",
-        "unist-util-visit": "^1.4.0"
+        "unist-util-visit": "^1.4.0",
+        "@sindresorhus/slugify": "^0.8.0"
    },
    "repository": {
        "type": "git",
--- a/website/src/components/accordion.js
+++ b/website/src/components/accordion.js
@@ -1,33 +1,38 @@
-import React, { useState } from 'react'
+import React, { useState, useEffect } from 'react'
 import PropTypes from 'prop-types'
 import classNames from 'classnames'
-import slugify from '@sindresorhus/slugify'

 import Link from './link'
 import classes from '../styles/accordion.module.sass'

 const Accordion = ({ title, id, expanded, children }) => {
-    const anchorId = id || slugify(title)
-    const [isExpanded, setIsExpanded] = useState(expanded)
+    const [isExpanded, setIsExpanded] = useState(true)
    const contentClassNames = classNames(classes.content, {
        [classes.hidden]: !isExpanded,
    })
    const iconClassNames = classNames({
        [classes.hidden]: isExpanded,
    })
+    // Make sure accordion is expanded if JS is disabled
+    useEffect(() => setIsExpanded(expanded), [])
    return (
-        <section id={anchorId}>
+        <section className="accordion" id={id}>
            <div className={classes.root}>
-                <h3>
+                <h4>
                    <button
                        className={classes.button}
                        aria-expanded={String(isExpanded)}
                        onClick={() => setIsExpanded(!isExpanded)}
                    >
                        <span>
-                            {title}
-                            {isExpanded && (
-                                <Link to={`#${anchorId}`} className={classes.anchor} hidden>
+                            <span className="heading-text">{title}</span>
+                            {isExpanded && !!id && (
+                                <Link
+                                    to={`#${id}`}
+                                    className={classes.anchor}
+                                    hidden
+                                    onClick={event => event.stopPropagation()}
+                                >
                                    &para;
                                </Link>
                            )}
@@ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => {
                            <rect height={2} width={8} x={1} y={4} />
                        </svg>
                    </button>
-                </h3>
+                </h4>
                <div className={contentClassNames}>{children}</div>
            </div>
        </section>
--- a/website/src/components/infobox.js
+++ b/website/src/components/infobox.js
@@ -5,13 +5,13 @@ import classNames from 'classnames'
 import Icon from './icon'
 import classes from '../styles/infobox.module.sass'

-const Infobox = ({ title, variant, className, children }) => {
+const Infobox = ({ title, id, variant, className, children }) => {
    const infoboxClassNames = classNames(classes.root, className, {
        [classes.warning]: variant === 'warning',
        [classes.danger]: variant === 'danger',
    })
    return (
-        <aside className={infoboxClassNames}>
+        <aside className={infoboxClassNames} id={id}>
            {title && (
                <h4 className={classes.title}>
                    {variant !== 'default' && (
@@ -31,6 +31,7 @@ Infobox.defaultProps = {

 Infobox.propTypes = {
    title: PropTypes.string,
+    id: PropTypes.string,
    variant: PropTypes.oneOf(['default', 'warning', 'danger']),
    className: PropTypes.string,
    children: PropTypes.node.isRequired,
--- a/website/src/components/juniper.js
+++ b/website/src/components/juniper.js
@@ -232,6 +232,7 @@ Juniper.defaultProps = {
    theme: 'default',
    isolateCells: true,
    useBinder: true,
+    storageKey: 'juniper',
    useStorage: true,
    storageExpire: 60,
    debug: false,
--- a/website/src/templates/models.js
+++ b/website/src/templates/models.js
@@ -105,7 +105,7 @@ const Help = ({ children }) => (

 const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => {
    const [initialized, setInitialized] = useState(false)
-    const [isError, setIsError] = useState(false)
+    const [isError, setIsError] = useState(true)
    const [meta, setMeta] = useState({})
    const { type, genre, size } = getModelComponents(name)
    const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility])
@@ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
    useEffect(() => {
        window.dispatchEvent(new Event('resize')) // scroll position for progress
        if (!initialized && version) {
+            setIsError(false)
            fetch(`${baseUrl}/meta/${name}-${version}.json`)
                .then(res => res.json())
                .then(json => {
@@ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
    const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link>
    const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null
    const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license
-    const hasInteractiveCode = size === 'sm' && hasExamples
+    const hasInteractiveCode = size === 'sm' && hasExamples && !isError

    const rows = [
        { label: 'Language', tag: langId, content: langName },