Merge branch 'develop' into spacy.io

2025-10-31 16:07:41 +03:00 · 2019-03-12 22:57:34 +01:00 · 2019-03-12 22:57:34 +01:00 · ec29e6f4c8
commit ec29e6f4c8
parent b456af305b 4cfe4aa224
15 changed files with 56 additions and 34 deletions
--- a/website/docs/api/annotation.md
+++ b/website/docs/api/annotation.md
@@ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the
 training corpus and can be defined in the respective language data's
 [`tag_map.py`](/usage/adding-languages#tag-map).
-<Accordion title="Universal Part-of-speech Tags">
+<Accordion title="Universal Part-of-speech Tags" id="pos-universal">
 spaCy also maps all language-specific part-of-speech tags to a small, fixed set
 of word type tags following the
@@ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's
 [models](/models). The individual labels are language-specific and depend on the
 training corpus.
-<Accordion title="Universal Dependency Labels">
+<Accordion title="Universal Dependency Labels" id="dependency-parsing-universal">
 The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is
 used in all languages trained on Universal Dependency Corpora.
--- a/website/docs/usage/101/_pipelines.md
+++ b/website/docs/usage/101/_pipelines.md
@@ -33,9 +33,22 @@ list containing the component names:
 import Accordion from 'components/accordion.js'
-<Accordion title="Does the order of pipeline components matter?">
+<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">
-No
+In spaCy v2.x, the statistical components like the tagger or parser are
+independent and don't share any data between themselves. For example, the named
+entity recognizer doesn't use any features set by the tagger and parser, and so
+on. This means that you can swap them, or remove single components from the
+pipeline without affecting the others.
+However, custom components may depend on annotations set by other components.
+For example, a custom lemmatizer may need the part-of-speech tags assigned, so
+it'll only work if it's added after the tagger. The parser will respect
+pre-defined sentence boundaries, so if a previous component in the pipeline sets
+them, its dependency predictions may be different. Similarly, it matters if you
+add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
+recognizer: if it's added before, the entity recognizer will take the existing
+entities into account when making predictions.
 </Accordion>
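The ordering dependency described in this accordion can be shown with a small Python sketch that does not use spaCy. Everything in it is hypothetical and invented for illustration, not spaCy's pipeline API: the dict standing in for a `Doc`, the `tagger` and `lemmatizer` functions, their lookup tables, and `run_pipeline`. It only demonstrates why a component that reads part-of-speech tags has to run after the component that writes them.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a Doc: a plain dict of annotation lists.
ToyDoc = Dict[str, list]
Component = Callable[[ToyDoc], ToyDoc]

# Toy lookup tables, purely illustrative.
TAGS = {"cats": "NOUN", "were": "AUX", "running": "VERB"}
LEMMAS = {("cats", "NOUN"): "cat", ("were", "AUX"): "be", ("running", "VERB"): "run"}


def tagger(doc: ToyDoc) -> ToyDoc:
    # Writes a part-of-speech tag for every token ("X" if unknown).
    doc["pos"] = [TAGS.get(word, "X") for word in doc["words"]]
    return doc


def lemmatizer(doc: ToyDoc) -> ToyDoc:
    # Reads the tags set by the tagger, so it must run after it.
    if "pos" not in doc:
        raise ValueError("lemmatizer needs part-of-speech tags; add it after the tagger")
    doc["lemmas"] = [LEMMAS.get((w, p), w) for w, p in zip(doc["words"], doc["pos"])]
    return doc


def run_pipeline(words: List[str], pipeline: List[Component]) -> ToyDoc:
    # Applies each component in order, passing the same doc along.
    doc: ToyDoc = {"words": list(words)}
    for component in pipeline:
        doc = component(doc)
    return doc


words = ["cats", "were", "running"]
print(run_pipeline(words, [tagger, lemmatizer])["lemmas"])  # ['cat', 'be', 'run']
try:
    run_pipeline(words, [lemmatizer, tagger])
except ValueError as err:
    print(err)
```

In this sketch, swapping the two components makes the lemmatizer fail immediately. That is a deliberate choice for the example; a real component could instead fall back to the surface form without warning.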
--- a/website/docs/usage/adding-languages.md
+++ b/website/docs/usage/adding-languages.md
@@ -39,7 +39,7 @@ and morphological analysis.
 </div>
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 - [Language data 101](#101)
 - [The Language subclass](#language-subclass)
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -298,9 +298,9 @@ different languages, see the
 The best way to understand spaCy's dependency parser is interactively. To make
 this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc`
 or a list of `Doc` objects to displaCy and run
-[`displacy.serve`](top-level#displacy.serve) to run the web server, or
+[`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or
-[`displacy.render`](top-level#displacy.render) to generate the raw markup. If
+[`displacy.render`](/api/top-level#displacy.render) to generate the raw markup.
-you want to know how to write rules that hook into some type of syntactic
+If you want to know how to write rules that hook into some type of syntactic
 construction, just plug the sentence into the visualizer and see how spaCy
 annotates it.
@@ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on
 </Infobox>
-<Accordion title="Should I change the language data or add custom tokenizer rules?">
+<Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer">
 Tokenization rules that are specific to one language, but can be **generalized
 across that language** should ideally live in the language data in
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@@ -41,7 +41,7 @@ contribute to model development.
 > If a model is available for a language, you can download it using the
 > [`spacy download`](/api/cli#download) command. In order to use languages that
 > don't yet come with a model, you have to import them directly, or use
-> [`spacy.blank`](api/top-level#spacy.blank):
+> [`spacy.blank`](/api/top-level#spacy.blank):
 >
 > ```python
 > from spacy.lang.fi import Finnish
--- a/website/docs/usage/processing-pipelines.md
+++ b/website/docs/usage/processing-pipelines.md
@@ -46,7 +46,8 @@ components. spaCy then does the following:
 3. Add each pipeline component to the pipeline in order, using
   [`add_pipe`](/api/language#add_pipe).
 4. Make the **model data** available to the `Language` class by calling
-   [`from_disk`](language#from_disk) with the path to the model data directory.
+   [`from_disk`](/api/language#from_disk) with the path to the model data
+   directory.
 So when you call this...
@@ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning
 libraries. It also lets you take advantage of spaCy's data structures and the
 `Doc` object as the "single source of truth".
-<Accordion title="Why ._ and not just a top-level attribute?">
+<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
 Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer
 separation and makes it easier to ensure backwards compatibility. For example,
@@ -437,7 +438,7 @@ immediately know what's built-in and what's custom – for example,
 </Accordion>
-<Accordion title="How is the ._ implemented?">
+<Accordion title="How is the ._ implemented?" id="dot-underscore-implementation">
 Extension definitions – the defaults, methods, getters and setters you pass in
 to `set_extension` – are stored in class attributes on the `Underscore` class.
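The Python sketch below illustrates this idea under simplifying assumptions. Extension definitions sit in a class attribute that every instance shares, while each object keeps its own values. `Underscore`, `Doc` and `set_extension` here are hypothetical stand-ins that only reuse the names from the text. This is not spaCy's implementation, and it does not model where spaCy actually stores per-object values.

```python
from functools import partial
from typing import Any, Dict


class Underscore:
    # Class attribute: one registry of extension definitions shared by all instances.
    extensions: Dict[str, Dict[str, Any]] = {}

    def __init__(self, obj: Any) -> None:
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_obj", obj)
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found normally, i.e. extension names.
        ext = self.extensions.get(name)
        if ext is None:
            raise AttributeError(f"no extension named {name!r}")
        if ext.get("getter") is not None:
            return ext["getter"](self._obj)
        if ext.get("method") is not None:
            return partial(ext["method"], self._obj)
        return self._values.get(name, ext.get("default"))

    def __setattr__(self, name: str, value: Any) -> None:
        ext = self.extensions.get(name)
        if ext is None:
            raise AttributeError(f"no extension named {name!r}")
        if ext.get("setter") is not None:
            ext["setter"](self._obj, value)
        elif ext.get("getter") is not None or ext.get("method") is not None:
            raise AttributeError(f"extension {name!r} is read-only")
        else:
            self._values[name] = value


class Doc:
    # Hypothetical minimal Doc: each instance gets its own Underscore object.
    def __init__(self, text: str) -> None:
        self.text = text
        self._ = Underscore(self)

    @classmethod
    def set_extension(cls, name: str, **kwargs: Any) -> None:
        # Definitions go into the shared class attribute, not into any one Doc.
        Underscore.extensions[name] = kwargs


Doc.set_extension("is_greeting", default=False)
Doc.set_extension("length", getter=lambda doc: len(doc.text))
Doc.set_extension("shout", method=lambda doc: doc.text.upper())

doc = Doc("hello world")
print(doc._.is_greeting)  # False
doc._.is_greeting = True
print(doc._.is_greeting)  # True
print(doc._.length)       # 11
print(doc._.shout())      # HELLO WORLD
```

Because the definitions are registered once on the class, an extension added after a `Doc` was created is still visible on that `Doc`. Only the assigned values are per object.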
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the
 surrounding tokens, merge spans into single tokens or add entries to the named
 entities in `doc.ents`.
-<Accordion title="Should I use rules or train a model?">
+<Accordion title="Should I use rules or train a model?" id="rules-vs-model">
 For complex tasks, it's usually better to train a statistical entity recognition
 model. However, statistical models require training data, so for many
@@ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler).
 </Accordion>
-<Accordion title="When should I use the token matcher vs. the phrase matcher?">
+<Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher">
 The `PhraseMatcher` is useful if you already have a large terminology list or
 gazetteer consisting of single or multi-token phrases that you want to find
--- a/website/docs/usage/spacy-101.md
+++ b/website/docs/usage/spacy-101.md
@@ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**.
 </div>
-<Infobox title="Table of contents">
+<Infobox title="Table of contents" id="toc">
 - [Features](#features)
 - [Linguistic annotations](#annotations)
--- a/website/docs/usage/v2.md
+++ b/website/docs/usage/v2.md
@@ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`,
 </div>
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 - [Summary](#summary)
 - [New features](#features)
--- a/website/docs/usage/visualizers.md
+++ b/website/docs/usage/visualizers.md
@@ -75,7 +75,7 @@ arcs.
 | `font`    | unicode | Font name or font family for all text.                      | `"Arial"`   |
 For a list of all available options, see the
-[`displacy` API documentation](top-level#displacy_options).
+[`displacy` API documentation](/api/top-level#displacy_options).
 > #### Options example
 >
--- a/website/package.json
+++ b/website/package.json
@@ -12,7 +12,6 @@
        "@mdx-js/tag": "^0.17.5",
        "@phosphor/widgets": "^1.6.0",
        "@rehooks/online-status": "^1.0.0",
-        "@sindresorhus/slugify": "^0.8.0",
        "@svgr/webpack": "^4.1.0",
        "autoprefixer": "^9.4.7",
        "classnames": "^2.2.6",
@@ -62,7 +61,8 @@
        "md-attr-parser": "^1.2.1",
        "prettier": "^1.16.4",
        "raw-loader": "^1.0.0",
-        "unist-util-visit": "^1.4.0"
+        "unist-util-visit": "^1.4.0",
+        "@sindresorhus/slugify": "^0.8.0"
    },
    "repository": {
        "type": "git",
--- a/website/src/components/accordion.js
+++ b/website/src/components/accordion.js
@@ -1,33 +1,38 @@
-import React, { useState } from 'react'
+import React, { useState, useEffect } from 'react'
 import PropTypes from 'prop-types'
 import classNames from 'classnames'
 import slugify from '@sindresorhus/slugify'
 import Link from './link'
 import classes from '../styles/accordion.module.sass'
 const Accordion = ({ title, id, expanded, children }) => {
-    const anchorId = id || slugify(title)
-    const [isExpanded, setIsExpanded] = useState(expanded)
+    const [isExpanded, setIsExpanded] = useState(true)
    const contentClassNames = classNames(classes.content, {
        [classes.hidden]: !isExpanded,
    })
    const iconClassNames = classNames({
        [classes.hidden]: isExpanded,
    })
+    // Make sure accordion is expanded if JS is disabled
+    useEffect(() => setIsExpanded(expanded), [])
    return (
-        <section id={anchorId}>
+        <section className="accordion" id={id}>
            <div className={classes.root}>
-                <h3>
+                <h4>
                    <button
                        className={classes.button}
                        aria-expanded={String(isExpanded)}
                        onClick={() => setIsExpanded(!isExpanded)}
                    >
                        <span>
-                            {title}
+                            <span className="heading-text">{title}</span>
-                            {isExpanded && (
+                            {isExpanded && !!id && (
-                                <Link to={`#${anchorId}`} className={classes.anchor} hidden>
+                                <Link
+                                    to={`#${id}`}
+                                    className={classes.anchor}
+                                    hidden
+                                    onClick={event => event.stopPropagation()}
+                                >
                                    &para;
                                </Link>
                            )}
@@ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => {
                            <rect height={2} width={8} x={1} y={4} />
                        </svg>
                    </button>
-                </h3>
+                </h4>
                <div className={contentClassNames}>{children}</div>
            </div>
        </section>
--- a/website/src/components/infobox.js
+++ b/website/src/components/infobox.js
@@ -5,13 +5,13 @@ import classNames from 'classnames'
 import Icon from './icon'
 import classes from '../styles/infobox.module.sass'
-const Infobox = ({ title, variant, className, children }) => {
+const Infobox = ({ title, id, variant, className, children }) => {
    const infoboxClassNames = classNames(classes.root, className, {
        [classes.warning]: variant === 'warning',
        [classes.danger]: variant === 'danger',
    })
    return (
-        <aside className={infoboxClassNames}>
+        <aside className={infoboxClassNames} id={id}>
            {title && (
                <h4 className={classes.title}>
                    {variant !== 'default' && (
@@ -31,6 +31,7 @@ Infobox.defaultProps = {
 Infobox.propTypes = {
    title: PropTypes.string,
+    id: PropTypes.string,
    variant: PropTypes.oneOf(['default', 'warning', 'danger']),
    className: PropTypes.string,
    children: PropTypes.node.isRequired,
--- a/website/src/components/juniper.js
+++ b/website/src/components/juniper.js
@@ -232,6 +232,7 @@ Juniper.defaultProps = {
    theme: 'default',
    isolateCells: true,
    useBinder: true,
    storageKey: 'juniper',
    useStorage: true,
    storageExpire: 60,
    debug: false,
--- a/website/src/templates/models.js
+++ b/website/src/templates/models.js
@@ -105,7 +105,7 @@ const Help = ({ children }) => (
 const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => {
    const [initialized, setInitialized] = useState(false)
-    const [isError, setIsError] = useState(false)
+    const [isError, setIsError] = useState(true)
    const [meta, setMeta] = useState({})
    const { type, genre, size } = getModelComponents(name)
    const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility])
@@ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
    useEffect(() => {
        window.dispatchEvent(new Event('resize')) // scroll position for progress
        if (!initialized && version) {
+            setIsError(false)
            fetch(`${baseUrl}/meta/${name}-${version}.json`)
                .then(res => res.json())
                .then(json => {
@@ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
    const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link>
    const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null
    const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license
-    const hasInteractiveCode = size === 'sm' && hasExamples
+    const hasInteractiveCode = size === 'sm' && hasExamples && !isError
    const rows = [
        { label: 'Language', tag: langId, content: langName },