Merge branch 'develop' into spacy.io

Ines Montani 2019-03-12 22:57:34 +01:00
commit ec29e6f4c8
15 changed files with 56 additions and 34 deletions

View File

@@ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the
training corpus and can be defined in the respective language data's
[`tag_map.py`](/usage/adding-languages#tag-map).
-<Accordion title="Universal Part-of-speech Tags">
+<Accordion title="Universal Part-of-speech Tags" id="pos-universal">
spaCy also maps all language-specific part-of-speech tags to a small, fixed set
of word type tags following the
@@ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's
[models](/models). The individual labels are language-specific and depend on the
training corpus.
-<Accordion title="Universal Dependency Labels">
+<Accordion title="Universal Dependency Labels" id="dependency-parsing-universal">
The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is
used in all languages trained on Universal Dependency Corpora.

View File

@@ -33,9 +33,22 @@ list containing the component names:
import Accordion from 'components/accordion.js'
-<Accordion title="Does the order of pipeline components matter?">
+<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">
-No
+In spaCy v2.x, the statistical components like the tagger or parser are
+independent and don't share any data between themselves. For example, the named
+entity recognizer doesn't use any features set by the tagger and parser, and so
+on. This means that you can swap them, or remove single components from the
+pipeline without affecting the others.
+However, custom components may depend on annotations set by other components.
+For example, a custom lemmatizer may need the part-of-speech tags assigned, so
+it'll only work if it's added after the tagger. The parser will respect
+pre-defined sentence boundaries, so if a previous component in the pipeline sets
+them, its dependency predictions may be different. Similarly, it matters if you
+add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
+recognizer: if it's added before, the entity recognizer will take the existing
+entities into account when making predictions.
</Accordion>
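
The order dependence described in the accordion text of this hunk can be illustrated with a minimal sketch in plain Python. It deliberately does not use spaCy: `toy_tagger`, `toy_lemmatizer` and `run_pipeline` are hypothetical stand-ins, and the tagging rule is an assumption made up for the example. It only shows why a component that reads another component's annotations has to run after it.

```python
# Toy illustration only: plain dicts and functions, not spaCy's API.


def toy_tagger(doc):
    """Set a coarse part-of-speech tag on every token (made-up rule)."""
    for token in doc:
        token["pos"] = "VERB" if token["text"].endswith("ing") else "NOUN"
    return doc


def toy_lemmatizer(doc):
    """Lemmatize tokens, relying on the tag set by the tagger if present."""
    for token in doc:
        if token.get("pos") == "VERB":
            token["lemma"] = token["text"][:-3]  # strip the "ing" suffix
        else:
            # No usable tag: fall back to the surface form.
            token["lemma"] = token["text"]
    return doc


def run_pipeline(text, components):
    """Apply each component to the doc in the given order."""
    doc = [{"text": word} for word in text.split()]
    for component in components:
        doc = component(doc)
    return [token["lemma"] for token in doc]


tagger_first = run_pipeline("walking dogs", [toy_tagger, toy_lemmatizer])
lemmatizer_first = run_pipeline("walking dogs", [toy_lemmatizer, toy_tagger])
print(tagger_first)      # ['walk', 'dogs']
print(lemmatizer_first)  # ['walking', 'dogs']
```

With the tagger first, the lemmatizer sees the tags and strips the suffix. With the order reversed, it silently falls back to the surface form, which is the kind of dependency the added paragraph warns about for custom components.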

View File

@@ -39,7 +39,7 @@ and morphological analysis.
</div>
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
- [Language data 101](#101)
- [The Language subclass](#language-subclass)

View File

@@ -298,9 +298,9 @@ different languages, see the
The best way to understand spaCy's dependency parser is interactively. To make
this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc`
or a list of `Doc` objects to displaCy and run
-[`displacy.serve`](top-level#displacy.serve) to run the web server, or
-[`displacy.render`](top-level#displacy.render) to generate the raw markup. If
-you want to know how to write rules that hook into some type of syntactic
+[`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or
+[`displacy.render`](/api/top-level#displacy.render) to generate the raw markup.
+If you want to know how to write rules that hook into some type of syntactic
construction, just plug the sentence into the visualizer and see how spaCy
annotates it.
@@ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on
</Infobox>
-<Accordion title="Should I change the language data or add custom tokenizer rules?">
+<Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer">
Tokenization rules that are specific to one language, but can be **generalized
across that language** should ideally live in the language data in

View File

@@ -41,7 +41,7 @@ contribute to model development.
> If a model is available for a language, you can download it using the
> [`spacy download`](/api/cli#download) command. In order to use languages that
> don't yet come with a model, you have to import them directly, or use
-> [`spacy.blank`](api/top-level#spacy.blank):
+> [`spacy.blank`](/api/top-level#spacy.blank):
>
> ```python
> from spacy.lang.fi import Finnish

View File

@@ -46,7 +46,8 @@ components. spaCy then does the following:
3. Add each pipeline component to the pipeline in order, using
[`add_pipe`](/api/language#add_pipe).
4. Make the **model data** available to the `Language` class by calling
-[`from_disk`](language#from_disk) with the path to the model data directory.
+[`from_disk`](/api/language#from_disk) with the path to the model data
+directory.
So when you call this...
@@ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning
libraries. It also lets you take advantage of spaCy's data structures and the
`Doc` object as the "single source of truth".
-<Accordion title="Why ._ and not just a top-level attribute?">
+<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer
separation and makes it easier to ensure backwards compatibility. For example,
@@ -437,7 +438,7 @@ immediately know what's built-in and what's custom for example,
</Accordion>
-<Accordion title="How is the ._ implemented?">
+<Accordion title="How is the ._ implemented?" id="dot-underscore-implementation">
Extension definitions (the defaults, methods, getters and setters you pass in
to `set_extension`) are stored in class attributes on the `Underscore` class.
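
As a rough picture of what "stored in class attributes" means in the context line above, here is a simplified sketch in plain Python. It is not spaCy's actual source: this toy `Underscore`, its `extensions` dict and `ToyDoc` are assumptions made for illustration, setters are left out, and per-object values are kept on the wrapper itself for brevity.

```python
import functools


class Underscore:
    # Extension definitions live on the class, so every instance shares them.
    extensions = {}

    def __init__(self, obj):
        # Bypass our own __setattr__ so internal fields are stored normally.
        object.__setattr__(self, "_obj", obj)
        object.__setattr__(self, "_values", {})

    @classmethod
    def set_extension(cls, name, default=None, getter=None, method=None):
        cls.extensions[name] = {"default": default, "getter": getter, "method": method}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name not in Underscore.extensions:
            raise AttributeError(name)
        ext = Underscore.extensions[name]
        if ext["getter"] is not None:
            return ext["getter"](self._obj)
        if ext["method"] is not None:
            # Bind the wrapped object as the method's first argument.
            return functools.partial(ext["method"], self._obj)
        return self._values.get(name, ext["default"])

    def __setattr__(self, name, value):
        # Writing to an unregistered name is an error, as with reads.
        if name not in Underscore.extensions:
            raise AttributeError(name)
        self._values[name] = value


class ToyDoc:
    """Hypothetical stand-in for a Doc-like object."""

    def __init__(self, text):
        self.text = text
        self._ = Underscore(self)


Underscore.set_extension("is_greeting", default=False)
Underscore.set_extension("n_chars", getter=lambda doc: len(doc.text))
Underscore.set_extension("shout", method=lambda doc, suffix: doc.text.upper() + suffix)

doc = ToyDoc("hello")
other = ToyDoc("bye")
doc._.is_greeting = True
print(doc._.is_greeting, other._.is_greeting, doc._.n_chars, doc._.shout("!"))
```

Because the definitions sit on the class, registering an extension once makes it available on every wrapped object, while the values written through `._` stay separate per object.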

View File

@@ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the
surrounding tokens, merge spans into single tokens or add entries to the named
entities in `doc.ents`.
-<Accordion title="Should I use rules or train a model?">
+<Accordion title="Should I use rules or train a model?" id="rules-vs-model">
For complex tasks, it's usually better to train a statistical entity recognition
model. However, statistical models require training data, so for many
@@ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler).
</Accordion>
-<Accordion title="When should I use the token matcher vs. the phrase matcher?">
+<Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher">
The `PhraseMatcher` is useful if you already have a large terminology list or
gazetteer consisting of single or multi-token phrases that you want to find

View File

@@ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**.
</div>
-<Infobox title="Table of contents">
+<Infobox title="Table of contents" id="toc">
- [Features](#features)
- [Linguistic annotations](#annotations)

View File

@@ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`,
</div>
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
- [Summary](#summary)
- [New features](#features)

View File

@@ -75,7 +75,7 @@ arcs.
| `font` | unicode | Font name or font family for all text. | `"Arial"` |
For a list of all available options, see the
-[`displacy` API documentation](top-level#displacy_options).
+[`displacy` API documentation](/api/top-level#displacy_options).
> #### Options example
>

View File

@@ -12,7 +12,6 @@
"@mdx-js/tag": "^0.17.5",
"@phosphor/widgets": "^1.6.0",
"@rehooks/online-status": "^1.0.0",
-"@sindresorhus/slugify": "^0.8.0",
"@svgr/webpack": "^4.1.0",
"autoprefixer": "^9.4.7",
"classnames": "^2.2.6",
@@ -62,7 +61,8 @@
"md-attr-parser": "^1.2.1",
"prettier": "^1.16.4",
"raw-loader": "^1.0.0",
-"unist-util-visit": "^1.4.0"
+"unist-util-visit": "^1.4.0",
+"@sindresorhus/slugify": "^0.8.0"
},
"repository": {
"type": "git",

View File

@@ -1,33 +1,38 @@
-import React, { useState } from 'react'
+import React, { useState, useEffect } from 'react'
import PropTypes from 'prop-types'
import classNames from 'classnames'
-import slugify from '@sindresorhus/slugify'
import Link from './link'
import classes from '../styles/accordion.module.sass'
const Accordion = ({ title, id, expanded, children }) => {
-const anchorId = id || slugify(title)
-const [isExpanded, setIsExpanded] = useState(expanded)
+const [isExpanded, setIsExpanded] = useState(true)
const contentClassNames = classNames(classes.content, {
[classes.hidden]: !isExpanded,
})
const iconClassNames = classNames({
[classes.hidden]: isExpanded,
})
+// Make sure accordion is expanded if JS is disabled
+useEffect(() => setIsExpanded(expanded), [])
return (
-<section id={anchorId}>
+<section className="accordion" id={id}>
<div className={classes.root}>
-<h3>
+<h4>
<button
className={classes.button}
aria-expanded={String(isExpanded)}
onClick={() => setIsExpanded(!isExpanded)}
>
<span>
-{title}
-{isExpanded && (
-<Link to={`#${anchorId}`} className={classes.anchor} hidden>
+<span className="heading-text">{title}</span>
+{isExpanded && !!id && (
+<Link
+to={`#${id}`}
+className={classes.anchor}
+hidden
+onClick={event => event.stopPropagation()}
+>
&para;
</Link>
)}
@@ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => {
<rect height={2} width={8} x={1} y={4} />
</svg>
</button>
-</h3>
+</h4>
<div className={contentClassNames}>{children}</div>
</div>
</section>

View File

@@ -5,13 +5,13 @@ import classNames from 'classnames'
import Icon from './icon'
import classes from '../styles/infobox.module.sass'
-const Infobox = ({ title, variant, className, children }) => {
+const Infobox = ({ title, id, variant, className, children }) => {
const infoboxClassNames = classNames(classes.root, className, {
[classes.warning]: variant === 'warning',
[classes.danger]: variant === 'danger',
})
return (
-<aside className={infoboxClassNames}>
+<aside className={infoboxClassNames} id={id}>
{title && (
<h4 className={classes.title}>
{variant !== 'default' && (
@@ -31,6 +31,7 @@ Infobox.defaultProps = {
Infobox.propTypes = {
title: PropTypes.string,
+id: PropTypes.string,
variant: PropTypes.oneOf(['default', 'warning', 'danger']),
className: PropTypes.string,
children: PropTypes.node.isRequired,

View File

@@ -232,6 +232,7 @@ Juniper.defaultProps = {
theme: 'default',
isolateCells: true,
useBinder: true,
+storageKey: 'juniper',
useStorage: true,
storageExpire: 60,
debug: false,

View File

@@ -105,7 +105,7 @@ const Help = ({ children }) => (
const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => {
const [initialized, setInitialized] = useState(false)
-const [isError, setIsError] = useState(false)
+const [isError, setIsError] = useState(true)
const [meta, setMeta] = useState({})
const { type, genre, size } = getModelComponents(name)
const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility])
@@ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
useEffect(() => {
window.dispatchEvent(new Event('resize')) // scroll position for progress
if (!initialized && version) {
+setIsError(false)
fetch(`${baseUrl}/meta/${name}-${version}.json`)
.then(res => res.json())
.then(json => {
@@ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link>
const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null
const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license
-const hasInteractiveCode = size === 'sm' && hasExamples
+const hasInteractiveCode = size === 'sm' && hasExamples && !isError
const rows = [
{ label: 'Language', tag: langId, content: langName },