mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00
Merge branch 'develop' into spacy.io
commit ec29e6f4c8
@@ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the
 training corpus and can be defined in the respective language data's
 [`tag_map.py`](/usage/adding-languages#tag-map).
 
-<Accordion title="Universal Part-of-speech Tags">
+<Accordion title="Universal Part-of-speech Tags" id="pos-universal">
 
 spaCy also maps all language-specific part-of-speech tags to a small, fixed set
 of word type tags following the
@@ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's
 [models](/models). The individual labels are language-specific and depend on the
 training corpus.
 
-<Accordion title="Universal Dependency Labels">
+<Accordion title="Universal Dependency Labels" id="dependency-parsing-universal">
 
 The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is
 used in all languages trained on Universal Dependency Corpora.

@@ -33,9 +33,22 @@ list containing the component names:
 
 import Accordion from 'components/accordion.js'
 
-<Accordion title="Does the order of pipeline components matter?">
+<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">
 
-No
+In spaCy v2.x, the statistical components like the tagger or parser are
+independent and don't share any data between themselves. For example, the named
+entity recognizer doesn't use any features set by the tagger and parser, and so
+on. This means that you can swap them, or remove single components from the
+pipeline without affecting the others.
+
+However, custom components may depend on annotations set by other components.
+For example, a custom lemmatizer may need the part-of-speech tags assigned, so
+it'll only work if it's added after the tagger. The parser will respect
+pre-defined sentence boundaries, so if a previous component in the pipeline sets
+them, its dependency predictions may be different. Similarly, it matters if you
+add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
+recognizer: if it's added before, the entity recognizer will take the existing
+entities into account when making predictions.
 
 </Accordion>
 

@@ -39,7 +39,7 @@ and morphological analysis.
 
 </div>
 
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 
 - [Language data 101](#101)
 - [The Language subclass](#language-subclass)

@@ -298,9 +298,9 @@ different languages, see the
 The best way to understand spaCy's dependency parser is interactively. To make
 this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc`
 or a list of `Doc` objects to displaCy and run
-[`displacy.serve`](top-level#displacy.serve) to run the web server, or
-[`displacy.render`](top-level#displacy.render) to generate the raw markup. If
-you want to know how to write rules that hook into some type of syntactic
+[`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or
+[`displacy.render`](/api/top-level#displacy.render) to generate the raw markup.
+If you want to know how to write rules that hook into some type of syntactic
 construction, just plug the sentence into the visualizer and see how spaCy
 annotates it.
 
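The link changes in this hunk swap page-relative hrefs for root-relative ones. A minimal sketch of why that matters, assuming the hrefs are resolved with standard URL rules; the page URL below is a hypothetical stand-in and is not taken from this commit:

```javascript
// Hypothetical page that contains the link (illustration only).
const page = 'https://example.com/usage/some-guide'

// Resolve an href against the page URL, as a browser would for a plain link.
const resolve = href => {
    const url = new URL(href, page)
    return url.pathname + url.hash
}

// Old href: resolved relative to the current page's directory.
const before = resolve('top-level#displacy.serve')
// New href: resolved from the site root, independent of the current page.
const after = resolve('/api/top-level#displacy.serve')

console.log(before) // /usage/top-level#displacy.serve
console.log(after) // /api/top-level#displacy.serve
```

Under that assumption, the old href only reaches the API page when the linking page itself lives under `/api`, which may be why the commit also rewrites `api/top-level#spacy.blank` and `language#from_disk` in the same way.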
@@ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on
 
 </Infobox>
 
-<Accordion title="Should I change the language data or add custom tokenizer rules?">
+<Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer">
 
 Tokenization rules that are specific to one language, but can be **generalized
 across that language** should ideally live in the language data in

@@ -41,7 +41,7 @@ contribute to model development.
 > If a model is available for a language, you can download it using the
 > [`spacy download`](/api/cli#download) command. In order to use languages that
 > don't yet come with a model, you have to import them directly, or use
-> [`spacy.blank`](api/top-level#spacy.blank):
+> [`spacy.blank`](/api/top-level#spacy.blank):
 >
 > ```python
 > from spacy.lang.fi import Finnish

@@ -46,7 +46,8 @@ components. spaCy then does the following:
 3. Add each pipeline component to the pipeline in order, using
    [`add_pipe`](/api/language#add_pipe).
 4. Make the **model data** available to the `Language` class by calling
-   [`from_disk`](language#from_disk) with the path to the model data directory.
+   [`from_disk`](/api/language#from_disk) with the path to the model data
+   directory.
 
 So when you call this...
 
@@ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning
 libraries. It also lets you take advantage of spaCy's data structures and the
 `Doc` object as the "single source of truth".
 
-<Accordion title="Why ._ and not just a top-level attribute?">
+<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
 
 Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer
 separation and makes it easier to ensure backwards compatibility. For example,
@@ -437,7 +438,7 @@ immediately know what's built-in and what's custom – for example,
 
 </Accordion>
 
-<Accordion title="How is the ._ implemented?">
+<Accordion title="How is the ._ implemented?" id="dot-underscore-implementation">
 
 Extension definitions – the defaults, methods, getters and setters you pass in
 to `set_extension` – are stored in class attributes on the `Underscore` class.

@@ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the
 surrounding tokens, merge spans into single tokens or add entries to the named
 entities in `doc.ents`.
 
-<Accordion title="Should I use rules or train a model?">
+<Accordion title="Should I use rules or train a model?" id="rules-vs-model">
 
 For complex tasks, it's usually better to train a statistical entity recognition
 model. However, statistical models require training data, so for many
@@ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler).
 
 </Accordion>
 
-<Accordion title="When should I use the token matcher vs. the phrase matcher?">
+<Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher">
 
 The `PhraseMatcher` is useful if you already have a large terminology list or
 gazetteer consisting of single or multi-token phrases that you want to find

@@ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**.
 
 </div>
 
-<Infobox title="Table of contents">
+<Infobox title="Table of contents" id="toc">
 
 - [Features](#features)
 - [Linguistic annotations](#annotations)

@@ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`,
 
 </div>
 
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 
 - [Summary](#summary)
 - [New features](#features)

@@ -75,7 +75,7 @@ arcs.
 | `font` | unicode | Font name or font family for all text. | `"Arial"` |
 
 For a list of all available options, see the
-[`displacy` API documentation](top-level#displacy_options).
+[`displacy` API documentation](/api/top-level#displacy_options).
 
 > #### Options example
 >

@@ -12,7 +12,6 @@
         "@mdx-js/tag": "^0.17.5",
         "@phosphor/widgets": "^1.6.0",
         "@rehooks/online-status": "^1.0.0",
-        "@sindresorhus/slugify": "^0.8.0",
         "@svgr/webpack": "^4.1.0",
         "autoprefixer": "^9.4.7",
         "classnames": "^2.2.6",
@@ -62,7 +61,8 @@
         "md-attr-parser": "^1.2.1",
         "prettier": "^1.16.4",
         "raw-loader": "^1.0.0",
-        "unist-util-visit": "^1.4.0"
+        "unist-util-visit": "^1.4.0",
+        "@sindresorhus/slugify": "^0.8.0"
     },
     "repository": {
         "type": "git",

@@ -1,33 +1,38 @@
-import React, { useState } from 'react'
+import React, { useState, useEffect } from 'react'
 import PropTypes from 'prop-types'
 import classNames from 'classnames'
-import slugify from '@sindresorhus/slugify'
 
 import Link from './link'
 import classes from '../styles/accordion.module.sass'
 
 const Accordion = ({ title, id, expanded, children }) => {
-    const anchorId = id || slugify(title)
-    const [isExpanded, setIsExpanded] = useState(expanded)
+    const [isExpanded, setIsExpanded] = useState(true)
     const contentClassNames = classNames(classes.content, {
         [classes.hidden]: !isExpanded,
     })
     const iconClassNames = classNames({
         [classes.hidden]: isExpanded,
     })
+    // Make sure accordion is expanded if JS is disabled
+    useEffect(() => setIsExpanded(expanded), [])
     return (
-        <section id={anchorId}>
+        <section className="accordion" id={id}>
             <div className={classes.root}>
-                <h3>
+                <h4>
                     <button
                         className={classes.button}
                         aria-expanded={String(isExpanded)}
                         onClick={() => setIsExpanded(!isExpanded)}
                     >
                         <span>
-                            {title}
-                            {isExpanded && (
-                                <Link to={`#${anchorId}`} className={classes.anchor} hidden>
+                            <span className="heading-text">{title}</span>
+                            {isExpanded && !!id && (
+                                <Link
+                                    to={`#${id}`}
+                                    className={classes.anchor}
+                                    hidden
+                                    onClick={event => event.stopPropagation()}
+                                >
                                     ¶
                                 </Link>
                             )}
@@ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => {
                             <rect height={2} width={8} x={1} y={4} />
                         </svg>
                     </button>
-                </h3>
+                </h4>
                 <div className={contentClassNames}>{children}</div>
             </div>
         </section>

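The state change in the accordion hunks (`useState(true)` followed by an effect that applies the `expanded` prop) can be modelled without React. This is only a sketch, assuming the site is pre-rendered to static HTML and that effects run only in the browser after mount; `firstRender`, `afterMount` and `showAnchor` are hypothetical helper names, not part of the component:

```javascript
// State used for the first (pre-rendered) pass: always expanded, so the
// content stays visible even if no JavaScript ever runs.
const firstRender = () => true

// State once the mount effect has applied the `expanded` prop, reduced to a
// boolean here. Models `useEffect(() => setIsExpanded(expanded), [])`.
const afterMount = expanded => Boolean(expanded)

// The anchor link is rendered only for expanded accordions with an explicit
// id. Models the `isExpanded && !!id` condition.
const showAnchor = (isExpanded, id) => isExpanded && !!id

console.log(firstRender()) // true
console.log(afterMount(undefined)) // false
console.log(showAnchor(true, 'toc')) // true
console.log(showAnchor(true, undefined)) // false
```

Since the `slugify(title)` fallback is gone, an accordion without an `id` no longer gets an anchor at all, which would explain why the same commit adds explicit `id` props throughout the docs.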
@@ -5,13 +5,13 @@ import classNames from 'classnames'
 import Icon from './icon'
 import classes from '../styles/infobox.module.sass'
 
-const Infobox = ({ title, variant, className, children }) => {
+const Infobox = ({ title, id, variant, className, children }) => {
     const infoboxClassNames = classNames(classes.root, className, {
         [classes.warning]: variant === 'warning',
         [classes.danger]: variant === 'danger',
     })
     return (
-        <aside className={infoboxClassNames}>
+        <aside className={infoboxClassNames} id={id}>
             {title && (
                 <h4 className={classes.title}>
                     {variant !== 'default' && (
@@ -31,6 +31,7 @@ Infobox.defaultProps = {
 
 Infobox.propTypes = {
     title: PropTypes.string,
+    id: PropTypes.string,
     variant: PropTypes.oneOf(['default', 'warning', 'danger']),
     className: PropTypes.string,
     children: PropTypes.node.isRequired,

@@ -232,6 +232,7 @@ Juniper.defaultProps = {
     theme: 'default',
     isolateCells: true,
     useBinder: true,
+    storageKey: 'juniper',
     useStorage: true,
     storageExpire: 60,
     debug: false,

@@ -105,7 +105,7 @@ const Help = ({ children }) => (
 
 const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => {
     const [initialized, setInitialized] = useState(false)
-    const [isError, setIsError] = useState(false)
+    const [isError, setIsError] = useState(true)
     const [meta, setMeta] = useState({})
     const { type, genre, size } = getModelComponents(name)
     const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility])
@@ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
     useEffect(() => {
         window.dispatchEvent(new Event('resize')) // scroll position for progress
         if (!initialized && version) {
+            setIsError(false)
             fetch(`${baseUrl}/meta/${name}-${version}.json`)
                 .then(res => res.json())
                 .then(json => {
@@ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
     const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link>
     const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null
     const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license
-    const hasInteractiveCode = size === 'sm' && hasExamples
+    const hasInteractiveCode = size === 'sm' && hasExamples && !isError
 
     const rows = [
         { label: 'Language', tag: langId, content: langName },
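The `Model` changes gate the interactive code block on the error flag. A small sketch of that condition as a pure function; the state objects are illustrative only, and what happens to `isError` when the fetch fails is not shown in these hunks:

```javascript
// Same condition as the new hasInteractiveCode line in the component.
const hasInteractiveCode = ({ size, hasExamples, isError }) =>
    size === 'sm' && hasExamples && !isError

// Before a version is resolved, isError now defaults to true.
const initial = { size: 'sm', hasExamples: true, isError: true }
// Once a version is known, the effect calls setIsError(false) before fetching.
const fetching = { ...initial, isError: false }
// Models that are not 'sm' never get the interactive block.
const large = { ...fetching, size: 'lg' }

console.log(hasInteractiveCode(initial)) // false
console.log(hasInteractiveCode(fetching)) // true
console.log(hasInteractiveCode(large)) // false
```

Starting from `isError = true` means the interactive block stays hidden until a version has been resolved for the model.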