mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 16:07:41 +03:00)
	Merge branch 'develop' into spacy.io
commit ec29e6f4c8
				|  | @ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the | ||||||
| training corpus and can be defined in the respective language data's | training corpus and can be defined in the respective language data's | ||||||
| [`tag_map.py`](/usage/adding-languages#tag-map). | [`tag_map.py`](/usage/adding-languages#tag-map). | ||||||
| 
 | 
 | ||||||
| <Accordion title="Universal Part-of-speech Tags"> | <Accordion title="Universal Part-of-speech Tags" id="pos-universal"> | ||||||
| 
 | 
 | ||||||
| spaCy also maps all language-specific part-of-speech tags to a small, fixed set | spaCy also maps all language-specific part-of-speech tags to a small, fixed set | ||||||
| of word type tags following the | of word type tags following the | ||||||
|  | @ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's | ||||||
| [models](/models). The individual labels are language-specific and depend on the | [models](/models). The individual labels are language-specific and depend on the | ||||||
| training corpus. | training corpus. | ||||||
| 
 | 
 | ||||||
| <Accordion title="Universal Dependency Labels"> | <Accordion title="Universal Dependency Labels" id="dependency-parsing-universal"> | ||||||
| 
 | 
 | ||||||
| The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is | The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is | ||||||
| used in all languages trained on Universal Dependency Corpora. | used in all languages trained on Universal Dependency Corpora. | ||||||
|  |  | ||||||
|  | @ -33,9 +33,22 @@ list containing the component names: | ||||||
| 
 | 
 | ||||||
| import Accordion from 'components/accordion.js' | import Accordion from 'components/accordion.js' | ||||||
| 
 | 
 | ||||||
| <Accordion title="Does the order of pipeline components matter?"> | <Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order"> | ||||||
| 
 | 
 | ||||||
| No | In spaCy v2.x, the statistical components like the tagger or parser are | ||||||
|  | independent and don't share any data between themselves. For example, the named | ||||||
|  | entity recognizer doesn't use any features set by the tagger and parser, and so | ||||||
|  | on. This means that you can swap them, or remove single components from the | ||||||
|  | pipeline without affecting the others. | ||||||
|  | 
 | ||||||
|  | However, custom components may depend on annotations set by other components. | ||||||
|  | For example, a custom lemmatizer may need the part-of-speech tags assigned, so | ||||||
|  | it'll only work if it's added after the tagger. The parser will respect | ||||||
|  | pre-defined sentence boundaries, so if a previous component in the pipeline sets | ||||||
|  | them, its dependency predictions may be different. Similarly, it matters if you | ||||||
|  | add the [`EntityRuler`](/api/entityruler) before or after the statistical entity | ||||||
|  | recognizer: if it's added before, the entity recognizer will take the existing | ||||||
|  | entities into account when making predictions. | ||||||
| 
 | 
 | ||||||
| </Accordion> | </Accordion> | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
|  | @ -39,7 +39,7 @@ and morphological analysis. | ||||||
| 
 | 
 | ||||||
| </div> | </div> | ||||||
| 
 | 
 | ||||||
| <Infobox title="Table of Contents"> | <Infobox title="Table of Contents" id="toc"> | ||||||
| 
 | 
 | ||||||
| - [Language data 101](#101) | - [Language data 101](#101) | ||||||
| - [The Language subclass](#language-subclass) | - [The Language subclass](#language-subclass) | ||||||
|  |  | ||||||
|  | @ -298,9 +298,9 @@ different languages, see the | ||||||
| The best way to understand spaCy's dependency parser is interactively. To make | The best way to understand spaCy's dependency parser is interactively. To make | ||||||
| this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc` | this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc` | ||||||
| or a list of `Doc` objects to displaCy and run | or a list of `Doc` objects to displaCy and run | ||||||
| [`displacy.serve`](top-level#displacy.serve) to run the web server, or | [`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or | ||||||
| [`displacy.render`](top-level#displacy.render) to generate the raw markup. If | [`displacy.render`](/api/top-level#displacy.render) to generate the raw markup. | ||||||
| you want to know how to write rules that hook into some type of syntactic | If you want to know how to write rules that hook into some type of syntactic | ||||||
| construction, just plug the sentence into the visualizer and see how spaCy | construction, just plug the sentence into the visualizer and see how spaCy | ||||||
| annotates it. | annotates it. | ||||||
| 
 | 
 | ||||||
|  | @ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on | ||||||
| 
 | 
 | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
| <Accordion title="Should I change the language data or add custom tokenizer rules?"> | <Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer"> | ||||||
| 
 | 
 | ||||||
| Tokenization rules that are specific to one language, but can be **generalized | Tokenization rules that are specific to one language, but can be **generalized | ||||||
| across that language** should ideally live in the language data in | across that language** should ideally live in the language data in | ||||||
|  |  | ||||||
|  | @ -41,7 +41,7 @@ contribute to model development. | ||||||
| > If a model is available for a language, you can download it using the | > If a model is available for a language, you can download it using the | ||||||
| > [`spacy download`](/api/cli#download) command. In order to use languages that | > [`spacy download`](/api/cli#download) command. In order to use languages that | ||||||
| > don't yet come with a model, you have to import them directly, or use | > don't yet come with a model, you have to import them directly, or use | ||||||
| > [`spacy.blank`](api/top-level#spacy.blank): | > [`spacy.blank`](/api/top-level#spacy.blank): | ||||||
| > | > | ||||||
| > ```python | > ```python | ||||||
| > from spacy.lang.fi import Finnish | > from spacy.lang.fi import Finnish | ||||||
|  |  | ||||||
|  | @ -46,7 +46,8 @@ components. spaCy then does the following: | ||||||
| 3. Add each pipeline component to the pipeline in order, using | 3. Add each pipeline component to the pipeline in order, using | ||||||
|    [`add_pipe`](/api/language#add_pipe). |    [`add_pipe`](/api/language#add_pipe). | ||||||
| 4. Make the **model data** available to the `Language` class by calling | 4. Make the **model data** available to the `Language` class by calling | ||||||
|    [`from_disk`](language#from_disk) with the path to the model data directory. |    [`from_disk`](/api/language#from_disk) with the path to the model data | ||||||
|  |    directory. | ||||||
| 
 | 
 | ||||||
| So when you call this... | So when you call this... | ||||||
| 
 | 
 | ||||||
|  | @ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning | ||||||
| libraries. It also lets you take advantage of spaCy's data structures and the | libraries. It also lets you take advantage of spaCy's data structures and the | ||||||
| `Doc` object as the "single source of truth". | `Doc` object as the "single source of truth". | ||||||
| 
 | 
 | ||||||
| <Accordion title="Why ._ and not just a top-level attribute?"> | <Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore"> | ||||||
| 
 | 
 | ||||||
| Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer | Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer | ||||||
| separation and makes it easier to ensure backwards compatibility. For example, | separation and makes it easier to ensure backwards compatibility. For example, | ||||||
|  | @ -437,7 +438,7 @@ immediately know what's built-in and what's custom – for example, | ||||||
| 
 | 
 | ||||||
| </Accordion> | </Accordion> | ||||||
| 
 | 
 | ||||||
| <Accordion title="How is the ._ implemented?"> | <Accordion title="How is the ._ implemented?" id="dot-underscore-implementation"> | ||||||
| 
 | 
 | ||||||
| Extension definitions – the defaults, methods, getters and setters you pass in | Extension definitions – the defaults, methods, getters and setters you pass in | ||||||
| to `set_extension` – are stored in class attributes on the `Underscore` class. | to `set_extension` – are stored in class attributes on the `Underscore` class. | ||||||
|  |  | ||||||
|  | @ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the | ||||||
| surrounding tokens, merge spans into single tokens or add entries to the named | surrounding tokens, merge spans into single tokens or add entries to the named | ||||||
| entities in `doc.ents`. | entities in `doc.ents`. | ||||||
| 
 | 
 | ||||||
| <Accordion title="Should I use rules or train a model?"> | <Accordion title="Should I use rules or train a model?" id="rules-vs-model"> | ||||||
| 
 | 
 | ||||||
| For complex tasks, it's usually better to train a statistical entity recognition | For complex tasks, it's usually better to train a statistical entity recognition | ||||||
| model. However, statistical models require training data, so for many | model. However, statistical models require training data, so for many | ||||||
|  | @ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler). | ||||||
| 
 | 
 | ||||||
| </Accordion> | </Accordion> | ||||||
| 
 | 
 | ||||||
| <Accordion title="When should I use the token matcher vs. the phrase matcher?"> | <Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher"> | ||||||
| 
 | 
 | ||||||
| The `PhraseMatcher` is useful if you already have a large terminology list or | The `PhraseMatcher` is useful if you already have a large terminology list or | ||||||
| gazetteer consisting of single or multi-token phrases that you want to find | gazetteer consisting of single or multi-token phrases that you want to find | ||||||
|  |  | ||||||
|  | @ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**. | ||||||
| 
 | 
 | ||||||
| </div> | </div> | ||||||
| 
 | 
 | ||||||
| <Infobox title="Table of contents"> | <Infobox title="Table of contents" id="toc"> | ||||||
| 
 | 
 | ||||||
| - [Features](#features) | - [Features](#features) | ||||||
| - [Linguistic annotations](#annotations) | - [Linguistic annotations](#annotations) | ||||||
|  |  | ||||||
|  | @ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`, | ||||||
| 
 | 
 | ||||||
| </div> | </div> | ||||||
| 
 | 
 | ||||||
| <Infobox title="Table of Contents"> | <Infobox title="Table of Contents" id="toc"> | ||||||
| 
 | 
 | ||||||
| - [Summary](#summary) | - [Summary](#summary) | ||||||
| - [New features](#features) | - [New features](#features) | ||||||
|  |  | ||||||
|  | @ -75,7 +75,7 @@ arcs. | ||||||
| | `font`    | unicode | Font name or font family for all text.                      | `"Arial"`   | | | `font`    | unicode | Font name or font family for all text.                      | `"Arial"`   | | ||||||
| 
 | 
 | ||||||
| For a list of all available options, see the | For a list of all available options, see the | ||||||
| [`displacy` API documentation](top-level#displacy_options). | [`displacy` API documentation](/api/top-level#displacy_options). | ||||||
| 
 | 
 | ||||||
| > #### Options example | > #### Options example | ||||||
| > | > | ||||||
|  |  | ||||||
|  | @ -12,7 +12,6 @@ | ||||||
|         "@mdx-js/tag": "^0.17.5", |         "@mdx-js/tag": "^0.17.5", | ||||||
|         "@phosphor/widgets": "^1.6.0", |         "@phosphor/widgets": "^1.6.0", | ||||||
|         "@rehooks/online-status": "^1.0.0", |         "@rehooks/online-status": "^1.0.0", | ||||||
|         "@sindresorhus/slugify": "^0.8.0", |  | ||||||
|         "@svgr/webpack": "^4.1.0", |         "@svgr/webpack": "^4.1.0", | ||||||
|         "autoprefixer": "^9.4.7", |         "autoprefixer": "^9.4.7", | ||||||
|         "classnames": "^2.2.6", |         "classnames": "^2.2.6", | ||||||
|  | @ -62,7 +61,8 @@ | ||||||
|         "md-attr-parser": "^1.2.1", |         "md-attr-parser": "^1.2.1", | ||||||
|         "prettier": "^1.16.4", |         "prettier": "^1.16.4", | ||||||
|         "raw-loader": "^1.0.0", |         "raw-loader": "^1.0.0", | ||||||
|         "unist-util-visit": "^1.4.0" |         "unist-util-visit": "^1.4.0", | ||||||
|  |         "@sindresorhus/slugify": "^0.8.0" | ||||||
|     }, |     }, | ||||||
|     "repository": { |     "repository": { | ||||||
|         "type": "git", |         "type": "git", | ||||||
|  |  | ||||||
|  | @ -1,33 +1,38 @@ | ||||||
| import React, { useState } from 'react' | import React, { useState, useEffect } from 'react' | ||||||
| import PropTypes from 'prop-types' | import PropTypes from 'prop-types' | ||||||
| import classNames from 'classnames' | import classNames from 'classnames' | ||||||
| import slugify from '@sindresorhus/slugify' |  | ||||||
| 
 | 
 | ||||||
| import Link from './link' | import Link from './link' | ||||||
| import classes from '../styles/accordion.module.sass' | import classes from '../styles/accordion.module.sass' | ||||||
| 
 | 
 | ||||||
| const Accordion = ({ title, id, expanded, children }) => { | const Accordion = ({ title, id, expanded, children }) => { | ||||||
|     const anchorId = id || slugify(title) |     const [isExpanded, setIsExpanded] = useState(true) | ||||||
|     const [isExpanded, setIsExpanded] = useState(expanded) |  | ||||||
|     const contentClassNames = classNames(classes.content, { |     const contentClassNames = classNames(classes.content, { | ||||||
|         [classes.hidden]: !isExpanded, |         [classes.hidden]: !isExpanded, | ||||||
|     }) |     }) | ||||||
|     const iconClassNames = classNames({ |     const iconClassNames = classNames({ | ||||||
|         [classes.hidden]: isExpanded, |         [classes.hidden]: isExpanded, | ||||||
|     }) |     }) | ||||||
|  |     // Make sure accordion is expanded if JS is disabled
 | ||||||
|  |     useEffect(() => setIsExpanded(expanded), []) | ||||||
|     return ( |     return ( | ||||||
|         <section id={anchorId}> |         <section className="accordion" id={id}> | ||||||
|             <div className={classes.root}> |             <div className={classes.root}> | ||||||
|                 <h3> |                 <h4> | ||||||
|                     <button |                     <button | ||||||
|                         className={classes.button} |                         className={classes.button} | ||||||
|                         aria-expanded={String(isExpanded)} |                         aria-expanded={String(isExpanded)} | ||||||
|                         onClick={() => setIsExpanded(!isExpanded)} |                         onClick={() => setIsExpanded(!isExpanded)} | ||||||
|                     > |                     > | ||||||
|                         <span> |                         <span> | ||||||
|                             {title} |                             <span className="heading-text">{title}</span> | ||||||
|                             {isExpanded && ( |                             {isExpanded && !!id && ( | ||||||
|                                 <Link to={`#${anchorId}`} className={classes.anchor} hidden> |                                 <Link | ||||||
|  |                                     to={`#${id}`} | ||||||
|  |                                     className={classes.anchor} | ||||||
|  |                                     hidden | ||||||
|  |                                     onClick={event => event.stopPropagation()} | ||||||
|  |                                 > | ||||||
|                                     ¶ |                                     ¶ | ||||||
|                                 </Link> |                                 </Link> | ||||||
|                             )} |                             )} | ||||||
|  | @ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => { | ||||||
|                             <rect height={2} width={8} x={1} y={4} /> |                             <rect height={2} width={8} x={1} y={4} /> | ||||||
|                         </svg> |                         </svg> | ||||||
|                     </button> |                     </button> | ||||||
|                 </h3> |                 </h4> | ||||||
|                 <div className={contentClassNames}>{children}</div> |                 <div className={contentClassNames}>{children}</div> | ||||||
|             </div> |             </div> | ||||||
|         </section> |         </section> | ||||||
|  |  | ||||||
|  | @ -5,13 +5,13 @@ import classNames from 'classnames' | ||||||
| import Icon from './icon' | import Icon from './icon' | ||||||
| import classes from '../styles/infobox.module.sass' | import classes from '../styles/infobox.module.sass' | ||||||
| 
 | 
 | ||||||
| const Infobox = ({ title, variant, className, children }) => { | const Infobox = ({ title, id, variant, className, children }) => { | ||||||
|     const infoboxClassNames = classNames(classes.root, className, { |     const infoboxClassNames = classNames(classes.root, className, { | ||||||
|         [classes.warning]: variant === 'warning', |         [classes.warning]: variant === 'warning', | ||||||
|         [classes.danger]: variant === 'danger', |         [classes.danger]: variant === 'danger', | ||||||
|     }) |     }) | ||||||
|     return ( |     return ( | ||||||
|         <aside className={infoboxClassNames}> |         <aside className={infoboxClassNames} id={id}> | ||||||
|             {title && ( |             {title && ( | ||||||
|                 <h4 className={classes.title}> |                 <h4 className={classes.title}> | ||||||
|                     {variant !== 'default' && ( |                     {variant !== 'default' && ( | ||||||
|  | @ -31,6 +31,7 @@ Infobox.defaultProps = { | ||||||
| 
 | 
 | ||||||
| Infobox.propTypes = { | Infobox.propTypes = { | ||||||
|     title: PropTypes.string, |     title: PropTypes.string, | ||||||
|  |     id: PropTypes.string, | ||||||
|     variant: PropTypes.oneOf(['default', 'warning', 'danger']), |     variant: PropTypes.oneOf(['default', 'warning', 'danger']), | ||||||
|     className: PropTypes.string, |     className: PropTypes.string, | ||||||
|     children: PropTypes.node.isRequired, |     children: PropTypes.node.isRequired, | ||||||
|  |  | ||||||
|  | @ -232,6 +232,7 @@ Juniper.defaultProps = { | ||||||
|     theme: 'default', |     theme: 'default', | ||||||
|     isolateCells: true, |     isolateCells: true, | ||||||
|     useBinder: true, |     useBinder: true, | ||||||
|  |     storageKey: 'juniper', | ||||||
|     useStorage: true, |     useStorage: true, | ||||||
|     storageExpire: 60, |     storageExpire: 60, | ||||||
|     debug: false, |     debug: false, | ||||||
|  |  | ||||||
|  | @ -105,7 +105,7 @@ const Help = ({ children }) => ( | ||||||
| 
 | 
 | ||||||
| const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => { | const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => { | ||||||
|     const [initialized, setInitialized] = useState(false) |     const [initialized, setInitialized] = useState(false) | ||||||
|     const [isError, setIsError] = useState(false) |     const [isError, setIsError] = useState(true) | ||||||
|     const [meta, setMeta] = useState({}) |     const [meta, setMeta] = useState({}) | ||||||
|     const { type, genre, size } = getModelComponents(name) |     const { type, genre, size } = getModelComponents(name) | ||||||
|     const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility]) |     const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility]) | ||||||
|  | @ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl | ||||||
|     useEffect(() => { |     useEffect(() => { | ||||||
|         window.dispatchEvent(new Event('resize')) // scroll position for progress
 |         window.dispatchEvent(new Event('resize')) // scroll position for progress
 | ||||||
|         if (!initialized && version) { |         if (!initialized && version) { | ||||||
|  |             setIsError(false) | ||||||
|             fetch(`${baseUrl}/meta/${name}-${version}.json`) |             fetch(`${baseUrl}/meta/${name}-${version}.json`) | ||||||
|                 .then(res => res.json()) |                 .then(res => res.json()) | ||||||
|                 .then(json => { |                 .then(json => { | ||||||
|  | @ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl | ||||||
|     const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link> |     const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link> | ||||||
|     const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null |     const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null | ||||||
|     const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license |     const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license | ||||||
|     const hasInteractiveCode = size === 'sm' && hasExamples |     const hasInteractiveCode = size === 'sm' && hasExamples && !isError | ||||||
| 
 | 
 | ||||||
|     const rows = [ |     const rows = [ | ||||||
|         { label: 'Language', tag: langId, content: langName }, |         { label: 'Language', tag: langId, content: langName }, | ||||||
|  |  | ||||||
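A note on the pipeline-order accordion added in this commit: the point that custom components may depend on annotations set by earlier components can be illustrated with a toy pipeline. This is only a minimal sketch in plain Python and does not use spaCy's API; `tagger`, `custom_lemmatizer` and `run_pipeline` are hypothetical names made up for illustration, and the tagging and lemmatization rules are deliberately naive.

```python
# Toy model of a processing pipeline. All names here are hypothetical and
# are NOT part of spaCy; a "doc" is just a dict that components annotate.

def tagger(doc):
    # Naive stand-in for a tagger: words ending in "s" are tagged NOUN.
    doc["tags"] = ["NOUN" if tok.endswith("s") else "X" for tok in doc["tokens"]]
    return doc


def custom_lemmatizer(doc):
    # Depends on the tags written by the tagger, so it must run after it.
    if "tags" not in doc:
        raise ValueError("custom_lemmatizer needs tags; add it after the tagger")
    doc["lemmas"] = [
        tok[:-1] if tag == "NOUN" else tok
        for tok, tag in zip(doc["tokens"], doc["tags"])
    ]
    return doc


def run_pipeline(tokens, components):
    # Apply each component to the doc in the order given.
    doc = {"tokens": list(tokens)}
    for component in components:
        doc = component(doc)
    return doc


# Works: the lemmatizer runs after the tagger it depends on.
good = run_pipeline(["cats", "sleep"], [tagger, custom_lemmatizer])

# Fails: the lemmatizer runs first and finds no tags.
try:
    run_pipeline(["cats", "sleep"], [custom_lemmatizer, tagger])
    order_error = None
except ValueError as err:
    order_error = str(err)
```

Swapping the two components changes nothing for `tagger`, which reads only the tokens, but breaks `custom_lemmatizer`. That matches the distinction the accordion text draws between independent statistical components and custom components that rely on earlier annotations.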