Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-31 16:07:41 +03:00

Merge branch 'develop' into spacy.io

commit ec29e6f4c8

@@ -78,7 +78,7 @@ assigned by spaCy's [models](/models). The individual mapping is specific to the
 training corpus and can be defined in the respective language data's
 [`tag_map.py`](/usage/adding-languages#tag-map).
 
-<Accordion title="Universal Part-of-speech Tags">
+<Accordion title="Universal Part-of-speech Tags" id="pos-universal">
 
 spaCy also maps all language-specific part-of-speech tags to a small, fixed set
 of word type tags following the
@@ -269,7 +269,7 @@ This section lists the syntactic dependency labels assigned by spaCy's
 [models](/models). The individual labels are language-specific and depend on the
 training corpus.
 
-<Accordion title="Universal Dependency Labels">
+<Accordion title="Universal Dependency Labels" id="dependency-parsing-universal">
 
 The [Universal Dependencies scheme](http://universaldependencies.org/u/dep/) is
 used in all languages trained on Universal Dependency Corpora.

@@ -33,9 +33,22 @@ list containing the component names:
 
 import Accordion from 'components/accordion.js'
 
-<Accordion title="Does the order of pipeline components matter?">
+<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">
 
-No
+In spaCy v2.x, the statistical components like the tagger or parser are
+independent and don't share any data between themselves. For example, the named
+entity recognizer doesn't use any features set by the tagger and parser, and so
+on. This means that you can swap them, or remove single components from the
+pipeline without affecting the others.
+
+However, custom components may depend on annotations set by other components.
+For example, a custom lemmatizer may need the part-of-speech tags assigned, so
+it'll only work if it's added after the tagger. The parser will respect
+pre-defined sentence boundaries, so if a previous component in the pipeline sets
+them, its dependency predictions may be different. Similarly, it matters if you
+add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
+recognizer: if it's added before, the entity recognizer will take the existing
+entities into account when making predictions.
 
 </Accordion>
 

@@ -39,7 +39,7 @@ and morphological analysis.
 
 </div>
 
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 
 - [Language data 101](#101)
 - [The Language subclass](#language-subclass)

@@ -298,9 +298,9 @@ different languages, see the
 The best way to understand spaCy's dependency parser is interactively. To make
 this easier, spaCy v2.0+ comes with a visualization module. You can pass a `Doc`
 or a list of `Doc` objects to displaCy and run
-[`displacy.serve`](top-level#displacy.serve) to run the web server, or
-[`displacy.render`](top-level#displacy.render) to generate the raw markup. If
-you want to know how to write rules that hook into some type of syntactic
+[`displacy.serve`](/api/top-level#displacy.serve) to run the web server, or
+[`displacy.render`](/api/top-level#displacy.render) to generate the raw markup.
+If you want to know how to write rules that hook into some type of syntactic
 construction, just plug the sentence into the visualizer and see how spaCy
 annotates it.
 
@@ -621,7 +621,7 @@ For more details on the language-specific data, see the usage guide on
 
 </Infobox>
 
-<Accordion title="Should I change the language data or add custom tokenizer rules?">
+<Accordion title="Should I change the language data or add custom tokenizer rules?" id="lang-data-vs-tokenizer">
 
 Tokenization rules that are specific to one language, but can be **generalized
 across that language** should ideally live in the language data in

@@ -41,7 +41,7 @@ contribute to model development.
 > If a model is available for a language, you can download it using the
 > [`spacy download`](/api/cli#download) command. In order to use languages that
 > don't yet come with a model, you have to import them directly, or use
-> [`spacy.blank`](api/top-level#spacy.blank):
+> [`spacy.blank`](/api/top-level#spacy.blank):
 >
 > ```python
 > from spacy.lang.fi import Finnish

@@ -46,7 +46,8 @@ components. spaCy then does the following:
 3. Add each pipeline component to the pipeline in order, using
    [`add_pipe`](/api/language#add_pipe).
 4. Make the **model data** available to the `Language` class by calling
-   [`from_disk`](language#from_disk) with the path to the model data directory.
+   [`from_disk`](/api/language#from_disk) with the path to the model data
+   directory.
 
 So when you call this...
 
@@ -426,7 +427,7 @@ spaCy, and implement your own models trained with other machine learning
 libraries. It also lets you take advantage of spaCy's data structures and the
 `Doc` object as the "single source of truth".
 
-<Accordion title="Why ._ and not just a top-level attribute?">
+<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
 
 Writing to a `._` attribute instead of to the `Doc` directly keeps a clearer
 separation and makes it easier to ensure backwards compatibility. For example,
@@ -437,7 +438,7 @@ immediately know what's built-in and what's custom – for example,
 
 </Accordion>
 
-<Accordion title="How is the ._ implemented?">
+<Accordion title="How is the ._ implemented?" id="dot-underscore-implementation">
 
 Extension definitions – the defaults, methods, getters and setters you pass in
 to `set_extension` – are stored in class attributes on the `Underscore` class.

@@ -15,7 +15,7 @@ their relationships. This means you can easily access and analyze the
 surrounding tokens, merge spans into single tokens or add entries to the named
 entities in `doc.ents`.
 
-<Accordion title="Should I use rules or train a model?">
+<Accordion title="Should I use rules or train a model?" id="rules-vs-model">
 
 For complex tasks, it's usually better to train a statistical entity recognition
 model. However, statistical models require training data, so for many
@@ -41,7 +41,7 @@ on [rule-based entity recognition](#entityruler).
 
 </Accordion>
 
-<Accordion title="When should I use the token matcher vs. the phrase matcher?">
+<Accordion title="When should I use the token matcher vs. the phrase matcher?" id="matcher-vs-phrase-matcher">
 
 The `PhraseMatcher` is useful if you already have a large terminology list or
 gazetteer consisting of single or multi-token phrases that you want to find

@@ -50,7 +50,7 @@ systems, or to pre-process text for **deep learning**.
 
 </div>
 
-<Infobox title="Table of contents">
+<Infobox title="Table of contents" id="toc">
 
 - [Features](#features)
 - [Linguistic annotations](#annotations)

@@ -39,7 +39,7 @@ also add your own custom attributes, properties and methods to the `Doc`,
 
 </div>
 
-<Infobox title="Table of Contents">
+<Infobox title="Table of Contents" id="toc">
 
 - [Summary](#summary)
 - [New features](#features)

@@ -75,7 +75,7 @@ arcs.
 | `font`    | unicode | Font name or font family for all text.                      | `"Arial"`   |
 
 For a list of all available options, see the
-[`displacy` API documentation](top-level#displacy_options).
+[`displacy` API documentation](/api/top-level#displacy_options).
 
 > #### Options example
 >

@@ -12,7 +12,6 @@
         "@mdx-js/tag": "^0.17.5",
         "@phosphor/widgets": "^1.6.0",
         "@rehooks/online-status": "^1.0.0",
-        "@sindresorhus/slugify": "^0.8.0",
         "@svgr/webpack": "^4.1.0",
         "autoprefixer": "^9.4.7",
         "classnames": "^2.2.6",
@@ -62,7 +61,8 @@
         "md-attr-parser": "^1.2.1",
         "prettier": "^1.16.4",
         "raw-loader": "^1.0.0",
-        "unist-util-visit": "^1.4.0"
+        "unist-util-visit": "^1.4.0",
+        "@sindresorhus/slugify": "^0.8.0"
     },
     "repository": {
         "type": "git",

@@ -1,33 +1,38 @@
-import React, { useState } from 'react'
+import React, { useState, useEffect } from 'react'
 import PropTypes from 'prop-types'
 import classNames from 'classnames'
-import slugify from '@sindresorhus/slugify'
 
 import Link from './link'
 import classes from '../styles/accordion.module.sass'
 
 const Accordion = ({ title, id, expanded, children }) => {
-    const anchorId = id || slugify(title)
-    const [isExpanded, setIsExpanded] = useState(expanded)
+    const [isExpanded, setIsExpanded] = useState(true)
     const contentClassNames = classNames(classes.content, {
         [classes.hidden]: !isExpanded,
     })
     const iconClassNames = classNames({
         [classes.hidden]: isExpanded,
     })
+    // Make sure accordion is expanded if JS is disabled
+    useEffect(() => setIsExpanded(expanded), [])
     return (
-        <section id={anchorId}>
+        <section className="accordion" id={id}>
             <div className={classes.root}>
-                <h3>
+                <h4>
                     <button
                         className={classes.button}
                         aria-expanded={String(isExpanded)}
                         onClick={() => setIsExpanded(!isExpanded)}
                     >
                         <span>
-                            {title}
-                            {isExpanded && (
-                                <Link to={`#${anchorId}`} className={classes.anchor} hidden>
+                            <span className="heading-text">{title}</span>
+                            {isExpanded && !!id && (
+                                <Link
+                                    to={`#${id}`}
+                                    className={classes.anchor}
+                                    hidden
+                                    onClick={event => event.stopPropagation()}
+                                >
                                     ¶
                                 </Link>
                             )}
@@ -44,7 +49,7 @@ const Accordion = ({ title, id, expanded, children }) => {
                             <rect height={2} width={8} x={1} y={4} />
                         </svg>
                     </button>
-                </h3>
+                </h4>
                 <div className={contentClassNames}>{children}</div>
             </div>
         </section>

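The updated `Accordion` initializes `isExpanded` to `true` and applies the `expanded` prop only in a mount-time `useEffect`. As its code comment says, this keeps the accordion expanded if JavaScript is disabled. It also renders the ¶ anchor link only when an explicit `id` is passed, instead of falling back to `slugify(title)`. The sketch below models that decision logic without React. It is not code from the spaCy website, and the helper names `initialExpanded`, `expandedAfterMount` and `shouldRenderAnchor` are hypothetical.

```javascript
// Hypothetical helpers (not part of spaCy's website code) modelling the
// decisions made by the updated Accordion component, without React.

// Before any effect has run (e.g. when JavaScript is disabled), the state is
// `true`, mirroring `useState(true)`.
function initialExpanded() {
    return true
}

// Mirrors `useEffect(() => setIsExpanded(expanded), [])`: once JavaScript runs
// after mount, the `expanded` prop decides. Coerced to a boolean for clarity.
function expandedAfterMount(expanded) {
    return Boolean(expanded)
}

// Mirrors `{isExpanded && !!id && (<Link ...>)}`: the ¶ link needs an explicit id.
function shouldRenderAnchor(isExpanded, id) {
    return isExpanded && !!id
}

// Simulate both phases for an accordion that is collapsed by default.
const id = 'pipeline-components-order'
for (const phase of ['no JS', 'after mount']) {
    const isExpanded = phase === 'no JS' ? initialExpanded() : expandedAfterMount(false)
    console.log(phase, { isExpanded, anchor: shouldRenderAnchor(isExpanded, id) })
}
```

Dropping the `slugify(title)` fallback means an accordion without an `id` gets no anchor link at all. This lines up with the documentation changes in this commit, which add explicit `id` attributes to each `<Accordion>` and `<Infobox>`.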
@@ -5,13 +5,13 @@ import classNames from 'classnames'
 import Icon from './icon'
 import classes from '../styles/infobox.module.sass'
 
-const Infobox = ({ title, variant, className, children }) => {
+const Infobox = ({ title, id, variant, className, children }) => {
     const infoboxClassNames = classNames(classes.root, className, {
         [classes.warning]: variant === 'warning',
         [classes.danger]: variant === 'danger',
     })
     return (
-        <aside className={infoboxClassNames}>
+        <aside className={infoboxClassNames} id={id}>
             {title && (
                 <h4 className={classes.title}>
                     {variant !== 'default' && (
@@ -31,6 +31,7 @@ Infobox.defaultProps = {
 
 Infobox.propTypes = {
     title: PropTypes.string,
+    id: PropTypes.string,
     variant: PropTypes.oneOf(['default', 'warning', 'danger']),
     className: PropTypes.string,
     children: PropTypes.node.isRequired,

@@ -232,6 +232,7 @@ Juniper.defaultProps = {
     theme: 'default',
     isolateCells: true,
     useBinder: true,
+    storageKey: 'juniper',
     useStorage: true,
     storageExpire: 60,
     debug: false,

@@ -105,7 +105,7 @@ const Help = ({ children }) => (
 
 const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExamples, licenses }) => {
     const [initialized, setInitialized] = useState(false)
-    const [isError, setIsError] = useState(false)
+    const [isError, setIsError] = useState(true)
     const [meta, setMeta] = useState({})
     const { type, genre, size } = getModelComponents(name)
     const version = useMemo(() => getLatestVersion(name, compatibility), [name, compatibility])
@@ -113,6 +113,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
     useEffect(() => {
         window.dispatchEvent(new Event('resize')) // scroll position for progress
         if (!initialized && version) {
+            setIsError(false)
             fetch(`${baseUrl}/meta/${name}-${version}.json`)
                 .then(res => res.json())
                 .then(json => {
@@ -134,7 +135,7 @@ const Model = ({ name, langId, langName, baseUrl, repo, compatibility, hasExampl
     const author = !meta.url ? meta.author : <Link to={meta.url}>{meta.author}</Link>
     const licenseUrl = licenses[meta.license] ? licenses[meta.license].url : null
     const license = licenseUrl ? <Link to={licenseUrl}>{meta.license}</Link> : meta.license
-    const hasInteractiveCode = size === 'sm' && hasExamples
+    const hasInteractiveCode = size === 'sm' && hasExamples && !isError
 
     const rows = [
         { label: 'Language', tag: langId, content: langName },

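In the updated `Model` component, `isError` starts out as `true`. The effect resets it to `false` right before the meta JSON is fetched, and that only happens once a `version` has been resolved. `hasInteractiveCode` now also requires `!isError`. The framework-free sketch below models those flags. `modelFlags` and the version strings are hypothetical, and fetch-failure handling is omitted because that part of the component is not shown in this diff.

```javascript
// Hypothetical helper (not part of spaCy's website code) modelling the flag
// logic of the updated Model component on the effect's first run.
// Fetch-failure handling is omitted because it is not visible in the diff.
function modelFlags({ size, hasExamples, version }) {
    // Mirrors `useState(true)`: assume an error until the effect clears it.
    let isError = true
    // Mirrors `if (!initialized && version) { setIsError(false); fetch(...) }`:
    // the flag is only cleared when a version was resolved.
    if (version) {
        isError = false
    }
    // Mirrors `size === 'sm' && hasExamples && !isError`, coerced to a boolean.
    const hasInteractiveCode = size === 'sm' && Boolean(hasExamples) && !isError
    return { isError, hasInteractiveCode }
}

// A small model with examples and a resolved version gets interactive code...
console.log(modelFlags({ size: 'sm', hasExamples: true, version: '2.1.0' }))
// ...but if no compatible version is found, the error flag stays set.
console.log(modelFlags({ size: 'sm', hasExamples: true, version: null }))
```

With this ordering, a model for which no compatible version is found never shows the interactive code widget, even if it is a small (`sm`) model with examples.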