💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including front-end tests, service workers, offline support, search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
Ines Montani 2019-02-17 19:31:19 +01:00 committed by GitHub
parent 043e8186f3
commit e597110d31
413 changed files with 49007 additions and 25375 deletions

.gitignore vendored

@@ -5,9 +5,15 @@ corpora/
keys/
# Website
website/.cache/
website/public/
website/node_modules
website/.npm
website/logs
*.log
npm-debug.log*
website/www/
website/_deploy.sh
website/.gitignore
# Cython / C extensions
cythonize.json

website/.prettierrc Normal file

@@ -0,0 +1,38 @@
{
"semi": false,
"singleQuote": true,
"trailingComma": "es5",
"tabWidth": 4,
"printWidth": 100,
"overrides": [
{
"files": "*.sass",
"options": {
"printWidth": 999
}
},
{
"files": "*.mdx",
"options": {
"tabWidth": 2,
"printWidth": 80,
"proseWrap": "always"
}
},
{
"files": "*.md",
"options": {
"tabWidth": 2,
"printWidth": 80,
"proseWrap": "always",
"htmlWhitespaceSensitivity": "strict"
}
},
{
"files": "*.html",
"options": {
"htmlWhitespaceSensitivity": "strict"
}
}
]
}

@@ -1,12 +0,0 @@
//- 💫 404 ERROR
include _includes/_mixins

+landing-header
    h1.c-landing__title.u-heading-0
        | Ooops, this page#[br]
        | does not exist!

    h2.c-landing__title.u-heading-3.u-padding-small
        +button(false, true, "secondary-light")(href="javascript:history.go(-1)")
            | Click here to go back

@@ -1,143 +1,559 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a> <Comment>
# spacy.io website and docs # spacy.io website and docs
The [spacy.io](https://spacy.io) website is implemented in [Jade (aka Pug)](https://www.jade-lang.org), and is built or served by [Harp](https://harpjs.com). Jade is an extensible templating language with a readable syntax, that compiles to HTML. _This page contains the documentation and styleguide for the spaCy website. Its
The website source makes extensive use of Jade mixins, so that the design system is abstracted away from the content you're rendered version is available at https://spacy.io/styleguide._
writing. You can read more about our approach in our blog post, ["Rebuilding a Website with Modular Markup"](https://explosion.ai/blog/modular-markup).
---
## Viewing the site locally </Comment>
The [spacy.io](https://spacy.io) website is implemented using
[Gatsby](https://www.gatsbyjs.org) with
[Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This
allows authoring content in **straightforward Markdown** without the usual
limitations. Standard elements can be overwritten with powerful
[React](http://reactjs.org/) components and wherever Markdown syntax isn't
enough, JSX components can be used.
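
For illustration, here is a minimal sketch of what overriding a standard element can look like with `MDXProvider` from `@mdx-js/react`. The component and class names below are illustrative placeholders, not the site's actual code:

```jsx
// Minimal sketch: remap standard Markdown elements to custom React
// components. The class names here are hypothetical placeholders.
import React from 'react'
import { MDXProvider } from '@mdx-js/react'

const components = {
    // every Markdown link renders through this component
    a: props => <a {...props} className="custom-link" />,
    // every Markdown table gets the site's table styling
    table: props => <table {...props} className="custom-table" />,
}

const Layout = ({ children }) => (
    <MDXProvider components={components}>{children}</MDXProvider>
)

export default Layout
```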
> #### Contributing to the site
>
> The docs can always use another example or more detail, and they should always
> be up to date and not misleading. We always appreciate a
> [pull request](https://github.com/explosion/spaCy/pulls). To quickly find the
> correct file to edit, simply click on the "Suggest edits" button at the bottom
> of a page.
>
> For more details on editing the site locally, see the installation
> instructions and markdown reference below.
## Logo {#logo source="website/src/images/logo.svg"}
import { Logos } from 'widgets/styleguide'
If you would like to use the spaCy logo on your site, please get in touch and
ask us first. However, if you want to show support and tell others that your
project is using spaCy, you can grab one of our
[spaCy badges](/usage/spacy-101#faq-project-with-spacy).
<Logos />
## Colors {#colors}
import { Colors, Patterns } from 'widgets/styleguide'
<Colors />
### Patterns
<Patterns />
## Typography {#typography}
import { H1, H2, H3, H4, H5, Label, InlineList, Comment } from
'components/typography'
> #### Markdown
>
> ```markdown_
> ## Headline 2
> ## Headline 2 {#some_id}
> ## Headline 2 {#some_id tag="method"}
> ```
>
> #### JSX
>
> ```jsx
> <H2>Headline 2</H2>
> <H2 id="some_id">Headline 2</H2>
> <H2 id="some_id" tag="method">Headline 2</H2>
> ```
Headlines are set in
[HK Grotesk](http://cargocollective.com/hanken/HK-Grotesk-Open-Source-Font) by
Hanken Design. All other body text and code uses the best-matching default
system font to provide a "native" reading experience.
<Infobox title="Important note" variant="warning">
Level 2 headings are automatically wrapped in `<section>` elements at compile
time, using a custom
[Markdown transformer](https://github.com/explosion/spaCy/tree/master/website/plugins/remark-wrap-section.js).
This makes it easier to highlight the section that's currently in the viewport
in the sidebar menu.
</Infobox>
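
As a rough sketch (not the plugin's actual source), such a transformer can walk the top-level nodes of the Markdown AST and start a new wrapper node at every level-2 heading:

```javascript
// Sketch of a remark transformer that wraps each level-2 heading and the
// nodes following it in a <section>. Illustrative only; the real plugin
// lives in website/plugins/remark-wrap-section.js and may differ.
module.exports = () => tree => {
    const wrapped = []
    let section = null
    for (const node of tree.children) {
        if (node.type === 'heading' && node.depth === 2) {
            // start a new section at every level-2 heading
            section = { type: 'section', data: { hName: 'section' }, children: [] }
            wrapped.push(section)
        }
        // nodes before the first level-2 heading stay at the top level
        if (section) {
            section.children.push(node)
        } else {
            wrapped.push(node)
        }
    }
    tree.children = wrapped
}
```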
<div>
<H1>Headline 1</H1>
<H2>Headline 2</H2>
<H3>Headline 3</H3>
<H4>Headline 4</H4>
<H5>Headline 5</H5>
<Label>Label</Label>
</div>
---
The following optional attributes can be set on the headline to modify it, for
example to add a tag for the documented type, or to mark features that have
been introduced in a specific version or require statistical models to be
loaded.
Tags are also available as standalone `<Tag />` components.
| Argument | Example | Result |
| -------- | -------------------------- | ----------------------------------------- |
| `tag` | `{tag="method"}` | <Tag>method</Tag> |
| `new` | `{new="2"}` | <Tag variant="new">2</Tag> |
| `model` | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
| `hidden` | `{hidden="true"}` | |
## Elements {#elements}
### Links {#links}
> #### Markdown
>
> ```markdown
> [I am a link](https://spacy.io)
> ```
>
> #### JSX
>
> ```jsx
> <Link to="https://spacy.io">I am a link</Link>
> ```
Special link styles are used depending on the link URL.
- [I am a regular external link](https://explosion.ai)
- [I am a link to the documentation](/api/doc)
- [I am a link to GitHub](https://github.com/explosion/spaCy)
### Abbreviations {#abbr}
import { Abbr } from 'components/typography'
> #### JSX
>
> ```jsx
> <Abbr title="Explanation">Abbreviation</Abbr>
> ```
Some text with <Abbr title="Explanation here">an abbreviation</Abbr>. On small
screens, I collapse and the explanation text is displayed next to the
abbreviation.
### Tags {#tags}
import Tag from 'components/tag'
> ```jsx
> <Tag>method</Tag>
> <Tag variant="new">2.1</Tag>
> <Tag variant="model">tagger, parser</Tag>
> ```
Tags can be used together with headlines, or next to properties across the
documentation, and combined with tooltips to provide additional information. An
optional `variant` argument can be used for special tags. `variant="new"` makes
the tag take a version number to mark new features. Using the component,
visibility of this tag can later be toggled once the feature isn't considered
new anymore. Setting `variant="model"` takes a description of model capabilities
and can be used to mark features that require a respective model to be
installed.
<InlineList>
<Tag>method</Tag> <Tag variant="new">2</Tag> <Tag variant="model">tagger,
parser</Tag>
</InlineList>
### Buttons {#buttons}
import Button from 'components/button'
> ```jsx
> <Button to="#" variant="primary">Primary small</Button>
> <Button to="#" variant="secondary">Secondary small</Button>
> ```
Link buttons come in two variants, `primary` and `secondary`, and two sizes,
via an optional `large` size modifier. Since they're mostly used as enhanced
links, the buttons are implemented as styled links instead of native button
elements.
<InlineList><Button to="#" variant="primary">Primary small</Button>
<Button to="#" variant="secondary">Secondary small</Button></InlineList>
<InlineList><Button to="#" variant="primary" large>Primary large</Button>
<Button to="#" variant="secondary" large>Secondary large</Button></InlineList>
## Components
### Table
> #### Markdown
>
> ```markdown_
> | Header 1 | Header 2 |
> | --- | --- |
> | Column 1 | Column 2 |
> ```
>
> #### JSX
>
> ```markup
> <Table>
> <Tr><Th>Header 1</Th><Th>Header 2</Th></Tr>
> <Tr><Td>Column 1</Td><Td>Column 2</Td></Tr>
> </Table>
> ```
Tables are used to present data and API documentation. Certain keywords can be
used to mark a footer row with a distinct style, for example to visualise the
return values of a documented function.
| Header 1 | Header 2 | Header 3 | Header 4 |
| ----------- | -------- | :------: | -------: |
| Column 1 | Column 2 | Column 3 | Column 4 |
| Column 1 | Column 2 | Column 3 | Column 4 |
| Column 1 | Column 2 | Column 3 | Column 4 |
| Column 1 | Column 2 | Column 3 | Column 4 |
| **RETURNS** | Column 2 | Column 3 | Column 4 |
### List
> #### Markdown
>
> ```markdown_
> 1. One
> 2. Two
> ```
>
> #### JSX
>
> ```markup
> <Ol>
> <Li>One</Li>
> <Li>Two</Li>
> </Ol>
> ```
Lists are available as bulleted and numbered. Markdown lists are transformed
automatically.
- I am a bulleted list
- I have nice bullets
- Lorem ipsum dolor
- consectetur adipiscing elit
1. I am an ordered list
2. I have nice numbers
3. Lorem ipsum dolor
4. consectetur adipiscing elit
### Aside
> #### Markdown
>
> ```markdown_
> > #### Aside title
> > This is aside text.
> ```
>
> #### JSX
>
> ```jsx
> <Aside title="Aside title">This is aside text.</Aside>
> ```
Asides can be used to display additional notes and content in the right-hand
column. Asides can contain text, code and other elements if needed. Visually,
asides are moved to the side on the X-axis, and displayed at the same level they
were inserted. On small screens, they collapse and are rendered in their
original position, in between the text.
To make them easier to use in Markdown, paragraphs formatted as blockquotes will
turn into asides by default. Level 4 headlines (with a leading `####`) will
become aside titles.
### Code Block
> #### Markdown
>
> ````markdown_
> ```python
> ### This is a title
> import spacy
> ```
> ````
>
> #### JSX
>
> ```jsx
> <CodeBlock title="This is a title" lang="python">
> import spacy
> </CodeBlock>
> ```
Code blocks use the [Prism](http://prismjs.com/) syntax highlighter with a
custom theme. The language can be set individually on each block, and defaults
to raw text with no highlighting. An optional label can be added as the first
line with the prefix `####` (Python-like) or `///` (JavaScript-like); it is
rendered as the title of the block, and the rest of the content is displayed
as code with whitespace preserved.
```python
### Using spaCy
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")
for token in doc:
    print(token.text, token.pos_)
```
Code blocks can also specify an optional range of line numbers to highlight by
adding `{highlight="..."}` to the headline. Acceptable ranges are spans like
`5-7`, but also `5-7,10` or `5-7,10,13-14`.
> #### Markdown
>
> ````markdown_
> ```python
> ### This is a title {highlight="1-2"}
> import spacy
> nlp = spacy.load("en_core_web_sm")
> ```
> ````
```python
### Using the matcher {highlight="5-7"}
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
pattern = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
matcher.add('HelloWorld', None, pattern)
doc = nlp(u'Hello, world! Hello world!')
matches = matcher(doc)
```
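
Conceptually, a range spec like `5-7,10,13-14` just expands to a list of line numbers. A hypothetical helper (for illustration only, not the site's actual source) might look like this:

```javascript
// Expand a highlight spec such as "5-7,10,13-14" into line numbers.
// Hypothetical helper for illustration, not the site's implementation.
function expandRanges(spec) {
    return spec.split(',').flatMap(part => {
        const [start, end = start] = part.split('-').map(Number)
        // build [start, start + 1, ..., end]
        return Array.from({ length: end - start + 1 }, (_, i) => start + i)
    })
}

console.log(expandRanges('5-7,10,13-14')) // [5, 6, 7, 10, 13, 14]
```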
Adding `{executable="true"}` to the title turns the code into an executable
block, powered by [Binder](https://mybinder.org) and
[Juniper](https://github.com/ines/juniper). If JavaScript is disabled, the
interactive widget defaults to a regular code block.
> #### Markdown
>
> ````markdown_
> ```python
> ### {executable="true"}
> import spacy
> nlp = spacy.load("en_core_web_sm")
> ```
> ````
```python
### {executable="true"}
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")
for token in doc:
    print(token.text, token.pos_)
```
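
For reference, hooking Juniper up to these blocks roughly comes down to pointing it at the marked-up code elements. This is a hedged sketch: the Binder repository and option values below are assumptions, see the Juniper docs for the actual API:

```javascript
// Rough sketch, assuming Juniper is loaded globally via a <script> tag.
// The Binder repository and options are assumptions, not the site's config.
new Juniper({
    repo: 'ines/spacy-io-binder', // hypothetical Binder repository
    selector: '[data-executable]', // code blocks marked {executable="true"}
})
```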
If a code block only contains a URL to a GitHub file, the raw file contents are
embedded automatically and syntax highlighting is applied. The link to the
original file is shown at the top of the widget.
> #### Markdown
>
> ````markdown_
> ```python
> https://github.com/...
> ```
> ````
>
> #### JSX
>
> ```jsx
> <GitHubCode url="https://github.com/..." lang="python" />
> ```
```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
```
### Infobox
import Infobox from 'components/infobox'
> #### JSX
>
> ```jsx
> <Infobox title="Information">Regular infobox</Infobox>
> <Infobox title="Important note" variant="warning">This is a warning.</Infobox>
> <Infobox title="Be careful!" variant="danger">This is dangerous.</Infobox>
> ```
Infoboxes can be used to add notes, updates, warnings or additional information
to a page or section. Semantically, they're implemented and interpreted as an
`aside` element. Infoboxes can take an optional `title` argument, as well as an
optional `variant` (either `"warning"` or `"danger"`).
<Infobox title="This is an infobox">
If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.
</Infobox>
<Infobox title="This is a warning" variant="warning">
If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.
</Infobox>
<Infobox title="This is dangerous" variant="danger">
If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.
</Infobox>
### Accordion
import Accordion from 'components/accordion'
> #### JSX
>
> ```jsx
> <Accordion title="This is an accordion">
> Accordion content goes here.
> </Accordion>
> ```
Accordions are collapsible sections that are mostly used for lengthy tables,
like the tag and label annotation schemes for different languages. They all need
to be presented but chances are the user doesn't actually care about _all_ of
them, especially not at the same time. So it's fairly reasonable to hide them
behind a click. This particular implementation was inspired by the amazing
[Inclusive Components blog](https://inclusive-components.design/collapsible-sections/).
<Accordion title="This is an accordion">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque enim ante,
pretium a orci eget, varius dignissim augue. Nam eu dictum mauris, id tincidunt
nisi. Integer commodo pellentesque tincidunt. Nam at turpis finibus tortor
gravida sodales tincidunt sit amet est. Nullam euismod arcu in tortor auctor,
sit amet dignissim justo congue.
</Accordion>
## Setup and installation {#setup}
Before running the setup, make sure your versions of
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.
```bash
sudo npm install --global harp
git clone https://github.com/explosion/spaCy
cd spaCy/website
harp server
```

This will serve the site on [http://localhost:9000](http://localhost:9000).

```bash
# Clone the repository
git clone https://github.com/explosion/spaCy
cd spaCy/website

# Install Gatsby's command-line tool
npm install --global gatsby-cli

# Install the dependencies
npm install

# Start the development server
npm run dev
```

If you are planning on making edits to the site, you should also set up the
[Prettier](https://prettier.io/) code formatter. It takes care of formatting
Markdown and other files automatically.
[See here](https://prettier.io/docs/en/editors.html) for the available
extensions for your code editor. The
[`.prettierrc`](https://github.com/explosion/spaCy/tree/master/website/.prettierrc)
file in the root defines the settings used in this codebase.
## Markdown reference {#markdown}
All page content and page meta lives in the `.md` files in the `/docs`
directory. The frontmatter block at the top of each file defines the page title
and other settings like the sidebar menu.

````markdown
---
title: Page title
---

## Headline starting a section {#some_id}

This is a regular paragraph with a [link](https://spacy.io) and **bold text**.

> #### This is an aside title
>
> This is aside text.

### Subheadline

| Header 1 | Header 2 |
| -------- | -------- |
| Column 1 | Column 2 |

```python
### Code block title {highlight="2-3"}
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hello world")
```

<Infobox title="Important note" variant="warning">

This is content in the infobox.

</Infobox>
````

In addition to the native markdown elements, you can use the components
[`<Infobox />`][infobox], [`<Accordion />`][accordion], [`<Abbr />`][abbr] and
[`<Tag />`][tag] via their JSX syntax.

[infobox]: https://spacy.io/styleguide#infobox
[accordion]: https://spacy.io/styleguide#accordion
[abbr]: https://spacy.io/styleguide#abbr
[tag]: https://spacy.io/styleguide#tag

## Project structure {#structure}

```yaml
### Directory structure
├── docs                 # the actual markdown content
├── meta                 # JSON-formatted site metadata
|   ├── languages.json   # supported languages and statistical models
|   ├── logos.json       # logos and links for landing page
|   ├── sidebars.json    # sidebar navigations for different sections
|   ├── site.json        # general site metadata
|   └── universe.json    # data for the spaCy universe section
├── public               # compiled site
├── src                  # source
|   ├── components       # React components
|   ├── fonts            # webfonts
|   ├── images           # images used in the layout
|   ├── plugins          # custom plugins to transform Markdown
|   ├── styles           # CSS modules and global styles
|   ├── templates        # page layouts
|   |   ├── docs.js      # layout template for documentation pages
|   |   ├── index.js     # global layout template
|   |   ├── models.js    # layout template for model pages
|   |   └── universe.js  # layout templates for universe
|   └── widgets          # non-reusable components with content, e.g. changelog
├── gatsby-browser.js    # browser-specific hooks for Gatsby
├── gatsby-config.js     # Gatsby configuration
├── gatsby-node.js       # Node-specific hooks for Gatsby
└── package.json         # package settings and dependencies
```

## Making changes to the site

The docs can always use another example or more detail, and they should always be up to date and not misleading. If you see something, say something: we always appreciate a [pull request](https://github.com/explosion/spaCy/pulls). To quickly find the correct file to edit, simply click on the "Suggest edits" button at the bottom of a page.

### File structure

While all page content lives in the `.jade` files, article meta (page titles, sidebars etc.) is stored as JSON. Each folder contains a `_data.json` with all required meta for its files.

### Markup language and conventions

Jade/Pug is a whitespace-sensitive markup language that compiles to HTML. Indentation is used to nest elements, and for template logic, like `if`/`else` or `for`, mainly used to iterate over objects and arrays in the meta data. It also allows inline JavaScript expressions.

For an overview of Harp and Jade, see [this blog post](https://ines.io/blog/the-ultimate-guide-static-websites-harp-jade). For more info on the Jade/Pug syntax, check out their [documentation](https://pugjs.org).

In the [spacy.io](https://spacy.io) source, we use 4 spaces to indent and hard-wrap at 80 characters.

```pug
p This is a very short paragraph. It stays inline.

p
    | This is a much longer paragraph. It's hard-wrapped at 80 characters to
    | make it easier to read on GitHub and in editors that do not have soft
    | wrapping enabled. To prevent Jade from interpreting each line as a new
    | element, it's prefixed with a pipe and two spaces. This ensures that no
    | spaces are dropped; for example, if your editor strips out trailing
    | whitespace by default. Inline links are added using the inline syntax,
    | like this: #[+a("https://google.com") Google].
```

Note that for external links, `+a("...")` is used instead of `a(href="...")`: it's a mixin that takes care of adding all required attributes. If possible, always use a mixin instead of regular HTML elements. The only plain HTML elements we use are:

| Element | Description |
| --- | --- |
| `p` | paragraphs |
| `code` | inline `code` |
| `em` | *italicized* text |
| `strong` | **bold** text |

### Mixins

Each file includes a collection of [custom mixins](_includes/_mixins.jade) that make it easier to add content components: no HTML or class names required.

For example:

```pug
//- Bulleted list
+list
    +item This is a list item.
    +item This is another list item.

//- Table with header
+table([ "Header one", "Header two" ])
    +row
        +cell Table cell
        +cell Another one

    +row
        +cell And one more.
        +cell And the last one.

//- Headlines with optional permalinks
+h(2, "link-id") Headline 2 with link to #link-id
```

Code blocks are implemented using `+code` or `+aside-code` (to display them in the right sidebar). A `.` is added after the mixin call to preserve whitespace:

```pug
+code("This is a label").
    import spacy
    en_nlp = spacy.load('en')
    en_doc = en_nlp(u'Hello, world. Here are two sentences.')
```

You can find the documentation for the available mixins in [`_includes/_mixins.jade`](_includes/_mixins.jade).

### Helpers for linking to content

Aside from the `+a()` mixin, there are three other helpers to make linking to content more convenient.

#### Linking to GitHub

Since GitHub links can be long and tricky, you can use the `gh()` function to generate them automatically for spaCy and all repositories owned by [explosion](https://github.com/explosion):

```javascript
// Syntax: gh(repo, [file], [branch])
gh("spaCy", "spacy/matcher.pyx")
// https://github.com/explosion/spaCy/blob/master/spacy/matcher.pyx
```
#### Linking to source
`+src()` generates a link with a little source icon to indicate it's linking to a code source. Ideally, it's used in combination with `gh()`:
```pug
+src(gh("spaCy", "spacy/matcher.pyx")) matcher.pxy
```
#### Linking to API reference
`+api()` generates a link to a page in the API docs, with an added icon. It should only be used across the workflows in the usage section, and only on the first mention of the respective class.
It takes the slug of an API page as the argument. You can also use anchors to link to specific sections; they're usually the method or property names.
```pug
+api("tokenizer") #[code Tokenizer]
+api("doc#similarity") #[code Doc.similarity()]
```
### Most common causes of compile errors
| Problem | Fix |
| --- | --- |
| JSON formatting errors | make sure last elements of objects don't end with commas and/or use a JSON linter |
| unescaped characters like `<` or `>` and sometimes `'` in inline elements | replace with encoded version: `&lt;`, `&gt;` etc. |
| "Cannot read property 'call' of undefined" / "foo is not a function" | make sure mixin names are spelled correctly and mixins file is included with the correct path |
| "no closing bracket found" | make sure inline elements end with a `]`, like `#[code spacy.load('en')]` and for nested inline elements, make sure they're all on the same line and contain spaces between them (**bad:** `#[+api("doc")#[code Doc]]`) |
If Harp fails and throws a Jade error, don't take the reported line number at face value: it's often wrong, as the page is compiled from templates and several files.

@@ -1,59 +0,0 @@
{
"index": {
"landing": true,
"logos": [
{
"airbnb": [ "https://www.airbnb.com", 150, 45],
"quora": [ "https://www.quora.com", 120, 34 ],
"retriever": [ "https://www.retriever.no", 150, 33 ],
"stitchfix": [ "https://www.stitchfix.com", 150, 18 ]
},
{
"chartbeat": [ "https://chartbeat.com", 180, 25 ],
"allenai": [ "https://allenai.org", 220, 37 ]
}
],
"features": [
{
"recode": ["https://www.recode.net/2017/6/22/15855492/ai-artificial-intelligence-nonprofit-good-human-chatbots-machine-learning", 100, 25],
"wapo": ["https://www.washingtonpost.com/news/wonk/wp/2016/05/18/googles-new-artificial-intelligence-cant-understand-these-sentences-can-you/", 100, 77],
"bbc": ["http://www.bbc.co.uk/rd/blog/2017-08-irfs-weeknotes-number-250", 90, 26],
"microsoft": ["https://www.microsoft.com/developerblog/2016/09/13/training-a-classifier-for-relation-extraction-from-medical-literature/", 130, 28]
},
{
"venturebeat": ["https://venturebeat.com/2017/01/27/4-ai-startups-that-analyze-customer-reviews/", 150, 19],
"thoughtworks": ["https://www.thoughtworks.com/radar/tools", 150, 28]
}
]
},
"robots.txt": {
"layout": false
},
"404": {
"title": "404 Error",
"landing": true
},
"styleguide": {
"title": "Styleguide",
"sidebar": {
"Styleguide": { "": "styleguide" },
"Resources": {
"Website Source": "https://github.com/explosion/spacy/tree/master/website",
"Contributing Guide": "https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md"
}
},
"menu": {
"Introduction": "intro",
"Logo": "logo",
"Colors": "colors",
"Typography": "typography",
"Elements": "elements",
"Components": "components",
"Embeds": "embeds",
"Markup Reference": "markup"
}
}
}

@@ -1,97 +0,0 @@
{
"globals": {
"title": "spaCy",
"description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",
"SITENAME": "spaCy",
"SLOGAN": "Industrial-strength Natural Language Processing in Python",
"SITE_URL": "https://spacy.io",
"EMAIL": "contact@explosion.ai",
"COMPANY": "Explosion AI",
"COMPANY_URL": "https://explosion.ai",
"DEMOS_URL": "https://explosion.ai/demos",
"MODELS_REPO": "explosion/spacy-models",
"SPACY_VERSION": "2.1",
"BINDER_VERSION": "2.0.16",
"SOCIAL": {
"twitter": "spacy_io",
"github": "explosion",
"reddit": "spacynlp",
"codepen": "explosion",
"gitter": "explosion/spaCy"
},
"NAVIGATION": {
"Usage": "/usage",
"Models": "/models",
"API": "/api",
"Universe": "/universe"
},
"FOOTER": {
"spaCy": {
"Usage": "/usage",
"Models": "/models",
"API Reference": "/api",
"Universe": "/universe"
},
"Support": {
"Issue Tracker": "https://github.com/explosion/spaCy/issues",
"Stack Overflow": "http://stackoverflow.com/questions/tagged/spacy",
"Reddit Usergroup": "https://www.reddit.com/r/spacynlp/",
"Gitter Chat": "https://gitter.im/explosion/spaCy"
},
"Connect": {
"Twitter": "https://twitter.com/spacy_io",
"GitHub": "https://github.com/explosion/spaCy",
"Blog": "https://explosion.ai/blog",
"Contact": "mailto:contact@explosion.ai"
}
},
"QUICKSTART": [
{ "id": "os", "title": "Operating system", "options": [
{ "id": "mac", "title": "macOS / OSX", "checked": true },
{ "id": "windows", "title": "Windows" },
{ "id": "linux", "title": "Linux" }]
},
{ "id": "package", "title": "Package manager", "options": [
{ "id": "pip", "title": "pip", "checked": true },
{ "id": "conda", "title": "conda" },
{ "id": "source", "title": "from source" }]
},
{ "id": "python", "title": "Python version", "options": [
{ "id": 2, "title": "2.x" },
{ "id": 3, "title": "3.x", "checked": true }]
},
{ "id": "config", "title": "Configuration", "multiple": true, "options": [
{"id": "venv", "title": "virtualenv", "help": "Use a virtual environment and install spaCy into a user directory" }]
},
{ "id": "model", "title": "Models", "multiple": true }
],
"QUICKSTART_MODELS": [
{ "id": "lang", "title": "Language"},
{ "id": "load", "title": "Loading style", "options": [
{ "id": "spacy", "title": "Use spacy.load()", "checked": true, "help": "Use spaCy's built-in loader to load the model by name." },
{ "id": "module", "title": "Import as module", "help": "Import the model explicitly as a Python module." }]
},
{ "id": "config", "title": "Options", "multiple": true, "options": [
{ "id": "example", "title": "Show usage example" }]
}
],
"V_CSS": "2.2.1",
"V_JS": "2.2.4",
"DEFAULT_SYNTAX": "python",
"ANALYTICS": "UA-58931649-1",
"MAILCHIMP": {
"user": "spacy.us12",
"id": "83b0498b1e7fa3c91ce68c3f1",
"list": "89ad33e698"
}
}
}

@@ -1,28 +0,0 @@
//- 💫 INCLUDES > FOOTER
footer.o-footer.u-text
    +grid.o-content
        each group, label in FOOTER
            +grid-col("quarter")
                ul
                    li.u-text-label.u-color-subtle=label
                    each url, item in group
                        li
                            +a(url)=item

        if SECTION == "index"
            +grid-col("quarter")
                include _newsletter

    if SECTION != "index"
        .o-content.o-block.u-border-dotted
            include _newsletter

    .o-inline-list.u-text-center.u-text-tiny.u-color-subtle
        span &copy; 2016-#{new Date().getFullYear()} #[+a(COMPANY_URL, true)=COMPANY]
        +a(COMPANY_URL, true)(aria-label="Explosion AI")
            +icon("explosion", 45).o-icon.u-color-theme.u-grayscale
        +a(COMPANY_URL + "/legal", true) Legal / Imprint

@@ -1,95 +0,0 @@
//- 💫 INCLUDES > FUNCTIONS
//- Descriptive variables, available in the global scope
- CURRENT = current.source
- SECTION = current.path[0]
- LANGUAGES = public.models._data.LANGUAGES
- MODELS = public.models._data.MODELS
- CURRENT_MODELS = MODELS[current.source] || []
- MODEL_COUNT = Object.keys(MODELS).map(m => Object.keys(MODELS[m]).length).reduce((a, b) => a + b)
- MODEL_LANG_COUNT = Object.keys(MODELS).length
- LANG_COUNT = Object.keys(LANGUAGES).length - 1
- MODEL_META = public.models._data.MODEL_META
- MODEL_LICENSES = public.models._data.MODEL_LICENSES
- MODEL_BENCHMARKS = public.models._data.MODEL_BENCHMARKS
- EXAMPLE_SENT_LANGS = public.models._data.EXAMPLE_SENT_LANGS
- EXAMPLE_SENTENCES = public.models._data.EXAMPLE_SENTENCES
- IS_PAGE = (SECTION != "index") && !landing
- IS_MODELS = (SECTION == "models" && LANGUAGES[current.source])
- HAS_MODELS = IS_MODELS && CURRENT_MODELS.length
//- Get page URL
- function getPageUrl() {
- var path = current.path;
- if(path[path.length - 1] == 'index') path = path.slice(0, path.length - 1);
- return `${SITE_URL}/${path.join('/')}`;
- }
//- Get pretty page title depending on section
- function getPageTitle() {
- var sections = ['api', 'usage', 'models'];
- if (sections.includes(SECTION)) {
- var titleSection = (SECTION == "api") ? 'API' : SECTION.charAt(0).toUpperCase() + SECTION.slice(1);
- return `${title} · ${SITENAME} ${titleSection} Documentation`;
- }
- else if (SECTION != 'index') return `${title} · ${SITENAME}`;
- return `${SITENAME} · ${SLOGAN}`;
- }
//- Get social image based on section and settings
- function getPageImage() {
- var img = (SECTION == 'api') ? 'api' : 'default';
- return `${SITE_URL}/assets/img/social/preview_${preview || img}.jpg`;
- }
//- Add prefixes to items of an array (for modifier CSS classes)
array - [array] list of class names or options, e.g. ["foot"]
prefix - [string] prefix to add to each class, e.g. "c-table__row"
RETURNS - [array] list of modified class names
- function prefixArgs(array, prefix) {
- return array.map(arg => prefix + '--' + arg).join(' ');
- }
//- Convert API paths (semi-temporary fix for renamed sections)
path - [string] link path supplied to +api mixin
RETURNS - [string] new link path to correct location
- function convertAPIPath(path) {
- if (path.startsWith('spacy#') || path.startsWith('displacy#') || path.startsWith('util#')) {
- var comps = path.split('#');
- return "top-level#" + comps[0] + '.' + comps[1];
- }
- return path;
- }
//- Get model components from ID. Components can then be looked up in LANGUAGES
and MODEL_META respectively, to get their human-readable form.
id - [string] model ID, e.g. "en_core_web_sm"
RETURNS - [object] object keyed by components lang, type, genre and size
- function getModelComponents(id) {
- var comps = id.split('_');
- return {'lang': comps[0], 'type': comps[1], 'genre': comps[2], 'size': comps[3]}
- }
//- Generate GitHub links
repo - [string] name of repo owned by explosion
filepath - [string] logical path to file relative to repository root
branch - [string] optional branch, defaults to "master"
RETURNS - [string] the correct link to the file on GitHub
- function gh(repo, filepath, branch) {
- var branch = ALPHA ? 'develop' : branch
- return 'https://github.com/' + SOCIAL.github + '/' + (repo || '') + (filepath ? '/blob/' + (branch || 'master') + '/' + filepath : '' );
- }

@@ -1,749 +0,0 @@
//- 💫 INCLUDES > MIXINS
include _functions
//- Section
id - [string] anchor assigned to section (used for breadcrumb navigation)
mixin section(id)
section.o-section(id=id ? "section-" + id : null data-section=id)&attributes(attributes)
block
//- Accordion (collapsible sections)
title - [string] Section title.
id - [string] Optional section ID for permalinks.
level - [integer] Headline level for section title.
mixin accordion(title, id, level)
section.o-accordion.o-block
+h(level || 4).o-no-block(id=id)
button.o-accordion__button.o-grid.o-grid--vcenter.o-grid--space.js-accordion(aria-expanded="false")=title
svg.o-accordion__icon(width="20" height="20" viewBox="0 0 10 10" aria-hidden="true" focusable="false")
rect.o-accordion__hide(height="8" width="2" y="1" x="4")
rect(height="2" width="8" y="4" x="1")
.o-accordion__content(hidden="")
block
//- Headlines Helper Mixin
level - [integer] 1, 2, 3, 4, or 5
mixin headline(level)
if level == 1
h1.u-heading-1&attributes(attributes)
block
else if level == 2
h2.u-heading-2&attributes(attributes)
block
else if level == 3
h3.u-heading-3&attributes(attributes)
block
else if level == 4
h4.u-heading-4&attributes(attributes)
block
else if level == 5
h5.u-heading-5&attributes(attributes)
block
//- Headlines
level - [integer] headline level, corresponds to h1, h2, h3 etc.
id - [string] unique identifier, creates permalink (optional)
mixin h(level, id, source)
+headline(level).u-heading(id=id)&attributes(attributes)
+permalink(id)
block
if source
+button(gh("spacy", source), false, "secondary", "small").u-nowrap.u-float-right
span Source #[+icon("code", 14).o-icon--inline]
//- Permalink rendering
id - [string] permalink ID used for link anchor
mixin permalink(id)
if id
a.u-permalink(href="##{id}")
block
else
block
//- External links
url - [string] link href
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
info: https://mathiasbynens.github.io/rel-noopener/
mixin a(url, trusted)
- external = url.includes("http")
a(href=url target=external ? "_blank" : null rel=external && !trusted ? "noopener nofollow" : null)&attributes(attributes)
block
//- Source link (with added icon for "code")
url - [string] link href, can also be gh() function to generate GitHub link
see _functions.jade for more info
mixin src(url)
span.u-inline-block.u-nowrap
+a(url)
block
| #[+icon("code", 16).o-icon--inline.u-color-theme]
//- API link (with added tag and automatically generated path)
path - [string] path to API docs page relative to /api/
mixin api(path)
- path = convertAPIPath(path)
+a("/api/" + path, true)(target="_self").u-no-border.u-inline-block.u-nowrap
block
| #[+icon("book", 16).o-icon--inline.u-color-theme]
//- Help icon with tooltip
tooltip - [string] Tooltip text
icon_size - [integer] Optional size of help icon in px.
mixin help(tooltip, icon_size)
span(data-tooltip=tooltip)&attributes(attributes)
if tooltip
span.u-hidden(aria-role="tooltip")=tooltip
+icon("help_o", icon_size || 16).o-icon--inline
//- Abbreviation
mixin abbr(title)
abbr.o-abbr(data-tooltip=title data-tooltip-style="code" aria-label=title)&attributes(attributes)
block
//- Aside wrapper
label - [string] aside label
mixin aside-wrapper(label, emoji)
aside.c-aside
.c-aside__content(role="complementary")&attributes(attributes)
if label
h4.u-text-label.u-text-label--dark
if emoji
span.o-emoji=emoji
| #{label}
block
//- Aside for text
label - [string] aside title (optional)
mixin aside(label, emoji)
+aside-wrapper(label, emoji)
.c-aside__text.u-text-small&attributes(attributes)
block
//- Aside for code
label - [string] aside title (optional or false for no label)
language - [string] language for syntax highlighting (default: "python")
supports basic relevant languages available for PrismJS
prompt - [string] prompt displayed before first line, e.g. "$"
mixin aside-code(label, language, prompt)
+aside-wrapper(label)&attributes(attributes)
+code(false, language, prompt).o-no-block
block
//- Infobox
label - [string] infobox title (optional or false for no title)
emoji - [string] optional emoji displayed before the title, necessary as
argument to be able to wrap it for spacing
mixin infobox(label, emoji)
aside.o-box.o-block.u-text-small&attributes(attributes)
if label
h3.u-heading.u-text-label.u-color-theme
if emoji
span.o-emoji=emoji
| #{label}
block
//- Logos displayed in the top corner of some infoboxes
logos - [array] List of icon ID, width, height and link.
mixin infobox-logos(...logos)
.o-box__logos.u-text-right.u-float-right
for logo in logos
if logo[3]
| #[+a(logo[3]).u-inline-block.u-hide-link.u-padding-small #[+icon(logo[0], logo[1], logo[2]).u-color-dark]]
else
| #[+icon(logo[0], logo[1], logo[2]).u-color-dark]
//- SVG from map (uses embedded SVG sprite)
name - [string] SVG symbol id
width - [integer] width in px
height - [integer] height in px (default: same as width)
mixin svg(name, width, height)
svg(aria-hidden="true" viewBox="0 0 #{width} #{height || width}" width=width height=(height || width))&attributes(attributes)
use(xlink:href="#svg_#{name}")
//- Icon
name - [string] icon name (will be used as symbol id: #svg_{name})
width - [integer] icon width (default: 20)
height - [integer] icon height (defaults to width)
mixin icon(name, width, height)
- var width = width || 20
- var height = height || width
+svg(name, width, height).o-icon(style="min-width: #{width}px")&attributes(attributes)
//- Pro/Con/Neutral icon
icon - [string] "pro", "con" or "neutral" (default: "neutral")
size - [integer] icon size (optional)
mixin procon(icon, label, show_label, size)
- var colors = { yes: "green", no: "red", neutral: "subtle" }
span.u-nowrap
+icon(icon, size || 20)(class="u-color-#{colors[icon] || 'subtle'}").o-icon--inline&attributes(attributes)
span.u-text-small(class=show_label ? null : "u-hidden")=(label || icon)
//- Link button
url - [string] link href
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
info: https://mathiasbynens.github.io/rel-noopener/
...style - all other arguments are added as class names c-button--argument
see assets/css/_components/_buttons.sass
mixin button(url, trusted, ...style)
- external = url && url.includes("http")
a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target=external ? "_blank" : null rel=external && !trusted ? "noopener nofollow" : null)&attributes(attributes)
block
//- Code block
label - [string] aside title (optional or false for no label)
language - [string] language for syntax highlighting (default: "python")
supports basic relevant languages available for PrismJS
prompt - [string] prompt displayed before first line, e.g. "$"
height - [integer] optional height to clip code block to
icon - [string] icon displayed next to code block (e.g. "accept" for new code)
wrap - [boolean] wrap text and disable horizontal scrolling
mixin code(label, language, prompt, height, icon, wrap)
- var lang = (language != "none") ? (language || DEFAULT_SYNTAX) : null
- var lang_class = (language != "none") ? "lang-" + (language || DEFAULT_SYNTAX) : null
pre.c-code-block.o-block(data-language=lang class=lang_class class=icon ? "c-code-block--has-icon" : null style=height ? "height: #{height}px" : null)&attributes(attributes)
if label
h4.u-text-label.u-text-label--dark=label
if icon
- var classes = {'accept': 'u-color-green', 'reject': 'u-color-red'}
.c-code-block__icon(class=classes[icon] || null class=classes[icon] ? "c-code-block__icon--border" : null)
+icon(icon, 18)
code.c-code-block__content(class=wrap ? "u-wrap" : null data-prompt=prompt)
block
//- Executable code
mixin code-exec(label, large)
- label = (label || "Editable code example") + " (experimental)"
+terminal-wrapper(label, !large)
figure.juniper-wrapper
span.juniper-wrapper__text.u-text-tiny v#{BINDER_VERSION} &middot; Python 3 &middot; via #[+a("https://mybinder.org/").u-hide-link Binder]
+code(data-executable="true")&attributes(attributes)
block
//- Wrapper for code blocks to display old/new versions
mixin code-wrapper()
span.u-inline-block.u-padding-top.u-width-full
block
//- Code blocks to display old/new versions
label - [string] ARIA label for block. Defaults to "correct"/"incorrect".
mixin code-old(label, lang, prompt)
- var label = label || 'incorrect'
+code(false, lang, prompt, false, "reject").o-block-small(aria-label=label)
block
mixin code-new(label, lang, prompt)
- var label = label || 'correct'
+code(false, lang, prompt, false, "accept").o-block-small(aria-label=label)
block
//- CodePen embed
slug - [string] ID of CodePen demo (taken from URL)
height - [integer] height of demo embed iframe
default_tab - [string] code tab(s) visible on load (default: "result")
mixin codepen(slug, height, default_tab)
figure.o-block(style="min-height: #{height}px")&attributes(attributes)
.codepen(data-height=height data-theme-id="31335" data-slug-hash=slug data-default-tab=(default_tab || "result") data-embed-version="2" data-user=SOCIAL.codepen)
+a("https://codepen.io/" + SOCIAL.codepen + "/" + slug) View on CodePen
script(async src="https://assets.codepen.io/assets/embed/ei.js")
//- GitHub embed
repo - [string] repository owned by explosion organization
file - [string] logical path to file, relative to repository root
alt_file - [string] alternative file path used in footer and link button
height - [integer] height of code preview in px
mixin github(repo, file, height, alt_file, language)
- var branch = ALPHA ? "develop" : "master"
- var height = height || 250
figure.o-block
pre.c-code-block.o-block-small(class="lang-#{(language || DEFAULT_SYNTAX)}" style="height: #{height}px; min-height: #{height}px")
code.c-code-block__content(data-gh-embed="#{repo}/#{branch}/#{file}").
Can't fetch code example from GitHub :(
Please use the link below to view the example. If you've come across
a broken link, we always appreciate a pull request to the repository,
or a report on the issue tracker. Thanks!
footer.o-grid.u-text
.o-block-small.u-flex-full.u-padding-small #[+icon("github")] #[code.u-break.u-break--all=repo + '/' + (alt_file || file)]
div
+button(gh(repo, alt_file || file), false, "primary", "small") View on GitHub
//- Youtube video embed
id - [string] ID of YouTube video.
ratio - [string] Video ratio, "16x9" or "4x3".
mixin youtube(id, ratio)
figure.o-video.o-block(class="o-video--" + (ratio || "16x9"))
iframe.o-video__iframe(src="https://www.youtube.com/embed/#{id}" frameborder="0" height="500" allowfullscreen)
//- Images / figures
url - [string] url or path to image
width - [integer] image width in px, for better rendering (default: 500)
caption - [string] image caption
alt - [string] alternative image text, defaults to caption
mixin image(url, width, caption, alt)
figure.o-block&attributes(attributes)
if url
img(src=url alt=(alt || caption) width="#{width || 500}")
if caption
+image-caption=caption
block
//- Image caption
mixin image-caption()
figcaption.u-text-small.u-color-subtle.u-padding-small&attributes(attributes)
block
//- Graphic or illustration with button
original - [string] Path to original image
mixin graphic(original)
+image
block
if original
.u-text-right
+button(original, false, "secondary", "small") View large graphic
//- Chart.js
id - [string] chart ID, will be assigned as #chart_{id}
mixin chart(id, height)
figure.o-block&attributes(attributes)
canvas(id="chart_#{id}" width="800" height=(height || "400") style="max-width: 100%")
//- Labels
mixin label()
.u-text-label.u-color-dark&attributes(attributes)
block
mixin label-inline()
strong.u-text-label.u-color-dark&attributes(attributes)
block
//- Tag
tooltip - [string] optional tooltip text.
hide_icon - [boolean] hide tooltip icon
mixin tag(tooltip, hide_icon)
div.u-text-tag.u-text-tag--spaced(data-tooltip=tooltip)&attributes(attributes)
block
if tooltip
if !hide_icon
| #[+icon("help", 12).o-icon--tag]
| #[span.u-hidden(aria-role="tooltip")=tooltip]
//- "Requires model" tag with tooltip and list of capabilities
...capabs - [string] Required model capabilities, e.g. "vectors".
mixin tag-model(...capabs)
- var intro = "To use this functionality, spaCy needs a model to be installed"
- var ext = capabs.length ? " that supports the following capabilities: " + capabs.join(', ') : ""
+tag(intro + ext + ".") Needs model
//- "New" tag to label features new in a specific version
By using a separate mixin with a version ID, it becomes easy to quickly
enable/disable tags without having to modify the markup in the docs.
version - [string or integer] version number, without "v" prefix
mixin tag-new(version)
- var version = (typeof version == 'number') ? version.toFixed(1) : version
- var tooltip = "This feature is new and was introduced in spaCy v" + version
+tag(tooltip, true) v#{version}
//- List
type - [string] "numbers", "letters", "roman" (bulleted list if none set)
start - [integer] start number
mixin list(type, start)
if type
ol.c-list.o-block.u-text(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : null)&attributes(attributes)
block
else
ul.c-list.c-list--bullets.o-block.u-text&attributes(attributes)
block
//- List item (only used within +list)
mixin item()
li.c-list__item&attributes(attributes)
block
//- Table
head - [array] table headings (should match number of columns)
mixin table(head)
table.c-table.o-block&attributes(attributes)
if head
+row("head")
each column in head
+head-cell=column
block
//- Table row (only used within +table)
mixin row(...style)
tr.c-table__row(class=prefixArgs(style, "c-table__row"))&attributes(attributes)
block
//- Header table cell (only used within +row)
mixin head-cell()
th.c-table__head-cell.u-text-label&attributes(attributes)
block
//- Table cell (only used within +row in +table)
mixin cell(...style)
td.c-table__cell.u-text(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
block
//- Grid Container
...style - all arguments are added as class names o-grid--argument
see assets/css/_base/_grid.sass
mixin grid(...style)
.o-grid.o-block(class=prefixArgs(style, "o-grid"))&attributes(attributes)
block
//- Grid Column (only used within +grid)
width - [string] "quarter", "third", "half", "two-thirds", "three-quarters"
see $grid in assets/css/_variables.sass
mixin grid-col(...style)
.o-grid__col(class=prefixArgs(style, "o-grid__col"))&attributes(attributes)
block
//- Card (only used within +grid)
title - [string] card title
url - [string] link for card
author - [string] optional author, displayed as byline at the bottom
icon - [string] optional ID of icon displayed with card
width - [string] optional width of grid column, defaults to "half"
mixin card(title, url, author, icon, width)
+grid-col(width || "half").o-box.o-grid.o-grid--space.u-text&attributes(attributes)
+a(url)
h4.u-heading.u-text-label
if icon
+icon(icon, 25).u-float-right
if title
span.u-color-dark=title
.o-block-small.u-text-small
block
if author
.u-color-subtle.u-text-tiny by #{author}
//- Table of contents, to be used with +item mixins for links
col - [string] width of column (see +grid-col)
mixin table-of-contents(col)
+grid-col(col || "half")
+infobox
+label.o-block-small Table of contents
+list("numbers").u-text-small.o-no-block
block
//- Bibliography
id - [string] ID of bibliography component, for anchor links. Can be used if
there's more than one bibliography on one page.
mixin bibliography(id)
section(id=id || "bibliography")
+infobox
+label.o-block-small Bibliography
+list("numbers").u-text-small.o-no-block
block
//- Footnote
id - [string / integer] ID of footnote.
bib_id - [string] ID of bibliography component, defaults to "bibliography".
tooltip - [string] optional text displayed as tooltip
mixin fn(id, bib_id, tooltip)
sup.u-padding-small(id="bib" + id data-tooltip=tooltip)
span.u-text-tag
+a("#" + (bib_id || "bibliography")).u-hide-link #{id}
//- Table rows for annotation specs
mixin pos-row(tag, pos, morph, desc)
+row
+cell #[code(class=(tag.length > 10) ? "u-break u-break--all" : null)=tag]
+cell #[code=pos]
+cell
- var morphs = morph.includes("|") ? morph.split("|") : morph.split(" ")
for m in morphs
if m
| #[code=m]
+cell.u-text-small=desc
mixin ud-row(tag, desc, example)
+row
+cell #[code=tag]
+cell.u-text-small=desc
if example
+cell.u-text-small
em=example
mixin dep-row(label, desc)
+row
+cell #[code=label]
+cell=desc
//- Table rows for linguistic annotations
annots [array] - array of cell content
style [array] array of 1 (display as code) or 0 (display as text)
mixin annotation-row(annots, style)
+row
for cell, i in annots
if style && style[i]
- cell = (typeof(cell) != 'boolean') ? cell : cell ? 'True' : 'False'
+cell #[code=cell]
else
+cell=cell
block
//- spaCy logo
mixin logo()
+svg("spacy", 675, 215).o-logo&attributes(attributes)
//- Gitter chat button and widget
button - [string] text shown on button
label - [string] title of chat window (default: same as button)
mixin gitter(button, label)
aside.js-gitter.c-chat.is-collapsed(data-title=(label || button))
button.js-gitter-button.c-chat__button.u-text-tag
+icon("chat", 16).o-icon--inline
!=button
//- Badge
image - [string] path to badge image
url - [string] badge link
mixin badge(image, url)
+a(url).u-padding-small.u-hide-link&attributes(attributes)
img.o-badge(src=image alt=url height="20")
//- Quickstart widget
quickstart.js with manual markup, inspired by PyTorch's "Getting started"
groups - [object] option groups, uses global variable QUICKSTART
headline - [string] optional text to be rendered as widget headline
mixin quickstart(groups, headline, description, hide_results)
.c-quickstart.o-block-small#qs
.c-quickstart__content
if headline
+h(2)=headline
if description
p=description
for group in groups
.c-quickstart__group.u-text-small(data-qs-group=group.id)
if group.title
.c-quickstart__legend=group.title
if group.help
| #[+help(group.help)]
.c-quickstart__fields
for option in group.options
input.c-quickstart__input(class="c-quickstart__input--" + (group.input_style ? group.input_style : group.multiple ? "check" : "radio") type=group.multiple ? "checkbox" : "radio" name=group.id id="qs-#{option.id}" value=option.id checked=option.checked)
label.c-quickstart__label.u-text-tiny(for="qs-#{option.id}")!=option.title
if option.meta
| #[span.c-quickstart__label__meta (#{option.meta})]
if option.help
| #[+help(option.help)]
if hide_results
block
else
pre.c-code-block
code.c-code-block__content.c-quickstart__code(data-qs-results="")
block
//- Quickstart code item
data - [object] Rendering conditions (keyed by option group ID, value: option)
style - [string] modifier ID for line style
mixin qs(data, style)
- args = {}
for value, setting in data
- args['data-qs-' + setting] = value
span.c-quickstart__line(class="c-quickstart__line--#{style || 'bash'}")&attributes(args)
block
//- Terminal-style code window
label - [string] title displayed in top bar of terminal window
mixin terminal-wrapper(label, small)
.x-terminal(class=small ? "x-terminal--small" : null)
.x-terminal__icons(class=small ? "x-terminal__icons--small" : null): span
.u-padding-small.u-text-center(class=small ? "u-text-tiny" : "u-text")
strong=label
block
mixin terminal(label, button_text, button_url, exec)
+terminal-wrapper(label)
+code.x-terminal__code(data-executable=exec ? "" : null)
block
if button_text && button_url
+button(button_url, true, "primary", "small").x-terminal__button=button_text
//- Landing
mixin landing-header()
header.c-landing
.c-landing__wrapper
.c-landing__content
block
mixin landing-banner(headline, label)
.c-landing__banner.u-padding.o-block.u-color-light
+grid.c-landing__banner__content.o-no-block
+grid-col("third")
h3.u-heading.u-heading-1
if label
div
span.u-text-label.u-text-label--light=label
!=headline
+grid-col("two-thirds").c-landing__banner__text
block
mixin landing-logos(title, logos)
.o-content.u-text-center&attributes(attributes)
h3.u-heading.u-text-label.u-color-dark=title
each row, i in logos
- var is_last = i == logos.length - 1
+grid("center").o-inline-list.o-no-block(class=is_last ? "o-no-block" : null)
each details, name in row
+a(details[0]).u-padding-medium
+icon(name, details[1], details[2])
if is_last
block
//- Under construction (temporary)
Marks sections that still need to be completed for the v2.0 release.
mixin under-construction()
+infobox("Under construction", "🚧")
| This section is still being written and will be updated as soon as
| possible. Is there anything that you think should definitely
| mentioned or explained here? Any examples you'd like to see?
| #[strong Let us know] on the #[+a(gh("spacy") + "/issues") issue tracker]!
//- Legacy docs
mixin legacy()
+aside("Looking for the old docs?", "📖")
| To help you make the transition from v1.x to v2.0, we've uploaded the
| old website to #[strong #[+a("https://legacy.spacy.io/docs") legacy.spacy.io]].
| Wherever possible, the new docs also include notes on features that have
| changed in v2.0, and features that were introduced in the new version.

@@ -1,16 +0,0 @@
//- 💫 INCLUDES > TOP NAVIGATION
nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : null)
    a(href="/" aria-label=SITENAME) #[+logo]

    ul.c-nav__menu
        - var current_url = '/' + current.path[0]
        each url, item in NAVIGATION
            - var is_active = (current_url == url)
            li.c-nav__menu__item(class=is_active ? "is-active" : null)
                +a(url)(tabindex=is_active ? "-1" : null)=item

        li.c-nav__menu__item
            +a(gh("spaCy"))(aria-label="GitHub") #[+icon("github", 20)]

    progress.c-progress.js-progress(value="0" max="1")

@@ -1,16 +0,0 @@
//- 💫 INCLUDES > NEWSLETTER
ul.o-block-small
    li.u-text-label.u-color-subtle Stay in the loop!
    li Receive updates about new releases, tutorials and more.

form.o-grid#mc-embedded-subscribe-form(action="//#{MAILCHIMP.user}.list-manage.com/subscribe/post?u=#{MAILCHIMP.id}&amp;id=#{MAILCHIMP.list}" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)
    //- MailChimp spam protection
    div(style="position: absolute; left: -5000px;" aria-hidden="true")
        input(type="text" name="b_#{MAILCHIMP.id}_#{MAILCHIMP.list}" tabindex="-1" value="")

    .o-grid-col.o-grid.o-grid--nowrap.o-field.u-padding-small
        div
            input#mce-EMAIL.o-field__input.u-text(type="email" name="EMAIL" placeholder="Your email" aria-label="Your email")
        button#mc-embedded-subscribe.o-field__button.u-text-label.u-color-theme.u-nowrap(type="submit" name="subscribe") Sign up

@@ -1,54 +0,0 @@
//- 💫 INCLUDES > DOCS PAGE TEMPLATE
- sidebar_content = (public[SECTION] ? public[SECTION]._data.sidebar : public._data[SECTION] ? public._data[SECTION].sidebar : false) || FOOTER
include _sidebar
main.o-main.o-main--sidebar.o-main--aside
    article.o-content
        +grid.o-no-block
            +h(1).u-heading--title=title.replace("'", "")
            if tag
                +tag=tag
            if tag_new
                +tag-new(tag_new)
        if teaser
            .u-heading__teaser.u-text-small.u-color-dark=teaser
        else if IS_MODELS
            .u-heading__teaser.u-text-small.u-color-dark
                | Available statistical models for
                | #[code=current.source] (#{LANGUAGES[current.source]}).
        if source
            .o-block.u-text-right
                +button(gh("spacy", source), false, "secondary", "small").u-nowrap
                    | Source #[+icon("code", 14)]
        if IS_MODELS
            include _page_models
        else
            !=yield

    +grid.o-content.u-text
        +grid-col("half")
            if !IS_MODELS
                .o-inline-list
                    +button(gh("spacy", "website/" + current.path.join('/') + ".jade"), false, "secondary", "small")
                        | #[span.o-icon Suggest edits] #[+icon("code", 14)]

        +grid-col("half").u-text-right
            if next && public[SECTION]._data[next]
                - data = public[SECTION]._data[next]
                +grid("vcenter")
                    +a(next).u-text-small.u-flex-full
                        h4.u-text-label.u-color-dark Read next
                        | #{data.title}
                    +a(next).c-icon-button.c-icon-button--right(aria-hidden="true")
                        +icon("arrow-right", 24)

    +gitter("spaCy chat")

include _footer

@@ -1,109 +0,0 @@
//- 💫 INCLUDES > MODELS PAGE TEMPLATE
for id in CURRENT_MODELS
- var comps = getModelComponents(id)
+section(id)
section(data-vue=id data-model=id)
+grid("vcenter").o-no-block(id=id)
+grid-col("two-thirds")
+h(2)
+a("#" + id).u-permalink=id
+grid-col("third").u-text-right
.u-color-subtle.u-text-tiny
+button(gh("spacy-models") + "/releases", true, "secondary", "small")(v-bind:href="releaseUrl")
| Release details
.u-padding-small Latest: #[code(v-text="version") n/a]
+aside-code("Installation", "bash", "$").
python -m spacy download #{id}
p(v-if="description" v-text="description")
+infobox(v-if="error")
| Unable to load model details from GitHub. To find out more
| about this model, see the overview of the
| #[+a(gh("spacy-models") + "/releases") latest model releases].
+table.o-block-small(v-bind:data-loading="loading")
+row
+cell #[+label Language]
+cell #[+tag=comps.lang] #{LANGUAGES[comps.lang]}
for comp, label in {"Type": comps.type, "Genre": comps.genre}
+row
+cell #[+label=label]
+cell #[+tag=comp] #{MODEL_META[comp]}
+row
+cell #[+label Size]
+cell #[+tag=comps.size] #[span(v-text="sizeFull" v-if="sizeFull")] #[em(v-else="") n/a]
+row(v-if="pipeline && pipeline.length" v-cloak="")
+cell
+label Pipeline #[+help(MODEL_META.pipeline).u-color-subtle]
+cell
span(v-for="(pipe, index) in pipeline" v-if="pipeline")
code(v-text="pipe")
span(v-if="index != pipeline.length - 1") ,&nbsp;
+row(v-if="vectors" v-cloak="")
+cell
+label Vectors #[+help(MODEL_META.vectors).u-color-subtle]
+cell(v-text="vectors")
+row(v-if="sources && sources.length" v-cloak="")
+cell
+label Sources #[+help(MODEL_META.sources).u-color-subtle]
+cell
span(v-for="(source, index) in sources") {{ source }}
span(v-if="index != sources.length - 1") ,&nbsp;
+row(v-if="author" v-cloak="")
+cell #[+label Author]
+cell
+a("")(v-bind:href="url" v-if="url" v-text="author")
span(v-else="" v-text="author") {{ model.author }}
+row(v-if="license" v-cloak="")
+cell #[+label License]
+cell
+a("")(v-bind:href="modelLicenses[license]" v-if="modelLicenses[license]") {{ license }}
span(v-else="") {{ license }}
+row(v-cloak="")
+cell #[+label Compat #[+help(MODEL_META.compat).u-color-subtle]]
+cell
.o-field.u-float-left
select.o-field__select.u-text-small(v-model="spacyVersion")
option(v-for="version in orderedCompat" v-bind:value="version") spaCy v{{ version }}
code(v-if="compatVersion" v-text="compatVersion")
em(v-else="") not compatible
+grid.o-block-small(v-cloak="" v-if="hasAccuracy")
for keys, label in MODEL_BENCHMARKS
.u-flex-full.u-padding-small
+table.o-block-small
+row("head")
+head-cell(colspan="2")=(MODEL_META["benchmark_" + label] || label)
for label, field in keys
+row
+cell.u-nowrap
+label=label
if MODEL_META[field]
| #[+help(MODEL_META[field]).u-color-subtle]
+cell("num")
span(v-if="#{field}" v-text="#{field}")
em(v-if="!#{field}") n/a
p.u-text-small.u-color-dark(v-if="notes" v-text="notes" v-cloak="")
if comps.size == "sm" && EXAMPLE_SENT_LANGS.includes(comps.lang)
section
+code-exec("Test the model live").
import spacy
from spacy.lang.#{comps.lang}.examples import sentences
nlp = spacy.load('#{id}')
doc = nlp(sentences[0])
print(doc.text)
for token in doc:
print(token.text, token.pos_, token.dep_)

@@ -1,28 +0,0 @@
//- 💫 INCLUDES > SCRIPTS
- scripts = ["vendor/prism.min", "vendor/vue.min"]
- if (SECTION == "universe") scripts.push("vendor/vue-markdown.min")
- if (quickstart) scripts.push("vendor/quickstart.min")
- if (IS_PAGE) scripts.push("vendor/in-view.min")
- if (IS_PAGE || SECTION == "index") scripts.push("vendor/juniper.min")
for script in scripts
script(src="/assets/js/" + script + ".js")
script(src="/assets/js/main.js?v#{V_JS}" type=(environment == "deploy") ? null : "module")
if environment == "deploy"
script(src="https://www.google-analytics.com/analytics.js", async)
script
| window.ga=window.ga||function(){
| (ga.q=ga.q||[]).push(arguments)}; ga.l=+new Date;
| ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');
if IS_PAGE
script(src="https://sidecar.gitter.im/dist/sidecar.v1.js" async defer)
script
| ((window.gitter = {}).chat = {}).options = {
| useStyles: false,
| activationElement: '.js-gitter-button',
| targetElement: '.js-gitter',
| room: '!{SOCIAL.gitter}'
| };

@@ -1,23 +0,0 @@
//- 💫 INCLUDES > SIDEBAR
menu.c-sidebar.js-sidebar.u-text
if sidebar_content
each items, sectiontitle in sidebar_content
ul.c-sidebar__section.o-block-small
li.u-text-label.u-color-dark=sectiontitle
each url, item in items
- var is_current = CURRENT == url || (CURRENT == "index" && url == "./")
li.c-sidebar__item
+a(url)(class=is_current ? "is-active" : null tabindex=is_current ? "-1" : null data-sidebar-active=is_current ? "" : null)=item
if is_current
if IS_MODELS && CURRENT_MODELS.length
- menu = Object.assign({}, ...CURRENT_MODELS.map(id => ({ [id]: id })))
if menu
ul.c-sidebar__crumb.u-hidden-sm
- var counter = 0
for id, title in menu
- counter++
li.c-sidebar__crumb__item(data-nav=id)
+a("#section-" + id)=title

File diff suppressed because one or more lines are too long

@@ -1,57 +0,0 @@
//- 💫 GLOBAL LAYOUT
include _includes/_mixins
- title = IS_MODELS ? LANGUAGES[current.source] || title : title
- PAGE_URL = getPageUrl()
- PAGE_TITLE = getPageTitle()
- PAGE_IMAGE = getPageImage()
doctype html
html(lang="en")
head
title=PAGE_TITLE
meta(charset="utf-8")
meta(name="viewport" content="width=device-width, initial-scale=1.0")
meta(name="referrer" content="always")
meta(name="description" content=description)
meta(property="og:type" content="website")
meta(property="og:site_name" content=sitename)
meta(property="og:url" content=PAGE_URL)
meta(property="og:title" content=PAGE_TITLE)
meta(property="og:description" content=description)
meta(property="og:image" content=PAGE_IMAGE)
meta(name="twitter:card" content="summary_large_image")
meta(name="twitter:site" content="@" + SOCIAL.twitter)
meta(name="twitter:title" content=PAGE_TITLE)
meta(name="twitter:description" content=description)
meta(name="twitter:image" content=PAGE_IMAGE)
link(rel="shortcut icon" href="/assets/img/favicon.ico")
link(rel="icon" type="image/x-icon" href="/assets/img/favicon.ico")
if SECTION == "api"
link(href="/assets/css/style_green.css?v#{V_CSS}" rel="stylesheet")
else if SECTION == "universe"
link(href="/assets/css/style_purple.css?v#{V_CSS}" rel="stylesheet")
else
link(href="/assets/css/style.css?v#{V_CSS}" rel="stylesheet")
body
include _includes/_svg
include _includes/_navigation
if !landing
include _includes/_page-docs
else if SECTION == "universe"
!=yield
else
main!=yield
include _includes/_footer
include _includes/_scripts

@@ -1,43 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > BILUO
+table(["Tag", "Description"])
+row
+cell #[code #[span.u-color-theme B] EGIN]
+cell The first token of a multi-token entity.
+row
+cell #[code #[span.u-color-theme I] N]
+cell An inner token of a multi-token entity.
+row
+cell #[code #[span.u-color-theme L] AST]
+cell The final token of a multi-token entity.
+row
+cell #[code #[span.u-color-theme U] NIT]
+cell A single-token entity.
+row
+cell #[code #[span.u-color-theme O] UT]
+cell A non-entity token.
+aside("Why BILUO, not IOB?")
| There are several coding schemes for encoding entity annotations as
| token tags. These coding schemes are equally expressive, but not
| necessarily equally learnable.
| #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]
| showed that the minimal #[strong Begin], #[strong In], #[strong Out]
| scheme was more difficult to learn than the #[strong BILUO] scheme that
| we use, which explicitly marks boundary tokens.
p
| spaCy translates the character offsets into this scheme, in order to
| decide the cost of each action given the current state of the entity
| recogniser. The costs are then used to calculate the gradient of the
| loss, to train the model. The exact algorithm is a pastiche of
| well-known methods, and is not currently described in any single
| publication. The model is a greedy transition-based parser guided by a
| linear model whose weights are learned using the averaged perceptron
| loss, via the #[+a("http://www.aclweb.org/anthology/C12-1059") dynamic oracle]
| imitation learning strategy. The transition system is equivalent to the
| BILUO tagging scheme.
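To make the scheme concrete, here's a minimal sketch that converts character offsets into BILUO tags via the `biluo_tags_from_offsets` helper covered in the training section (the sentence and offsets are illustrative):

    from spacy.lang.en import English
    from spacy.gold import biluo_tags_from_offsets

    nlp = English()  # tokenizer only, no statistical model required
    doc = nlp(u'Jane Austen visited London')
    # character offsets for "Jane Austen" (PERSON) and "London" (GPE)
    entities = [(0, 11, 'PERSON'), (20, 26, 'GPE')]
    tags = biluo_tags_from_offsets(doc, entities)
    assert tags == ['B-PERSON', 'L-PERSON', 'O', 'U-GPE']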

@@ -1,158 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > DEPENDENCY LABELS
p
| This section lists the syntactic dependency labels assigned by
| spaCy's #[+a("/models") models]. The individual labels are
| language-specific and depend on the training corpus.
+accordion("Universal Dependency Labels")
p
| The #[+a("http://universaldependencies.org/u/dep/") Universal Dependencies scheme]
| is used in all languages trained on Universal Dependency Corpora.
+table(["Dep", "Description"])
+ud-row("acl", "clausal modifier of noun (adjectival clause)")
+ud-row("advcl", "adverbial clause modifier")
+ud-row("advmod", "adverbial modifier")
+ud-row("amod", "adjectival modifier")
+ud-row("appos", "appositional modifier")
+ud-row("aux", "auxiliary")
+ud-row("case", "case marking")
+ud-row("cc", "coordinating conjunction")
+ud-row("ccomp", "clausal complement")
+ud-row("clf", "classifier")
+ud-row("compound", "compound")
+ud-row("conj", "conjunct")
+ud-row("cop", "copula")
+ud-row("csubj", "clausal subject")
+ud-row("dep", "unspecified dependency")
+ud-row("det", "determiner")
+ud-row("discourse", "discourse element")
+ud-row("dislocated", "dislocated elements")
+ud-row("expl", "expletive")
+ud-row("fixed", "fixed multiword expression")
+ud-row("flat", "flat multiword expression")
+ud-row("goeswith", "goes with")
+ud-row("iobj", "indirect object")
+ud-row("list", "list")
+ud-row("mark", "marker")
+ud-row("nmod", "nominal modifier")
+ud-row("nsubj", "nominal subject")
+ud-row("nummod", "numeric modifier")
+ud-row("obj", "object")
+ud-row("obl", "oblique nominal")
+ud-row("orphan", "orphan")
+ud-row("parataxis", "parataxis")
+ud-row("punct", "punctuation")
+ud-row("reparandum", "overridden disfluency")
+ud-row("root", "root")
+ud-row("vocative", "vocative")
+ud-row("xcomp", "open clausal complement")
+accordion("English", "dependency-parsing-english")
p
| The English dependency labels use the
| #[+a("https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md") CLEAR Style]
| by #[+a("http://www.clearnlp.com") ClearNLP].
+table(["Label", "Description"])
+dep-row("acl", "clausal modifier of noun (adjectival clause)")
+dep-row("acomp", "adjectival complement")
+dep-row("advcl", "adverbial clause modifier")
+dep-row("advmod", "adverbial modifier")
+dep-row("agent", "agent")
+dep-row("amod", "adjectival modifier")
+dep-row("appos", "appositional modifier")
+dep-row("attr", "attribute")
+dep-row("aux", "auxiliary")
+dep-row("auxpass", "auxiliary (passive)")
+dep-row("case", "case marking")
+dep-row("cc", "coordinating conjunction")
+dep-row("ccomp", "clausal complement")
+dep-row("compound", "compound")
+dep-row("conj", "conjunct")
+dep-row("cop", "copula")
+dep-row("csubj", "clausal subject")
+dep-row("csubjpass", "clausal subject (passive)")
+dep-row("dative", "dative")
+dep-row("dep", "unclassified dependent")
+dep-row("det", "determiner")
+dep-row("dobj", "direct object")
+dep-row("expl", "expletive")
+dep-row("intj", "interjection")
+dep-row("mark", "marker")
+dep-row("meta", "meta modifier")
+dep-row("neg", "negation modifier")
+dep-row("nn", "noun compound modifier")
+dep-row("nounmod", "modifier of nominal")
+dep-row("npmod", "noun phrase as adverbial modifier")
+dep-row("nsubj", "nominal subject")
+dep-row("nsubjpass", "nominal subject (passive)")
+dep-row("nummod", "numeric modifier")
+dep-row("oprd", "object predicate")
+dep-row("obj", "object")
+dep-row("obl", "oblique nominal")
+dep-row("parataxis", "parataxis")
+dep-row("pcomp", "complement of preposition")
+dep-row("pobj", "object of preposition")
+dep-row("poss", "possession modifier")
+dep-row("preconj", "pre-correlative conjunction")
+dep-row("prep", "prepositional modifier")
+dep-row("prt", "particle")
+dep-row("punct", "punctuation")
+dep-row("quantmod", "modifier of quantifier")
+dep-row("relcl", "relative clause modifier")
+dep-row("root", "root")
+dep-row("xcomp", "open clausal complement")
+accordion("German", "dependency-parsing-german")
p
| The German dependency labels use the
| #[+a("http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html") TIGER Treebank]
| annotation scheme.
+table(["Label", "Description"])
+dep-row("ac", "adpositional case marker")
+dep-row("adc", "adjective component")
+dep-row("ag", "genitive attribute")
+dep-row("ams", "measure argument of adjective")
+dep-row("app", "apposition")
+dep-row("avc", "adverbial phrase component")
+dep-row("cc", "comparative complement")
+dep-row("cd", "coordinating conjunction")
+dep-row("cj", "conjunct")
+dep-row("cm", "comparative conjunction")
+dep-row("cp", "complementizer")
+dep-row("cvc", "collocational verb construction")
+dep-row("da", "dative")
+dep-row("dh", "discourse-level head")
+dep-row("dm", "discourse marker")
+dep-row("ep", "expletive es")
+dep-row("hd", "head")
+dep-row("ju", "junctor")
+dep-row("mnr", "postnominal modifier")
+dep-row("mo", "modifier")
+dep-row("ng", "negation")
+dep-row("nk", "noun kernel element")
+dep-row("nmc", "numerical component")
+dep-row("oa", "accusative object")
+dep-row("oa", "second accusative object")
+dep-row("oc", "clausal object")
+dep-row("og", "genitive object")
+dep-row("op", "prepositional object")
+dep-row("par", "parenthetical element")
+dep-row("pd", "predicate")
+dep-row("pg", "phrasal genitive")
+dep-row("ph", "placeholder")
+dep-row("pm", "morphological particle")
+dep-row("pnc", "proper noun component")
+dep-row("rc", "relative clause")
+dep-row("re", "repeated element")
+dep-row("rs", "reported speech")
+dep-row("sb", "subject")
+dep-row("sbp", "passivised subject")
+dep-row("sp", "subject or predicate")
+dep-row("svp", "separable verb prefix")
+dep-row("uc", "unit component")
+dep-row("vo", "vocative")
+dep-row("ROOT", "root")

@@ -1,109 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > NAMED ENTITIES
p
| Models trained on the
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus
| support the following entity types:
+table(["Type", "Description"])
+row
+cell #[code PERSON]
+cell People, including fictional.
+row
+cell #[code NORP]
+cell Nationalities or religious or political groups.
+row
+cell #[code FAC]
+cell Buildings, airports, highways, bridges, etc.
+row
+cell #[code ORG]
+cell Companies, agencies, institutions, etc.
+row
+cell #[code GPE]
+cell Countries, cities, states.
+row
+cell #[code LOC]
+cell Non-GPE locations, mountain ranges, bodies of water.
+row
+cell #[code PRODUCT]
+cell Objects, vehicles, foods, etc. (Not services.)
+row
+cell #[code EVENT]
+cell Named hurricanes, battles, wars, sports events, etc.
+row
+cell #[code WORK_OF_ART]
+cell Titles of books, songs, etc.
+row
+cell #[code LAW]
+cell Named documents made into laws.
+row
+cell #[code LANGUAGE]
+cell Any named language.
+row
+cell #[code DATE]
+cell Absolute or relative dates or periods.
+row
+cell #[code TIME]
+cell Times smaller than a day.
+row
+cell #[code PERCENT]
+cell Percentage, including "%".
+row
+cell #[code MONEY]
+cell Monetary values, including unit.
+row
+cell #[code QUANTITY]
+cell Measurements, as of weight or distance.
+row
+cell #[code ORDINAL]
+cell "first", "second", etc.
+row
+cell #[code CARDINAL]
+cell Numerals that do not fall under another type.
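A minimal sketch of how these types surface at the Python level, assuming an English model trained on OntoNotes 5 is linked as `en` (the exact predictions depend on the model):

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'Apple opened its first office in San Francisco in 2019')
    for ent in doc.ents:
        # expect something like: Apple/ORG, San Francisco/GPE, 2019/DATE
        print(ent.text, ent.label_)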
+h(4, "ner-wikipedia-scheme") Wikipedia scheme
p
| Models trained on the Wikipedia corpus
| (#[+a("http://www.sciencedirect.com/science/article/pii/S0004370212000276") Nothman et al., 2013])
| use a less fine-grained NER annotation scheme and recognise the
| following entities:
+table(["Type", "Description"])
+row
+cell #[code PER]
+cell Named person or family.
+row
+cell #[code LOC]
+cell
| Name of politically or geographically defined location (cities,
| provinces, countries, international regions, bodies of water,
| mountains).
+row
+cell #[code ORG]
+cell Named corporate, governmental, or other organizational entity.
+row
+cell #[code MISC]
+cell
| Miscellaneous entities, e.g. events, nationalities, products or
| works of art.

@@ -1,179 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > POS TAGS
p
| This section lists the fine-grained and coarse-grained part-of-speech
| tags assigned by spaCy's #[+a("/models") models]. The individual mapping
| is specific to the training corpus and can be defined in the respective
| language data's #[+a("/usage/adding-languages#tag-map") #[code tag_map.py]].
+accordion("Universal Part-of-speech Tags")
p
| spaCy also maps all language-specific part-of-speech tags to a small,
| fixed set of word type tags following the
| #[+a("http://universaldependencies.org/u/pos/") Universal Dependencies scheme].
| The universal tags don't code for any morphological features and only
| cover the word type. They're available as the
| #[+api("token#attributes") #[code Token.pos]] and
| #[+api("token#attributes") #[code Token.pos_]] attributes.
+table(["POS", "Description", "Examples"])
+ud-row("ADJ", "adjective", "big, old, green, incomprehensible, first")
+ud-row("ADP", "adposition", "in, to, during")
+ud-row("ADV", "adverb", "very, tomorrow, down, where, there")
+ud-row("AUX", "auxiliary", "is, has (done), will (do), should (do)")
+ud-row("CONJ", "conjunction", "and, or, but")
+ud-row("CCONJ", "coordinating conjunction", "and, or, but")
+ud-row("DET", "determiner", "a, an, the")
+ud-row("INTJ", "interjection", "psst, ouch, bravo, hello")
+ud-row("NOUN", "noun", "girl, cat, tree, air, beauty")
+ud-row("NUM", "numeral", "1, 2017, one, seventy-seven, IV, MMXIV")
+ud-row("PART", "particle", "'s, not, ")
+ud-row("PRON", "pronoun", "I, you, he, she, myself, themselves, somebody")
+ud-row("PROPN", "proper noun", "Mary, John, London, NATO, HBO")
+ud-row("PUNCT", "punctuation", "., (, ), ?")
+ud-row("SCONJ", "subordinating conjunction", "if, while, that")
+ud-row("SYM", "symbol", "$, %, §, ©, +, , ×, ÷, =, :), 😝")
+ud-row("VERB", "verb", "run, runs, running, eat, ate, eating")
+ud-row("X", "other", "sfpksdpsxmsa")
+ud-row("SPACE", "space", "")
+accordion("English", "pos-en")
p
| The English part-of-speech tagger uses the
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] version of
| the Penn Treebank tag set. We also map the tags to the simpler Google
| Universal POS tag set.
+table(["Tag", "POS", "Morphology", "Description"])
+pos-row("-LRB-", "PUNCT", "PunctType=brck PunctSide=ini", "left round bracket")
+pos-row("-RRB-", "PUNCT", "PunctType=brck PunctSide=fin", "right round bracket")
+pos-row(",", "PUNCT", "PunctType=comm", "punctuation mark, comma")
+pos-row(":", "PUNCT", "", "punctuation mark, colon or ellipsis")
+pos-row(".", "PUNCT", "PunctType=peri", "punctuation mark, sentence closer")
+pos-row("''", "PUNCT", "PunctType=quot PunctSide=fin", "closing quotation mark")
+pos-row("\"\"", "PUNCT", "PunctType=quot PunctSide=fin", "closing quotation mark")
+pos-row("#", "SYM", "SymType=numbersign", "symbol, number sign")
+pos-row("``", "PUNCT", "PunctType=quot PunctSide=ini", "opening quotation mark")
+pos-row("$", "SYM", "SymType=currency", "symbol, currency")
+pos-row("ADD", "X", "", "email")
+pos-row("AFX", "ADJ", "Hyph=yes", "affix")
+pos-row("BES", "VERB", "", 'auxiliary "be"')
+pos-row("CC", "CONJ", "ConjType=coor", "conjunction, coordinating")
+pos-row("CD", "NUM", "NumType=card", "cardinal number")
+pos-row("DT", "DET", "determiner")
+pos-row("EX", "ADV", "AdvType=ex", "existential there")
+pos-row("FW", "X", "Foreign=yes", "foreign word")
+pos-row("GW", "X", "", "additional word in multi-word expression")
+pos-row("HVS", "VERB", "", 'forms of "have"')
+pos-row("HYPH", "PUNCT", "PunctType=dash", "punctuation mark, hyphen")
+pos-row("IN", "ADP", "", "conjunction, subordinating or preposition")
+pos-row("JJ", "ADJ", "Degree=pos", "adjective")
+pos-row("JJR", "ADJ", "Degree=comp", "adjective, comparative")
+pos-row("JJS", "ADJ", "Degree=sup", "adjective, superlative")
+pos-row("LS", "PUNCT", "NumType=ord", "list item marker")
+pos-row("MD", "VERB", "VerbType=mod", "verb, modal auxiliary")
+pos-row("NFP", "PUNCT", "", "superfluous punctuation")
+pos-row("NIL", "", "", "missing tag")
+pos-row("NN", "NOUN", "Number=sing", "noun, singular or mass")
+pos-row("NNP", "PROPN", "NounType=prop Number=sign", "noun, proper singular")
+pos-row("NNPS", "PROPN", "NounType=prop Number=plur", "noun, proper plural")
+pos-row("NNS", "NOUN", "Number=plur", "noun, plural")
+pos-row("PDT", "ADJ", "AdjType=pdt PronType=prn", "predeterminer")
+pos-row("POS", "PART", "Poss=yes", "possessive ending")
+pos-row("PRP", "PRON", "PronType=prs", "pronoun, personal")
+pos-row("PRP$", "ADJ", "PronType=prs Poss=yes", "pronoun, possessive")
+pos-row("RB", "ADV", "Degree=pos", "adverb")
+pos-row("RBR", "ADV", "Degree=comp", "adverb, comparative")
+pos-row("RBS", "ADV", "Degree=sup", "adverb, superlative")
+pos-row("RP", "PART", "", "adverb, particle")
+pos-row("_SP", "SPACE", "", "space")
+pos-row("SYM", "SYM", "", "symbol")
+pos-row("TO", "PART", "PartType=inf VerbForm=inf", "infinitival to")
+pos-row("UH", "INTJ", "", "interjection")
+pos-row("VB", "VERB", "VerbForm=inf", "verb, base form")
+pos-row("VBD", "VERB", "VerbForm=fin Tense=past", "verb, past tense")
+pos-row("VBG", "VERB", "VerbForm=part Tense=pres Aspect=prog", "verb, gerund or present participle")
+pos-row("VBN", "VERB", "VerbForm=part Tense=past Aspect=perf", "verb, past participle")
+pos-row("VBP", "VERB", "VerbForm=fin Tense=pres", "verb, non-3rd person singular present")
+pos-row("VBZ", "VERB", "VerbForm=fin Tense=pres Number=sing Person=3", "verb, 3rd person singular present")
+pos-row("WDT", "ADJ", "PronType=int|rel", "wh-determiner")
+pos-row("WP", "NOUN", "PronType=int|rel", "wh-pronoun, personal")
+pos-row("WP$", "ADJ", "Poss=yes PronType=int|rel", "wh-pronoun, possessive")
+pos-row("WRB", "ADV", "PronType=int|rel", "wh-adverb")
+pos-row("XX", "X", "", "unknown")
+accordion("German", "pos-de")
p
| The German part-of-speech tagger uses the
| #[+a("http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html") TIGER Treebank]
| annotation scheme. We also map the tags to the simpler Google
| Universal POS tag set.
+table(["Tag", "POS", "Morphology", "Description"])
+pos-row("$(", "PUNCT", "PunctType=brck", "other sentence-internal punctuation mark")
+pos-row("$,", "PUNCT", "PunctType=comm", "comma")
+pos-row("$.", "PUNCT", "PunctType=peri", "sentence-final punctuation mark")
+pos-row("ADJA", "ADJ", "", "adjective, attributive")
+pos-row("ADJD", "ADJ", "Variant=short", "adjective, adverbial or predicative")
+pos-row("ADV", "ADV", "", "adverb")
+pos-row("APPO", "ADP", "AdpType=post", "postposition")
+pos-row("APPR", "ADP", "AdpType=prep", "preposition; circumposition left")
+pos-row("APPRART", "ADP", "AdpType=prep PronType=art", "preposition with article")
+pos-row("APZR", "ADP", "AdpType=circ", "circumposition right")
+pos-row("ART", "DET", "PronType=art", "definite or indefinite article")
+pos-row("CARD", "NUM", "NumType=card", "cardinal number")
+pos-row("FM", "X", "Foreign=yes", "foreign language material")
+pos-row("ITJ", "INTJ", "", "interjection")
+pos-row("KOKOM", "CONJ", "ConjType=comp", "comparative conjunction")
+pos-row("KON", "CONJ", "", "coordinate conjunction")
+pos-row("KOUI", "SCONJ", "", 'subordinate conjunction with "zu" and infinitive')
+pos-row("KOUS", "SCONJ", "", "subordinate conjunction with sentence")
+pos-row("NE", "PROPN", "", "proper noun")
+pos-row("NNE", "PROPN", "", "proper noun")
+pos-row("NN", "NOUN", "", "noun, singular or mass")
+pos-row("PAV", "ADV", "PronType=dem", "pronominal adverb")
+pos-row("PROAV", "ADV", "PronType=dem", "pronominal adverb")
+pos-row("PDAT", "DET", "PronType=dem", "attributive demonstrative pronoun")
+pos-row("PDS", "PRON", "PronType=dem", "substituting demonstrative pronoun")
+pos-row("PIAT", "DET", "PronType=ind|neg|tot", "attributive indefinite pronoun without determiner")
+pos-row("PIDAT", "DET", "AdjType=pdt PronType=ind|neg|tot", "attributive indefinite pronoun with determiner")
+pos-row("PIS", "PRON", "PronType=ind|neg|tot", "substituting indefinite pronoun")
+pos-row("PPER", "PRON", "PronType=prs", "non-reflexive personal pronoun")
+pos-row("PPOSAT", "DET", "Poss=yes PronType=prs", "attributive possessive pronoun")
+pos-row("PPOSS", "PRON", "PronType=rel", "substituting possessive pronoun")
+pos-row("PRELAT", "DET", "PronType=rel", "attributive relative pronoun")
+pos-row("PRELS", "PRON", "PronType=rel", "substituting relative pronoun")
+pos-row("PRF", "PRON", "PronType=prs Reflex=yes", "reflexive personal pronoun")
+pos-row("PTKA", "PART", "", "particle with adjective or adverb")
+pos-row("PTKANT", "PART", "PartType=res", "answer particle")
+pos-row("PTKNEG", "PART", "Negative=yes", "negative particle")
+pos-row("PTKVZ", "PART", "PartType=vbp", "separable verbal particle")
+pos-row("PTKZU", "PART", "PartType=inf", '"zu" before infinitive')
+pos-row("PWAT", "DET", "PronType=int", "attributive interrogative pronoun")
+pos-row("PWAV", "ADV", "PronType=int", "adverbial interrogative or relative pronoun")
+pos-row("PWS", "PRON", "PronType=int", "substituting interrogative pronoun")
+pos-row("TRUNC", "X", "Hyph=yes", "word remnant")
+pos-row("VAFIN", "AUX", "Mood=ind VerbForm=fin", "finite verb, auxiliary")
+pos-row("VAIMP", "AUX", "Mood=imp VerbForm=fin", "imperative, auxiliary")
+pos-row("VAINF", "AUX", "VerbForm=inf", "infinitive, auxiliary")
+pos-row("VAPP", "AUX", "Aspect=perf VerbForm=fin", "perfect participle, auxiliary")
+pos-row("VMFIN", "VERB", "Mood=ind VerbForm=fin VerbType=mod", "finite verb, modal")
+pos-row("VMINF", "VERB", "VerbForm=fin VerbType=mod", "infinitive, modal")
+pos-row("VMPP", "VERB", "Aspect=perf VerbForm=part VerbType=mod", "perfect participle, modal")
+pos-row("VVFIN", "VERB", "Mood=ind VerbForm=fin", "finite verb, full")
+pos-row("VVIMP", "VERB", "Mood=imp VerbForm=fin", "imperative, full")
+pos-row("VVINF", "VERB", "VerbForm=inf", "infinitive, full")
+pos-row("VVIZU", "VERB", "VerbForm=inf", 'infinitive with "zu", full')
+pos-row("VVPP", "VERB", "Aspect=perf VerbForm=part", "perfect participle, full")
+pos-row("XY", "X", "", "non-word containing non-letter")
+pos-row("SP", "SPACE", "", "space")
for _, lang in MODELS
- var exclude = ["en", "de", "xx"]
if !exclude.includes(lang)
- var lang_name = LANGUAGES[lang]
- var file_path = "lang/" + lang + "/tag_map.py"
+accordion(lang_name, "pos-" + lang)
p
| For details on the #{lang_name} tag map, see
| #[+src(gh("spacy", "spacy/" + file_path)) #[code=file_path]].

@@ -1,55 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > TEXT PROCESSING
+aside-code("Example").
from spacy.lang.en import English
nlp = English()
tokens = nlp('Some\nspaces  and\ttab characters')
tokens_text = [t.text for t in tokens]
assert tokens_text == ['Some', '\n', 'spaces', ' ', 'and',
'\t', 'tab', 'characters']
p
| Tokenization standards are based on the
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus.
| The tokenizer differs from most by including
| #[strong tokens for significant whitespace]. Any sequence of
| whitespace characters beyond a single space (#[code ' ']) is included
| as a token. The whitespace tokens are useful for much the same reason
| punctuation is – it's often an important delimiter in the text. By
| preserving it in the token output, we are able to maintain a simple
| alignment between the tokens and the original string, and we ensure
| that #[strong no information is lost] during processing.
+h(3, "lemmatization") Lemmatization
+aside("Examples")
| In English, this means:#[br]
| #[strong Adjectives]: happier, happiest &rarr; happy#[br]
| #[strong Adverbs]: worse, worst &rarr; badly#[br]
| #[strong Nouns]: dogs, children &rarr; dog, child#[br]
| #[strong Verbs]: writes, writing, wrote, written &rarr; write
p
| A lemma is the uninflected form of a word. The English lemmatization
| data is taken from #[+a("https://wordnet.princeton.edu") WordNet].
| Lookup tables are taken from
| #[+a("http://www.lexiconista.com/datasets/lemmatization/") Lexiconista].
| spaCy also adds a #[strong special case for pronouns]: all pronouns
| are lemmatized to the special token #[code -PRON-].
+infobox("About spaCy's custom pronoun lemma", "⚠️")
| Unlike verbs and common nouns, there's no clear base form of a personal
| pronoun. Should the lemma of "me" be "I", or should we normalize person
| as well, giving "it" — or maybe "he"? spaCy's solution is to introduce a
| novel symbol, #[code -PRON-], which is used as the lemma for
| all personal pronouns.
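A minimal sketch of the pronoun lemma in action, assuming an English model linked as `en` (exact lemmas depend on the model and lookup data):

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'I was reading the papers')
    print([(token.text, token.lemma_) for token in doc])
    # roughly: [('I', '-PRON-'), ('was', 'be'), ('reading', 'read'),
    #           ('the', 'the'), ('papers', 'paper')]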
+h(3, "sentence-boundary") Sentence boundary detection
p
| Sentence boundaries are calculated from the syntactic parse tree, so
| features such as punctuation and capitalisation play an important but
| non-decisive role in determining the sentence boundaries. Usually this
| means that the sentence boundaries will at least coincide with clause
| boundaries, even given poorly punctuated text.
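Because the boundaries come from the parse, a model with a dependency parser is required. A minimal sketch, assuming an English model linked as `en`:

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence. This is another one.')
    for sent in doc.sents:
        print(sent.text)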

@@ -1,104 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > TRAINING
+h(3, "json-input") JSON input format for training
p
| spaCy takes training data in JSON format. The built-in
| #[+api("cli#convert") #[code convert]] command helps you convert the
| #[code .conllu] format used by the
| #[+a("https://github.com/UniversalDependencies") Universal Dependencies corpora]
| to spaCy's training format.
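For example, converting a `.conllu` treebank could look like this (both paths are placeholders):

    python -m spacy convert /path/to/train.conllu /output/dir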
+aside("Annotating entities")
| Named entities are provided in the #[+a("/api/annotation#biluo") BILUO]
| notation. Tokens outside an entity are set to #[code "O"] and tokens
| that are part of an entity are set to the entity label, prefixed by the
| BILUO marker. For example #[code "B-ORG"] describes the first token of
| a multi-token #[code ORG] entity and #[code "U-PERSON"] a single
| token representing a #[code PERSON] entity. The
| #[+api("goldparse#biluo_tags_from_offsets") #[code biluo_tags_from_offsets]]
| function can help you convert entity offsets to the right format.
+code("Example structure").
[{
"id": int, # ID of the document within the corpus
"paragraphs": [{ # list of paragraphs in the corpus
"raw": string, # raw text of the paragraph
"sentences": [{ # list of sentences in the paragraph
"tokens": [{ # list of tokens in the sentence
"id": int, # index of the token in the document
"dep": string, # dependency label
"head": int, # offset of token head relative to token index
"tag": string, # part-of-speech tag
"orth": string, # verbatim text of the token
"ner": string # BILUO label, e.g. "O" or "B-ORG"
}],
"brackets": [{ # phrase structure (NOT USED by current models)
"first": int, # index of first token
"last": int, # index of last token
"label": string # phrase label
}]
}]
}]
}]
p
| Here's an example of dependencies, part-of-speech tags and named
| entities, taken from the English Wall Street Journal portion of the Penn
| Treebank:
+github("spacy", "examples/training/training-data.json", false, false, "json")
+h(3, "vocab-jsonl") Lexical data for vocabulary
+tag-new(2)
p
| To populate a model's vocabulary, you can use the
| #[+api("cli#vocab") #[code spacy vocab]] command and load in a
| #[+a("https://jsonlines.readthedocs.io/en/latest/") newline-delimited JSON]
| (JSONL) file containing one lexical entry per line. The first line
| defines the language and vocabulary settings. All other lines are
| expected to be JSON objects describing an individual lexeme. The lexical
| attributes will be then set as attributes on spaCy's
| #[+api("lexeme#attributes") #[code Lexeme]] object. The #[code vocab]
| command outputs a ready-to-use spaCy model with a #[code Vocab]
| containing the lexical data.
+code("First line").
{"lang": "en", "settings": {"oov_prob": -20.502029418945312}}
+code("Entry structure").
{
"orth": string,
"id": int,
"lower": string,
"norm": string,
"shape": string
"prefix": string,
"suffix": string,
"length": int,
"cluster": string,
"prob": float,
"is_alpha": bool,
"is_ascii": bool,
"is_digit": bool,
"is_lower": bool,
"is_punct": bool,
"is_space": bool,
"is_title": bool,
"is_upper": bool,
"like_url": bool,
"like_num": bool,
"like_email": bool,
"is_stop": bool,
"is_oov": bool,
"is_quote": bool,
"is_left_punct": bool,
"is_right_punct": bool
}
p
| Here's an example of the 20 most frequent lexemes in the English
| training data:
+github("spacy", "examples/training/vocab-data.jsonl", false, false, "json")

@@ -1,71 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > DOC
p
| The #[code Doc] object holds an array of
| #[+api("cython-structs#tokenc") #[code TokenC]] structs.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("doc") #[code Doc]].
+h(3, "doc_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code cymem.Pool]
+cell
| A memory pool. Allocated memory will be freed once the
| #[code Doc] object is garbage collected.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A reference to the shared #[code Vocab] object.
+row
+cell #[code c]
+cell #[code TokenC*]
+cell
| A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
| struct.
+row
+cell #[code length]
+cell #[code int]
+cell The number of tokens in the document.
+row
+cell #[code max_length]
+cell #[code int]
+cell The underlying size of the #[code Doc.c] array.
+h(3, "doc_push_back") Doc.push_back
+tag method
p
| Append a token to the #[code Doc]. The token can be provided as a
| #[+api("cython-structs#lexemec") #[code LexemeC]] or
| #[+api("cython-structs#tokenc") #[code TokenC]] pointer, using Cython's
| #[+a("http://cython.readthedocs.io/en/latest/src/userguide/fusedtypes.html") fused types].
+aside-code("Example").
from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab
doc = Doc(Vocab())
lexeme = doc.vocab.get(doc.mem, u'hello')
doc.push_back(lexeme, True)
assert doc.text == u'hello '
+table(["Name", "Type", "Description"])
+row
+cell #[code lex_or_tok]
+cell #[code LexemeOrToken]
+cell The word to append to the #[code Doc].
+row
+cell #[code has_space]
+cell #[code bint]
+cell Whether the word has trailing whitespace.

@@ -1,30 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > LEXEME
p
| A Cython class providing access and methods for an entry in the
| vocabulary.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("lexeme") #[code Lexeme]].
+h(3, "lexeme_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code c]
+cell #[code LexemeC*]
+cell
| A pointer to a #[+api("cython-structs#lexemec") #[code LexemeC]]
| struct.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A reference to the shared #[code Vocab] object.
+row
+cell #[code orth]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell ID of the verbatim text content.

@@ -1,200 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS > LEXEMEC
p
| Struct holding information about a lexical type. #[code LexemeC]
| structs are usually owned by the #[code Vocab], and accessed through a
| read-only pointer on the #[code TokenC] struct.
+aside-code("Example").
lex = doc.c[3].lex
+table(["Name", "Type", "Description"])
+row
+cell #[code flags]
+cell #[+abbr("uint64_t") #[code flags_t]]
+cell Bit-field for binary lexical flag values.
+row
+cell #[code id]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell
| Usually used to map lexemes to rows in a matrix, e.g. for word
| vectors. Does not need to be unique, so currently misnamed.
+row
+cell #[code length]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Number of unicode characters in the lexeme.
+row
+cell #[code orth]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell ID of the verbatim text content.
+row
+cell #[code lower]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell ID of the lowercase form of the lexeme.
+row
+cell #[code norm]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell ID of the lexeme's norm, i.e. a normalised form of the text.
+row
+cell #[code shape]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Transform of the lexeme's string, to show orthographic features.
+row
+cell #[code prefix]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell
| Length-N substring from the start of the lexeme. Defaults to
| #[code N=1].
+row
+cell #[code suffix]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell
| Length-N substring from the end of the lexeme. Defaults to
| #[code N=3].
+row
+cell #[code cluster]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Brown cluster ID.
+row
+cell #[code prob]
+cell #[code float]
+cell Smoothed log probability estimate of the lexeme's type.
+row
+cell #[code sentiment]
+cell #[code float]
+cell A scalar value indicating positivity or negativity.
+h(3, "lexeme_get_struct_attr", "spacy/lexeme.pxd") Lexeme.get_struct_attr
+tag staticmethod
+tag nogil
p Get the value of an attribute from the #[code LexemeC] struct by attribute ID.
+aside-code("Example").
from spacy.attrs cimport IS_ALPHA
from spacy.lexeme cimport Lexeme
lexeme = doc.c[3].lex
is_alpha = Lexeme.get_struct_attr(lexeme, IS_ALPHA)
+table(["Name", "Type", "Description"])
+row
+cell #[code lex]
+cell #[code const LexemeC*]
+cell A pointer to a #[code LexemeC] struct.
+row
+cell #[code feat_name]
+cell #[code attr_id_t]
+cell
| The ID of the attribute to look up. The attributes are
| enumerated in #[code spacy.typedefs].
+row("foot")
+cell returns
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell The value of the attribute.
+h(3, "lexeme_set_struct_attr", "spacy/lexeme.pxd") Lexeme.set_struct_attr
+tag staticmethod
+tag nogil
p Set the value of an attribute of the #[code LexemeC] struct by attribute ID.
+aside-code("Example").
from spacy.attrs cimport NORM
from spacy.lexeme cimport Lexeme
lexeme = doc.c[3].lex
Lexeme.set_struct_attr(lexeme, NORM, lexeme.lower)
+table(["Name", "Type", "Description"])
+row
+cell #[code lex]
+cell #[code const LexemeC*]
+cell A pointer to a #[code LexemeC] struct.
+row
+cell #[code feat_name]
+cell #[code attr_id_t]
+cell
| The ID of the attribute to look up. The attributes are
| enumerated in #[code spacy.typedefs].
+row
+cell #[code value]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell The value to set.
+h(3, "lexeme_c_check_flag", "spacy/lexeme.pxd") Lexeme.c_check_flag
+tag staticmethod
+tag nogil
p Check the value of a binary flag attribute.
+aside-code("Example").
from spacy.attrs cimport IS_STOP
from spacy.lexeme cimport Lexeme
lexeme = doc.c[3].lex
is_stop = Lexeme.c_check_flag(lexeme, IS_STOP)
+table(["Name", "Type", "Description"])
+row
+cell #[code lexeme]
+cell #[code const LexemeC*]
+cell A pointer to a #[code LexemeC] struct.
+row
+cell #[code flag_id]
+cell #[code attr_id_t]
+cell
| The ID of the flag to look up. The flag IDs are enumerated in
| #[code spacy.typedefs].
+row("foot")
+cell returns
+cell #[code bint]
+cell The boolean value of the flag.
+h(3, "lexeme_c_set_flag", "spacy/lexeme.pxd") Lexeme.c_set_flag
+tag staticmethod
+tag nogil
p Set the value of a binary flag attribute.
+aside-code("Example").
from spacy.attrs cimport IS_STOP
from spacy.lexeme cimport Lexeme
lexeme = doc.c[3].lex
Lexeme.c_set_flag(lexeme, IS_STOP, 0)
+table(["Name", "Type", "Description"])
+row
+cell #[code lexeme]
+cell #[code const LexemeC*]
+cell A pointer to a #[code LexemeC] struct.
+row
+cell #[code flag_id]
+cell #[code attr_id_t]
+cell
| The ID of the flag to look up. The flag IDs are enumerated in
| #[code spacy.typedefs].
+row
+cell #[code value]
+cell #[code bint]
+cell The value to set.

@@ -1,43 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > SPAN
p
| A Cython class providing access and methods for a slice of a #[code Doc]
| object.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("span") #[code Span]].
+h(3, "span_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code start]
+cell #[code int]
+cell The index of the first token of the span.
+row
+cell #[code end]
+cell #[code int]
+cell The index of the first token after the span.
+row
+cell #[code start_char]
+cell #[code int]
+cell The index of the first character of the span.
+row
+cell #[code end_char]
+cell #[code int]
+cell The index of the last character of the span.
+row
+cell #[code label]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell A label to attach to the span, e.g. for named entities.

@@ -1,23 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > STRINGSTORE
p A lookup table to retrieve strings by 64-bit hashes.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("stringstore") #[code StringStore]].
+h(3, "stringstore_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code cymem.Pool]
+cell
| A memory pool. Allocated memory will be freed once the
| #[code StringStore] object is garbage collected.
+row
+cell #[code keys]
+cell #[+abbr("vector[uint64_t]") #[code vector[hash_t]]]
+cell A list of hash values in the #[code StringStore].
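For contrast, the two-way mapping is already exposed at the Python level; a minimal sketch:

    from spacy.strings import StringStore

    stringstore = StringStore([u'apple', u'orange'])
    apple_hash = stringstore[u'apple']           # string -> 64-bit hash
    assert stringstore[apple_hash] == u'apple'   # hash -> string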

@@ -1,73 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > TOKEN
p
| A Cython class providing access and methods for a
| #[+api("cython-structs#tokenc") #[code TokenC]] struct. Note that the
| #[code Token] object does not own the struct. It only receives a pointer
| to it.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("token") #[code Token]].
+h(3, "token_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A reference to the shared #[code Vocab] object.
+row
+cell #[code c]
+cell #[code TokenC*]
+cell
| A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
| struct.
+row
+cell #[code i]
+cell #[code int]
+cell The offset of the token within the document.
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+h(3, "token_cinit") Token.cinit
+tag method
p Create a #[code Token] object from a #[code TokenC*] pointer.
+aside-code("Example").
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A reference to the shared #[code Vocab].
+row
+cell #[code c]
+cell #[code TokenC*]
+cell
| A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
| struct.
+row
+cell #[code offset]
+cell #[code int]
+cell The offset of the token within the document.
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row("foot")
+cell returns
+cell #[code Token]
+cell The newly constructed object.

@@ -1,270 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS > TOKENC
p
| Cython data container for the #[code Token] object.
+aside-code("Example").
token = &doc.c[3]
token_ptr = &doc.c[3]
+table(["Name", "Type", "Description"])
+row
+cell #[code lex]
+cell #[code const LexemeC*]
+cell A pointer to the lexeme for the token.
+row
+cell #[code morph]
+cell #[code uint64_t]
+cell An ID allowing lookup of morphological attributes.
+row
+cell #[code pos]
+cell #[code univ_pos_t]
+cell Coarse-grained part-of-speech tag.
+row
+cell #[code spacy]
+cell #[code bint]
+cell A binary value indicating whether the token has trailing whitespace.
+row
+cell #[code tag]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Fine-grained part-of-speech tag.
+row
+cell #[code idx]
+cell #[code int]
+cell The character offset of the token within the parent document.
+row
+cell #[code lemma]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Base form of the token, with no inflectional suffixes.
+row
+cell #[code sense]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Space for storing a word sense ID, currently unused.
+row
+cell #[code head]
+cell #[code int]
+cell Offset of the syntactic parent relative to the token.
+row
+cell #[code dep]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Syntactic dependency relation.
+row
+cell #[code l_kids]
+cell #[code uint32_t]
+cell Number of left children.
+row
+cell #[code r_kids]
+cell #[code uint32_t]
+cell Number of right children.
+row
+cell #[code l_edge]
+cell #[code uint32_t]
+cell Offset of the leftmost token of this token's syntactic descendants.
+row
+cell #[code r_edge]
+cell #[code uint32_t]
+cell Offset of the rightmost token of this token's syntactic descendants.
+row
+cell #[code sent_start]
+cell #[code int]
+cell
| Ternary value indicating whether the token is the first word of
| a sentence. #[code 0] indicates a missing value, #[code -1]
| indicates #[code False] and #[code 1] indicates #[code True]. The default value, 0,
| is interpreted as no sentence break. Sentence boundary detectors will usually
| set 0 for all tokens except tokens that follow a sentence boundary.
+row
+cell #[code ent_iob]
+cell #[code int]
+cell
| IOB code of named entity tag. #[code 0] indicates a missing
| value, #[code 1] indicates #[code I], #[code 2] indicates
| #[code O] and #[code 3] indicates #[code B].
+row
+cell #[code ent_type]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell Named entity type.
+row
+cell #[code ent_id]
+cell #[+abbr("uint64_t") #[code hash_t]]
+cell
| ID of the entity the token is an instance of, if any. Currently
| not used, but potentially for coreference resolution.
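Several of these fields have Python-level counterparts on `Token`; a minimal sketch, assuming an English model linked as `en`:

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'San Francisco is foggy. It rarely snows.')
    # ent_iob_ exposes the IOB code as "B", "I" or "O"
    print([(t.text, t.ent_iob_) for t in doc])
    # is_sent_start mirrors the ternary sent_start value (None if missing)
    print([t.is_sent_start for t in doc])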
+h(3, "token_get_struct_attr", "spacy/tokens/token.pxd") Token.get_struct_attr
+tag staticmethod
+tag nogil
p Get the value of an attribute from the #[code TokenC] struct by attribute ID.
+aside-code("Example").
from spacy.attrs cimport IS_ALPHA
from spacy.tokens cimport Token
is_alpha = Token.get_struct_attr(&doc.c[3], IS_ALPHA)
+table(["Name", "Type", "Description"])
+row
+cell #[code token]
+cell #[code const TokenC*]
+cell A pointer to a #[code TokenC] struct.
+row
+cell #[code feat_name]
+cell #[code attr_id_t]
+cell
| The ID of the attribute to look up. The attributes are
| enumerated in #[code spacy.typedefs].
+row("foot")
+cell returns
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell The value of the attribute.
+h(3, "token_set_struct_attr", "spacy/tokens/token.pxd") Token.set_struct_attr
+tag staticmethod
+tag nogil
p Set the value of an attribute of the #[code TokenC] struct by attribute ID.
+aside-code("Example").
from spacy.attrs cimport TAG
from spacy.tokens cimport Token
token = &doc.c[3]
Token.set_struct_attr(token, TAG, 0)
+table(["Name", "Type", "Description"])
+row
+cell #[code token]
+cell #[code const TokenC*]
+cell A pointer to a #[code TokenC] struct.
+row
+cell #[code feat_name]
+cell #[code attr_id_t]
+cell
| The ID of the attribute to look up. The attributes are
| enumerated in #[code spacy.typedefs].
+row
+cell #[code value]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell The value to set.
+h(3, "token_by_start", "spacy/tokens/doc.pxd") token_by_start
+tag function
p Find a token in a #[code TokenC*] array by the offset of its first character.
+aside-code("Example").
from spacy.tokens.doc cimport Doc, token_by_start
from spacy.vocab cimport Vocab
doc = Doc(Vocab(), words=[u'hello', u'world'])
assert token_by_start(doc.c, doc.length, 6) == 1
assert token_by_start(doc.c, doc.length, 4) == -1
+table(["Name", "Type", "Description"])
+row
+cell #[code tokens]
+cell #[code const TokenC*]
+cell A #[code TokenC*] array.
+row
+cell #[code length]
+cell #[code int]
+cell The number of tokens in the array.
+row
+cell #[code start_char]
+cell #[code int]
+cell The start index to search for.
+row("foot")
+cell returns
+cell #[code int]
+cell The index of the token in the array or #[code -1] if not found.
+h(3, "token_by_end", "spacy/tokens/doc.pxd") token_by_end
+tag function
p Find a token in a #[code TokenC*] array by the offset of its final character.
+aside-code("Example").
from spacy.tokens.doc cimport Doc, token_by_end
from spacy.vocab cimport Vocab
doc = Doc(Vocab(), words=[u'hello', u'world'])
assert token_by_end(doc.c, doc.length, 5) == 0
assert token_by_end(doc.c, doc.length, 1) == -1
+table(["Name", "Type", "Description"])
+row
+cell #[code tokens]
+cell #[code const TokenC*]
+cell A #[code TokenC*] array.
+row
+cell #[code length]
+cell #[code int]
+cell The number of tokens in the array.
+row
+cell #[code end_char]
+cell #[code int]
+cell The end index to search for.
+row("foot")
+cell returns
+cell #[code int]
+cell The index of the token in the array or #[code -1] if not found.
+h(3, "set_children_from_heads", "spacy/tokens/doc.pxd") set_children_from_heads
+tag function
p
| Set attributes that allow lookup of syntactic children on a
| #[code TokenC*] array. This function must be called after making changes
| to the #[code TokenC.head] attribute, in order to make the parse tree
| navigation consistent.
+aside-code("Example").
from spacy.tokens.doc cimport Doc, set_children_from_heads
from spacy.vocab cimport Vocab
doc = Doc(Vocab(), words=[u'Baileys', u'from', u'a', u'shoe'])
doc.c[0].head = 0   # Baileys is the root
doc.c[1].head = -1  # from -> Baileys
doc.c[2].head = 1   # a -> shoe
doc.c[3].head = -2  # shoe -> from
set_children_from_heads(doc.c, doc.length)
assert doc.c[3].l_kids == 1
+table(["Name", "Type", "Description"])
+row
+cell #[code tokens]
+cell #[code const TokenC*]
+cell A #[code TokenC*] array.
+row
+cell #[code length]
+cell #[code int]
+cell The number of tokens in the array.

@@ -1,88 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > VOCAB
p
| A Cython class providing access and methods for a vocabulary and other
| data shared across a language.
+infobox
| This section documents the extra C-level attributes and methods that
| can't be accessed from Python. For the Python documentation, see
| #[+api("vocab") #[code Vocab]].
+h(3, "vocab_attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code cymem.Pool]
+cell
| A memory pool. Allocated memory will be freed once the
| #[code Vocab] object is garbage collected.
+row
+cell #[code strings]
+cell #[code StringStore]
+cell
| A #[code StringStore] that maps strings to hash values and vice
| versa.
+row
+cell #[code length]
+cell #[code int]
+cell The number of entries in the vocabulary.
+h(3, "vocab_get") Vocab.get
+tag method
p
| Retrieve a #[+api("cython-structs#lexemec") #[code LexemeC*]] pointer
| from the vocabulary.
+aside-code("Example").
lexeme = vocab.get(vocab.mem, u'hello')
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code cymem.Pool]
+cell
| A memory pool. Allocated memory will be freed once the
| #[code Vocab] object is garbage collected.
+row
+cell #[code string]
+cell #[code unicode]
+cell The string of the word to look up.
+row("foot")
+cell returns
+cell #[code const LexemeC*]
+cell The lexeme in the vocabulary.
+h(3, "vocab_get_by_orth") Vocab.get_by_orth
+tag method
p
| Retrieve a #[+api("cython-structs#lexemec") #[code LexemeC*]] pointer
| from the vocabulary.
+aside-code("Example").
lexeme = vocab.get_by_orth(vocab.mem, doc[0].lex.norm)
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code cymem.Pool]
+cell
| A memory pool. Allocated memory will be freed once the
| #[code Vocab] object is garbage collected.
+row
+cell #[code orth]
+cell #[+abbr("uint64_t") #[code attr_t]]
+cell ID of the verbatim text content.
+row("foot")
+cell returns
+cell #[code const LexemeC*]
+cell The lexeme in the vocabulary.

@@ -1,251 +0,0 @@
{
"sidebar": {
"Overview": {
"Architecture": "./",
"Annotation Specs": "annotation",
"Command Line": "cli",
"Functions": "top-level"
},
"Containers": {
"Doc": "doc",
"Token": "token",
"Span": "span",
"Lexeme": "lexeme"
},
"Pipeline": {
"Language": "language",
"Pipe": "pipe",
"Tagger": "tagger",
"DependencyParser": "dependencyparser",
"EntityRecognizer": "entityrecognizer",
"TextCategorizer": "textcategorizer",
"Tokenizer": "tokenizer",
"Lemmatizer": "lemmatizer",
"Matcher": "matcher",
"PhraseMatcher": "phrasematcher"
},
"Other": {
"Vocab": "vocab",
"StringStore": "stringstore",
"Vectors": "vectors",
"GoldParse": "goldparse",
"GoldCorpus": "goldcorpus"
},
"Cython": {
"Architecture": "cython",
"Structs": "cython-structs",
"Classes": "cython-classes"
}
},
"index": {
"title": "Architecture",
"next": "annotation",
"menu": {
"Basics": "basics",
"Neural Network Model": "nn-model"
}
},
"cli": {
"title": "Command Line Interface",
"teaser": "Download, train and package models, and debug spaCy.",
"source": "spacy/cli"
},
"top-level": {
"title": "Top-level Functions",
"menu": {
"spacy": "spacy",
"displacy": "displacy",
"Utility Functions": "util",
"Compatibility": "compat"
}
},
"language": {
"title": "Language",
"tag": "class",
"teaser": "A text-processing pipeline.",
"source": "spacy/language.py"
},
"doc": {
"title": "Doc",
"tag": "class",
"teaser": "A container for accessing linguistic annotations.",
"source": "spacy/tokens/doc.pyx"
},
"token": {
"title": "Token",
"tag": "class",
"source": "spacy/tokens/token.pyx"
},
"span": {
"title": "Span",
"tag": "class",
"source": "spacy/tokens/span.pyx"
},
"lexeme": {
"title": "Lexeme",
"tag": "class",
"source": "spacy/lexeme.pyx"
},
"vocab": {
"title": "Vocab",
"teaser": "A storage class for vocabulary and other data shared across a language.",
"tag": "class",
"source": "spacy/vocab.pyx"
},
"stringstore": {
"title": "StringStore",
"tag": "class",
"source": "spacy/strings.pyx"
},
"matcher": {
"title": "Matcher",
"teaser": "Match sequences of tokens, based on pattern rules.",
"tag": "class",
"source": "spacy/matcher.pyx"
},
"phrasematcher": {
"title": "PhraseMatcher",
"teaser": "Match sequences of tokens, based on documents.",
"tag": "class",
"tag_new": 2,
"source": "spacy/matcher.pyx"
},
"pipe": {
"title": "Pipe",
"teaser": "Abstract base class defining the API for pipeline components.",
"tag": "class",
"tag_new": 2,
"source": "spacy/pipeline.pyx"
},
"dependenyparser": {
"title": "DependencyParser",
"tag": "class",
"source": "spacy/pipeline.pyx"
},
"entityrecognizer": {
"title": "EntityRecognizer",
"teaser": "Annotate named entities on documents.",
"tag": "class",
"source": "spacy/pipeline.pyx"
},
"textcategorizer": {
"title": "TextCategorizer",
"teaser": "Add text categorization models to spaCy pipelines.",
"tag": "class",
"tag_new": 2,
"source": "spacy/pipeline.pyx"
},
"dependencyparser": {
"title": "DependencyParser",
"teaser": "Annotate syntactic dependencies on documents.",
"tag": "class",
"source": "spacy/pipeline.pyx"
},
"tokenizer": {
"title": "Tokenizer",
"teaser": "Segment text into words, punctuations marks etc.",
"tag": "class",
"source": "spacy/tokenizer.pyx"
},
"lemmatizer": {
"title": "Lemmatizer",
"teaser": "Assign the base forms of words.",
"tag": "class",
"source": "spacy/lemmatizer.py"
},
"tagger": {
"title": "Tagger",
"teaser": "Annotate part-of-speech tags on documents.",
"tag": "class",
"source": "spacy/pipeline.pyx"
},
"goldparse": {
"title": "GoldParse",
"tag": "class",
"source": "spacy/gold.pyx"
},
"goldcorpus": {
"title": "GoldCorpus",
"teaser": "An annotated corpus, using the JSON file format.",
"tag": "class",
"tag_new": 2,
"source": "spacy/gold.pyx"
},
"vectors": {
"title": "Vectors",
"teaser": "Store, save and load word vectors.",
"tag": "class",
"tag_new": 2,
"source": "spacy/vectors.pyx"
},
"annotation": {
"title": "Annotation Specifications",
"teaser": "Schemes used for labels, tags and training data.",
"menu": {
"Text Processing": "text-processing",
"POS Tagging": "pos-tagging",
"Dependencies": "dependency-parsing",
"Named Entities": "named-entities",
"Models & Training": "training"
}
},
"cython": {
"title": "Cython Architecture",
"next": "cython-structs",
"menu": {
"Overview": "overview",
"Conventions": "conventions"
}
},
"cython-structs": {
"title": "Cython Structs",
"teaser": "C-language objects that let you group variables together in a single contiguous block.",
"next": "cython-classes",
"menu": {
"TokenC": "tokenc",
"LexemeC": "lexemec"
}
},
"cython-classes": {
"title": "Cython Classes",
"menu": {
"Doc": "doc",
"Token": "token",
"Span": "span",
"Lexeme": "lexeme",
"Vocab": "vocab",
"StringStore": "stringstore"
}
}
}

@@ -1,84 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > COMPATIBILITY
p
| All Python code is written in an
| #[strong intersection of Python 2 and Python 3]. This is easy in Cython,
| but somewhat ugly in Python. Logic that deals with Python or platform
| compatibility only lives in #[code spacy.compat]. To distinguish them from
| the builtin functions, replacement functions are suffixed with an
| underscore, e.g. #[code unicode_].
+aside-code("Example").
from spacy.compat import unicode_
compatible_unicode = unicode_('hello world')
+table(["Name", "Python 2", "Python 3"])
+row
+cell #[code compat.bytes_]
+cell #[code str]
+cell #[code bytes]
+row
+cell #[code compat.unicode_]
+cell #[code unicode]
+cell #[code str]
+row
+cell #[code compat.basestring_]
+cell #[code basestring]
+cell #[code str]
+row
+cell #[code compat.input_]
+cell #[code raw_input]
+cell #[code input]
+row
+cell #[code compat.path2str]
+cell #[code str(path)] with #[code .decode('utf8')]
+cell #[code str(path)]
+h(3, "is_config") compat.is_config
+tag function
p
| Check if a specific configuration of Python version and operating system
| matches the user's setup. Mostly used to display targeted error messages.
+aside-code("Example").
from spacy.compat import is_config
if is_config(python2=True, windows=True):
print("You are using Python 2 on Windows.")
+table(["Name", "Type", "Description"])
+row
+cell #[code python2]
+cell bool
+cell spaCy is executed with Python 2.x.
+row
+cell #[code python3]
+cell bool
+cell spaCy is executed with Python 3.x.
+row
+cell #[code windows]
+cell bool
+cell spaCy is executed on Windows.
+row
+cell #[code linux]
+cell bool
+cell spaCy is executed on Linux.
+row
+cell #[code osx]
+cell bool
+cell spaCy is executed on OS X or macOS.
+row("foot")
+cell returns
+cell bool
+cell Whether the specified configuration matches the user's platform.

View File

@ -1,259 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > DISPLACY
p
| As of v2.0, spaCy comes with a built-in visualization suite. For more
| info and examples, see the usage guide on
| #[+a("/usage/visualizers") visualizing spaCy].
+h(3, "displacy.serve") displacy.serve
+tag method
+tag-new(2)
p
| Serve a dependency parse tree or named entity visualization to view it
| in your browser. Will run a simple web server.
+aside-code("Example").
import spacy
from spacy import displacy
nlp = spacy.load('en')
doc1 = nlp(u'This is a sentence.')
doc2 = nlp(u'This is another sentence.')
displacy.serve([doc1, doc2], style='dep')
+table(["Name", "Type", "Description", "Default"])
+row
+cell #[code docs]
+cell list, #[code Doc], #[code Span]
+cell Document(s) to visualize.
+cell
+row
+cell #[code style]
+cell unicode
+cell Visualization style, #[code 'dep'] or #[code 'ent'].
+cell #[code 'dep']
+row
+cell #[code page]
+cell bool
+cell Render markup as full HTML page.
+cell #[code True]
+row
+cell #[code minify]
+cell bool
+cell Minify HTML markup.
+cell #[code False]
+row
+cell #[code options]
+cell dict
+cell #[+a("#options") Visualizer-specific options], e.g. colors.
+cell #[code {}]
+row
+cell #[code manual]
+cell bool
+cell
| Don't parse the #[code Doc] and instead expect a dict or list of
| dicts. #[+a("/usage/visualizers#manual-usage") See here]
| for formats and examples.
+cell #[code False]
+row
+cell #[code port]
+cell int
+cell Port to serve visualization.
+cell #[code 5000]
+row
+cell #[code host]
+cell unicode
+cell Host to serve visualization.
+cell #[code '0.0.0.0']
+h(3, "displacy.render") displacy.render
+tag method
+tag-new(2)
p Render a dependency parse tree or named entity visualization.
+aside-code("Example").
import spacy
from spacy import displacy
nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')
html = displacy.render(doc, style='dep')
+table(["Name", "Type", "Description", "Default"])
+row
+cell #[code docs]
+cell list, #[code Doc], #[code Span]
+cell Document(s) to visualize.
+cell
+row
+cell #[code style]
+cell unicode
+cell Visualization style, #[code 'dep'] or #[code 'ent'].
+cell #[code 'dep']
+row
+cell #[code page]
+cell bool
+cell Render markup as full HTML page.
+cell #[code False]
+row
+cell #[code minify]
+cell bool
+cell Minify HTML markup.
+cell #[code False]
+row
+cell #[code jupyter]
+cell bool
+cell
| Explicitly enable "#[+a("http://jupyter.org/") Jupyter] mode" to
| return markup ready to be rendered in a notebook.
+cell detected automatically
+row
+cell #[code options]
+cell dict
+cell #[+a("#options") Visualizer-specific options], e.g. colors.
+cell #[code {}]
+row
+cell #[code manual]
+cell bool
+cell
| Don't parse the #[code Doc] and instead expect a dict or list of
| dicts. #[+a("/usage/visualizers#manual-usage") See here]
| for formats and examples.
+cell #[code False]
+row("foot")
+cell returns
+cell unicode
+cell Rendered HTML markup.
+cell
+h(3, "displacy_options") Visualizer options
p
| The #[code options] argument lets you specify additional settings for
| each visualizer. If a setting is not present in the options, the default
| value will be used.
+h(4, "options-dep") Dependency Visualizer options
+aside-code("Example").
options = {'compact': True, 'color': 'blue'}
displacy.serve(doc, style='dep', options=options)
+table(["Name", "Type", "Description", "Default"])
+row
+cell #[code collapse_punct]
+cell bool
+cell
| Attach punctuation to tokens. Can make the parse more readable,
| as it avoids long arcs for attaching punctuation.
+cell #[code True]
+row
+cell #[code collapse_phrases]
+cell bool
+cell Merge noun phrases into one token.
+cell #[code False]
+row
+cell #[code compact]
+cell bool
+cell "Compact mode" with square arrows that takes up less space.
+cell #[code False]
+row
+cell #[code color]
+cell unicode
+cell Text color (HEX, RGB or color names).
+cell #[code '#000000']
+row
+cell #[code bg]
+cell unicode
+cell Background color (HEX, RGB or color names).
+cell #[code '#ffffff']
+row
+cell #[code font]
+cell unicode
+cell Font name or font family for all text.
+cell #[code 'Arial']
+row
+cell #[code offset_x]
+cell int
+cell Spacing on left side of the SVG in px.
+cell #[code 50]
+row
+cell #[code arrow_stroke]
+cell int
+cell Width of arrow path in px.
+cell #[code 2]
+row
+cell #[code arrow_width]
+cell int
+cell Width of arrow head in px.
+cell #[code 10] / #[code 8] (compact)
+row
+cell #[code arrow_spacing]
+cell int
+cell Spacing between arrows in px to avoid overlaps.
+cell #[code 20] / #[code 12] (compact)
+row
+cell #[code word_spacing]
+cell int
+cell Vertical spacing between words and arcs in px.
+cell #[code 45]
+row
+cell #[code distance]
+cell int
+cell Distance between words in px.
+cell #[code 175] / #[code 85] (compact)
+h(4, "displacy_options-ent") Named Entity Visualizer options
+aside-code("Example").
options = {'ents': ['PERSON', 'ORG', 'PRODUCT'],
'colors': {'ORG': 'yellow'}}
displacy.serve(doc, style='ent', options=options)
+table(["Name", "Type", "Description", "Default"])
+row
+cell #[code ents]
+cell list
+cell
| Entity types to highlight (#[code None] for all types).
+cell #[code None]
+row
+cell #[code colors]
+cell dict
+cell
| Color overrides. Entity types in uppercase should be mapped to
| color names or values.
+cell #[code {}]
p
| By default, displaCy comes with colours for all
| #[+a("/api/annotation#named-entities") entity types supported by spaCy].
| If you're using custom entity types, you can use the #[code colors]
| setting to add your own colours for them.

View File

@ -1,201 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > SPACY
+h(3, "spacy.load") spacy.load
+tag function
+tag-model
p
| Load a model via its #[+a("/usage/models#usage") shortcut link],
| the name of an installed
| #[+a("/usage/training#models-generating") model package], a unicode
| path or a #[code Path]-like object. spaCy will try resolving the load
| argument in this order. If a model is loaded from a shortcut link or
| package name, spaCy will assume it's a Python package and import it and
| call the model's own #[code load()] method. If a model is loaded from a
| path, spaCy will assume it's a data directory, read the language and
| pipeline settings off the meta.json and initialise the #[code Language]
| class. The data will be loaded in via
| #[+api("language#from_disk") #[code Language.from_disk()]].
+aside-code("Example").
nlp = spacy.load('en') # shortcut link
nlp = spacy.load('en_core_web_sm') # package
nlp = spacy.load('/path/to/en') # unicode path
nlp = spacy.load(Path('/path/to/en')) # pathlib Path
nlp = spacy.load('en', disable=['parser', 'tagger'])
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode or #[code Path]
+cell Model to load, i.e. shortcut link, package name or path.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell A #[code Language] object with the loaded model.
p
| Essentially, #[code spacy.load()] is a convenience wrapper that reads
| the language ID and pipeline components from a model's #[code meta.json],
| initialises the #[code Language] class, loads in the model data and
| returns it.
+code("Abstract example").
cls = util.get_lang_class(lang) # get language for ID, e.g. 'en'
nlp = cls() # initialise the language
for name in pipeline:
component = nlp.create_pipe(name) # create each pipeline component
nlp.add_pipe(component) # add component to pipeline
nlp.from_disk(model_data_path) # load in model data
+infobox("Changed in v2.0", "⚠️")
| As of spaCy 2.0, the #[code path] keyword argument is deprecated. spaCy
| will also raise an error if no model could be loaded and never just
| return an empty #[code Language] object. If you need a blank language,
| you can use the new function #[+api("spacy#blank") #[code spacy.blank()]]
| or import the class explicitly, e.g.
| #[code from spacy.lang.en import English].
+code-wrapper
+code-new nlp = spacy.load('/model')
+code-old nlp = spacy.load('en', path='/model')
+h(3, "spacy.blank") spacy.blank
+tag function
+tag-new(2)
p
| Create a blank model of a given language class. This function is the
| twin of #[code spacy.load()].
+aside-code("Example").
nlp_en = spacy.blank('en')
nlp_de = spacy.blank('de')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code]
| of the language class to load.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell An empty #[code Language] object of the appropriate subclass.
+h(4, "spacy.info") spacy.info
+tag function
p
| The same as the #[+api("cli#info") #[code info] command]. Pretty-print
| information about your installation, models and local setup from within
| spaCy. To get the model meta data as a dictionary instead, you can
| use the #[code meta] attribute on your #[code nlp] object with a
| loaded model, e.g. #[code nlp.meta].
+aside-code("Example").
spacy.info()
spacy.info('en')
spacy.info('de', markdown=True)
+table(["Name", "Type", "Description"])
+row
+cell #[code model]
+cell unicode
+cell A model, i.e. shortcut link, package name or path (optional).
+row
+cell #[code markdown]
+cell bool
+cell Print information as Markdown.
+h(3, "spacy.explain") spacy.explain
+tag function
p
| Get a description for a given POS tag, dependency label or entity type.
| For a list of available terms, see
| #[+src(gh("spacy", "spacy/glossary.py")) #[code glossary.py]].
+aside-code("Example").
spacy.explain(u'NORP')
# Nationalities or religious or political groups
doc = nlp(u'Hello world')
for word in doc:
print(word.text, word.tag_, spacy.explain(word.tag_))
# Hello UH interjection
# world NN noun, singular or mass
+table(["Name", "Type", "Description"])
+row
+cell #[code term]
+cell unicode
+cell Term to explain.
+row("foot")
+cell returns
+cell unicode
+cell The explanation, or #[code None] if not found in the glossary.
+h(3, "spacy.prefer_gpu") spacy.prefer_gpu
+tag function
+tag-new("2.0.14")
p
| Allocate data and perform operations on #[+a("/usage/#gpu") GPU], if
| available. If data has already been allocated on CPU, it will not be
| moved. Ideally, this function should be called right after
| importing spaCy and #[em before] loading any models.
+aside-code("Example").
import spacy
activated = spacy.prefer_gpu()
nlp = spacy.load('en_core_web_sm')
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the GPU was activated.
+h(3, "spacy.require_gpu") spacy.require_gpu
+tag function
+tag-new("2.0.14")
p
| Allocate data and perform operations on #[+a("/usage/#gpu") GPU]. Will
| raise an error if no GPU is available. If data has already been allocated
| on CPU, it will not be moved. Ideally, this function should be called
| right after importing spaCy and #[em before] loading any models.
+aside-code("Example").
import spacy
spacy.require_gpu()
nlp = spacy.load('en_core_web_sm')
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell #[code True]

View File

@ -1,454 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > UTIL
p
| spaCy comes with a small collection of utility functions located in
| #[+src(gh("spaCy", "spacy/util.py")) #[code spacy/util.py]].
| Because utility functions are mostly intended for
| #[strong internal use within spaCy], their behaviour may change with
| future releases. The functions documented on this page should be safe
| to use and we'll try to ensure backwards compatibility. However, we
| recommend having additional tests in place if your application depends on
| any of spaCy's utilities.
+h(3, "util.get_data_path") util.get_data_path
+tag function
p
| Get path to the data directory where spaCy looks for models. Defaults to
| #[code spacy/data].
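//- A minimal usage sketch; the printed path is illustrative and depends
//- on the local installation.
+aside-code("Example").
util.get_data_path()
# PosixPath('/usr/lib/python3.6/site-packages/spacy/data')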
+table(["Name", "Type", "Description"])
+row
+cell #[code require_exists]
+cell bool
+cell Only return path if it exists, otherwise return #[code None].
+row("foot")
+cell returns
+cell #[code Path] / #[code None]
+cell Data path or #[code None].
+h(3, "util.set_data_path") util.set_data_path
+tag function
p
| Set custom path to the data directory where spaCy looks for models.
+aside-code("Example").
util.set_data_path('/custom/path')
util.get_data_path()
# PosixPath('/custom/path')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell Path to new data directory.
+h(3, "util.get_lang_class") util.get_lang_class
+tag function
p
| Import and load a #[code Language] class. Allows lazy-loading
| #[+a("/usage/adding-languages") language data] and importing
| languages using the two-letter language code. To add a language code
| for a custom language class, you can use the
| #[+api("top-level#util.set_lang_class") #[code set_lang_class]] helper.
+aside-code("Example").
for lang_id in ['en', 'de']:
lang_class = util.get_lang_class(lang_id)
lang = lang_class()
tokenizer = lang.Defaults.create_tokenizer()
+table(["Name", "Type", "Description"])
+row
+cell #[code lang]
+cell unicode
+cell Two-letter language code, e.g. #[code 'en'].
+row("foot")
+cell returns
+cell #[code Language]
+cell Language class.
+h(3, "util.set_lang_class") util.set_lang_class
+tag function
p
| Set a custom #[code Language] class name that can be loaded via
| #[+api("top-level#util.get_lang_class") #[code get_lang_class]]. If
| your model uses a custom language, this is required so that spaCy can
| load the correct class from the two-letter language code.
+aside-code("Example").
from spacy.lang.xy import CustomLanguage
util.set_lang_class('xy', CustomLanguage)
lang_class = util.get_lang_class('xy')
nlp = lang_class()
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Two-letter language code, e.g. #[code 'en'].
+row
+cell #[code cls]
+cell #[code Language]
+cell The language class, e.g. #[code English].
+h(3, "util.load_model") util.load_model
+tag function
+tag-new(2)
p
| Load a model from a shortcut link, package or data path. If called with a
| shortcut link or package name, spaCy will assume the model is a Python
| package and import and call its #[code load()] method. If called with a
| path, spaCy will assume it's a data directory, read the language and
| pipeline settings from the meta.json and initialise a #[code Language]
| class. The model data will then be loaded in via
| #[+api("language#from_disk") #[code Language.from_disk()]].
+aside-code("Example").
nlp = util.load_model('en')
nlp = util.load_model('en_core_web_sm', disable=['ner'])
nlp = util.load_model('/path/to/data')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Package name, shortcut link or model path.
+row
+cell #[code **overrides]
+cell -
+cell Specific overrides, like pipeline components to disable.
+row("foot")
+cell returns
+cell #[code Language]
+cell #[code Language] class with the loaded model.
+h(3, "util.load_model_from_path") util.load_model_from_path
+tag function
+tag-new(2)
p
| Load a model from a data directory path. Creates the
| #[+api("language") #[code Language]] class and pipeline based on the
| directory's meta.json and then calls
| #[+api("language#from_disk") #[code from_disk()]] with the path. This
| function also makes it easy to test a new model that you haven't packaged
| yet.
+aside-code("Example").
nlp = load_model_from_path('/path/to/data')
+table(["Name", "Type", "Description"])
+row
+cell #[code model_path]
+cell unicode
+cell Path to model data directory.
+row
+cell #[code meta]
+cell dict
+cell
| Model meta data. If #[code False], spaCy will try to load the
| meta from a meta.json in the same directory.
+row
+cell #[code **overrides]
+cell -
+cell Specific overrides, like pipeline components to disable.
+row("foot")
+cell returns
+cell #[code Language]
+cell #[code Language] class with the loaded model.
+h(3, "util.load_model_from_init_py") util.load_model_from_init_py
+tag function
+tag-new(2)
p
| A helper function to use in the #[code load()] method of a model package's
| #[+src(gh("spacy-models", "template/model/xx_model_name/__init__.py")) #[code __init__.py]].
+aside-code("Example").
from spacy.util import load_model_from_init_py
def load(**overrides):
return load_model_from_init_py(__file__, **overrides)
+table(["Name", "Type", "Description"])
+row
+cell #[code init_file]
+cell unicode
+cell Path to model's __init__.py, i.e. #[code __file__].
+row
+cell #[code **overrides]
+cell -
+cell Specific overrides, like pipeline components to disable.
+row("foot")
+cell returns
+cell #[code Language]
+cell #[code Language] class with the loaded model.
+h(3, "util.get_model_meta") util.get_model_meta
+tag function
+tag-new(2)
p
| Get a model's meta.json from a directory path and validate its contents.
+aside-code("Example").
meta = util.get_model_meta('/path/to/model')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell Path to model directory.
+row("foot")
+cell returns
+cell dict
+cell The model's meta data.
+h(3, "util.is_package") util.is_package
+tag function
p
| Check if string maps to a package installed via pip. Mainly used to
| validate #[+a("/usage/models") model packages].
+aside-code("Example").
util.is_package('en_core_web_sm') # True
util.is_package('xyz') # False
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of package.
+row("foot")
+cell returns
+cell #[code bool]
+cell #[code True] if installed package, #[code False] if not.
+h(3, "util.get_package_path") util.get_package_path
+tag function
+tag-new(2)
p
| Get path to an installed package. Mainly used to resolve the location of
| #[+a("/usage/models") model packages]. Currently imports the package
| to find its path.
+aside-code("Example").
util.get_package_path('en_core_web_sm')
# /usr/lib/python3.6/site-packages/en_core_web_sm
+table(["Name", "Type", "Description"])
+row
+cell #[code package_name]
+cell unicode
+cell Name of installed package.
+row("foot")
+cell returns
+cell #[code Path]
+cell Path to model package directory.
+h(3, "util.is_in_jupyter") util.is_in_jupyter
+tag function
+tag-new(2)
p
| Check if user is running spaCy from a #[+a("https://jupyter.org") Jupyter]
| notebook by detecting the IPython kernel. Mainly used for the
| #[+api("top-level#displacy") #[code displacy]] visualizer.
+aside-code("Example").
html = '&lt;h1&gt;Hello world!&lt;/h1&gt;'
if util.is_in_jupyter():
from IPython.core.display import display, HTML
display(HTML(html))
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell #[code True] if in Jupyter, #[code False] if not.
+h(3, "util.update_exc") util.update_exc
+tag function
p
| Update, validate and overwrite
| #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions].
| Used to combine global exceptions with custom, language-specific
| exceptions. Will raise an error if key doesn't match #[code ORTH] values.
+aside-code("Example").
BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", LEMMA: "all"}]}
exceptions = util.update_exc(BASE, NEW)
# {"a.": [{ORTH: "a.", LEMMA: "all"}], ":)": [{ORTH: ":)"}]}
+table(["Name", "Type", "Description"])
+row
+cell #[code base_exceptions]
+cell dict
+cell Base tokenizer exceptions.
+row
+cell #[code *addition_dicts]
+cell dicts
+cell Exception dictionaries to add to the base exceptions, in order.
+row("foot")
+cell returns
+cell dict
+cell Combined tokenizer exceptions.
+h(3, "util.minibatch") util.minibatch
+tag function
+tag-new(2)
p
| Iterate over batches of items. #[code size] may be an iterator, so that
| batch-size can vary on each step.
+aside-code("Example").
batches = minibatch(train_data)
for batch in batches:
texts, annotations = zip(*batch)
nlp.update(texts, annotations)
+table(["Name", "Type", "Description"])
+row
+cell #[code items]
+cell iterable
+cell The items to batch up.
+row
+cell #[code size]
+cell int / iterable
+cell
| The batch size(s). Use
| #[+api("top-level#util.compounding") #[code util.compounding]] or
| #[+api("top-level#util.decaying") #[code util.decaying]] or
| for an infinite series of compounding or decaying values.
+row("foot")
+cell yields
+cell list
+cell The batches.
+h(3, "util.compounding") util.compounding
+tag function
+tag-new(2)
p
| Yield an infinite series of compounding values. Each time the generator
| is called, a value is produced by multiplying the previous value by the
| compound rate.
+aside-code("Example").
sizes = compounding(1., 10., 1.5)
assert next(sizes) == 1.
assert next(sizes) == 1. * 1.5
assert next(sizes) == 1.5 * 1.5
+table(["Name", "Type", "Description"])
+row
+cell #[code start]
+cell int / float
+cell The first value.
+row
+cell #[code stop]
+cell int / float
+cell The maximum value.
+row
+cell #[code compound]
+cell int / float
+cell The compounding factor.
+row("foot")
+cell yields
+cell int / float
+cell Compounding values.
+h(3, "util.decaying") util.decaying
+tag function
+tag-new(2)
p
| Yield an infinite series of linearly decaying values.
+aside-code("Example").
sizes = decaying(1., 10., 0.001)
assert next(sizes) == 1.
assert next(sizes) == 1. - 0.001
assert next(sizes) == 0.999 - 0.001
+table(["Name", "Type", "Description"])
+row
+cell #[code start]
+cell int / float
+cell The first value.
+row
+cell #[code end]
+cell int / float
+cell The minimum value.
+row
+cell #[code decay]
+cell int / float
+cell The decaying factor.
+row("foot")
+cell yields
+cell int / float
+cell The decaying values.
+h(3, "util.itershuffle") util.itershuffle
+tag function
+tag-new(2)
p
| Shuffle an iterator. This works by holding #[code bufsize] items back and
| yielding them sometime later. Obviously, this is not unbiased but
| should be good enough for batching. Larger bufsize means less bias.
+aside-code("Example").
values = range(1000)
shuffled = itershuffle(values)
+table(["Name", "Type", "Description"])
+row
+cell #[code iterable]
+cell iterable
+cell Iterator to shuffle.
+row
+cell #[code bufsize]
+cell int
+cell Items to hold back.
+row("foot")
+cell yields
+cell iterable
+cell The shuffled iterator.

View File

@ -1,46 +0,0 @@
//- 💫 DOCS > API > ANNOTATION SPECS
include ../_includes/_mixins
+section("text-processing")
+h(2, "text-processing") Text Processing
include _annotation/_text-processing
+section("pos-tagging")
+h(2, "pos-tagging") Part-of-speech Tagging
+aside("Tip: Understanding tags")
| You can also use #[code spacy.explain()] to get the description for the
| string representation of a tag. For example,
| #[code spacy.explain("RB")] will return "adverb".
include _annotation/_pos-tags
+section("dependency-parsing")
+h(2, "dependency-parsing") Syntactic Dependency Parsing
+aside("Tip: Understanding labels")
| You can also use #[code spacy.explain()] to get the description for the
| string representation of a label. For example,
| #[code spacy.explain("prt")] will return "particle".
include _annotation/_dep-labels
+section("named-entities")
+h(2, "named-entities") Named Entity Recognition
+aside("Tip: Understanding entity types")
| You can also use #[code spacy.explain()] to get the description for the
| string representation of an entity label. For example,
| #[code spacy.explain("LANGUAGE")] will return "any named language".
include _annotation/_named-entities
+h(3, "biluo") BILUO Scheme
include _annotation/_biluo
+section("training")
+h(2, "training") Models and training data
include _annotation/_training

View File

@ -1,738 +0,0 @@
//- 💫 DOCS > API > COMMAND LINE INTERFACE
include ../_includes/_mixins
p
| As of v1.7.0, spaCy comes with new command line helpers to download and
| link models and show useful debugging information. For a list of available
| commands, type #[code spacy --help].
+h(3, "download") Download
p
| Download #[+a("/usage/models") models] for spaCy. The downloader finds the
| best-matching compatible version, uses pip to download the model as a
| package and automatically creates a
| #[+a("/usage/models#usage") shortcut link] to load the model by name.
| Direct downloads don't perform any compatibility checks and require the
| model name to be specified with its version (e.g.
| #[code en_core_web_sm-2.0.0]).
+aside("Downloading best practices")
| The #[code download] command is mostly intended as a convenient,
| interactive wrapper: it performs compatibility checks and prints
| detailed messages in case things go wrong. It's #[strong not recommended]
| to use this command as part of an automated process. If you know which
| model your project needs, you should consider a
| #[+a("/usage/models#download-pip") direct download via pip], or
| uploading the model to a local PyPI installation and fetching it straight
| from there. This will also allow you to add it as a versioned package
| dependency to your project.
+code(false, "bash", "$").
python -m spacy download [model] [--direct]
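//- A usage sketch; model names are illustrative. Direct downloads require
//- the exact version, as noted above.
+aside-code("Example", "bash").
python -m spacy download en
python -m spacy download en_core_web_sm-2.0.0 --direct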
+table(["Argument", "Type", "Description"])
+row
+cell #[code model]
+cell positional
+cell
| Model name or shortcut (#[code en], #[code de],
| #[code en_core_web_sm]).
+row
+cell #[code --direct], #[code -d]
+cell flag
+cell Force direct download of exact model version.
+row
+cell other
+tag-new(2.1)
+cell -
+cell
| Additional installation options to be passed to
| #[code pip install] when installing the model package. For
| example, #[code --user] to install to the user home directory.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell creates
+cell directory, symlink
+cell
| The installed model package in your #[code site-packages]
| directory and a shortcut link as a symlink in #[code spacy/data].
+h(3, "link") Link
p
| Create a #[+a("/usage/models#usage") shortcut link] for a model,
| either a Python package or a local directory. This will let you load
| models from any location using a custom name via
| #[+api("spacy#load") #[code spacy.load()]].
+infobox("Important note")
| In spaCy v1.x, you had to use the model data directory to set up a shortcut
| link for a local path. As of v2.0, spaCy expects all shortcut links to
| be #[strong loadable model packages]. If you want to load a data directory,
| call #[+api("spacy#load") #[code spacy.load()]] or
| #[+api("language#from_disk") #[code Language.from_disk()]] with the path,
| or use the #[+api("cli#package") #[code package]] command to create a
| model package.
+code(false, "bash", "$").
python -m spacy link [origin] [link_name] [--force]
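//- A usage sketch; the package and link names are illustrative.
+aside-code("Example", "bash").
python -m spacy link en_core_web_sm en
python -m spacy link en_core_web_sm en --force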
+table(["Argument", "Type", "Description"])
+row
+cell #[code origin]
+cell positional
+cell Model name if package, or path to local directory.
+row
+cell #[code link_name]
+cell positional
+cell Name of the shortcut link to create.
+row
+cell #[code --force], #[code -f]
+cell flag
+cell Force overwriting of existing link.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell creates
+cell symlink
+cell
| A shortcut link of the given name as a symlink in
| #[code spacy/data].
+h(3, "info") Info
p
| Print information about your spaCy installation, models and local setup,
| and generate #[+a("https://en.wikipedia.org/wiki/Markdown") Markdown]-formatted
| markup to copy-paste into #[+a(gh("spacy") + "/issues") GitHub issues].
+code(false, "bash").
python -m spacy info [--markdown]
python -m spacy info [model] [--markdown]
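//- A usage sketch mirroring the usage patterns above.
+aside-code("Example", "bash").
python -m spacy info
python -m spacy info en --markdown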
+table(["Argument", "Type", "Description"])
+row
+cell #[code model]
+cell positional
+cell A model, i.e. shortcut link, package name or path (optional).
+row
+cell #[code --markdown], #[code -md]
+cell flag
+cell Print information as Markdown.
+row
+cell #[code --silent], #[code -s]
+tag-new("2.0.12")
+cell flag
+cell Don't print anything, just return the values.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell prints
+cell #[code stdout]
+cell Information about your spaCy installation.
+h(3, "validate") Validate
+tag-new(2)
p
| Find all models installed in the current environment (both packages and
| shortcut links) and check whether they are compatible with the currently
| installed version of spaCy. Should be run after upgrading spaCy via
| #[code pip install -U spacy] to ensure that all installed models can
| be used with the new version. The command is also useful to detect
| out-of-sync model links resulting from links created in different virtual
| environments. It will show a list of models, the installed versions, the
| latest compatible version (if out of date) and the commands for updating.
+aside("Automated validation")
| You can also use the #[code validate] command as part of your build
| process or test suite, to ensure all models are up to date before
| proceeding. If incompatible models or shortcut links are found, it will
| return #[code 1].
+code(false, "bash", "$").
python -m spacy validate
+table(["Argument", "Type", "Description"])
+row("foot")
+cell prints
+cell #[code stdout]
+cell Details about the compatibility of your installed models.
+h(3, "convert") Convert
p
| Convert files into spaCy's #[+a("/api/annotation#json-input") JSON format]
| for use with the #[code train] command and other experiment management
| functions. The converter can be specified on the command line, or
| chosen based on the file extension of the input file.
+code(false, "bash", "$", false, false, true).
python -m spacy convert [input_file] [output_dir] [--converter] [--n-sents]
[--morphology]
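//- A usage sketch; the input file name is hypothetical.
+aside-code("Example", "bash").
python -m spacy convert train.conllu /output --converter conllu --n-sents 10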
+table(["Argument", "Type", "Description"])
+row
+cell #[code input_file]
+cell positional
+cell Input file.
+row
+cell #[code output_dir]
+cell positional
+cell Output directory for converted JSON file.
+row
+cell #[code --converter], #[code -c]
+cell option
+cell #[+tag-new(2)] Name of converter to use (see below).
+row
+cell #[code --n-sents], #[code -n]
+cell option
+cell Number of sentences per document.
+row
+cell #[code --morphology], #[code -m]
+cell option
+cell Enable appending morphology to tags.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell creates
+cell JSON
+cell Data in spaCy's #[+a("/api/annotation#json-input") JSON format].
p The following file format converters are available:
+table(["ID", "Description"])
+row
+cell #[code auto]
+cell Automatically pick converter based on file extension (default).
+row
+cell #[code conllu], #[code conll]
+cell Universal Dependencies #[code .conllu] or #[code .conll] format.
+row
+cell #[code ner]
+cell Tab-based named entity recognition format.
+row
+cell #[code iob]
+cell IOB or IOB2 named entity recognition format.
+h(3, "train") Train
p
| Train a model. Expects data in spaCy's
| #[+a("/api/annotation#json-input") JSON format]. On each epoch, a model
| will be saved out to the directory. Accuracy scores and model details
| will be added to a #[+a("/usage/training#models-generating") #[code meta.json]]
| to allow packaging the model using the
| #[+api("cli#package") #[code package]] command.
+infobox("Changed in v2.1", "⚠️")
| As of spaCy 2.1, the #[code --no-tagger], #[code --no-parser] and
| #[code --no-entities] flags have been replaced by a #[code --pipeline]
| option, which lets you define comma-separated names of pipeline
| components to train. For example, #[code --pipeline tagger,parser] will
| only train the tagger and parser.
+code(false, "bash", "$", false, false, true).
python -m spacy train [lang] [output_path] [train_path] [dev_path]
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
[--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
[--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
[--verbose]
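//- A usage sketch; the data paths are hypothetical.
+aside-code("Example", "bash").
python -m spacy train en /output /data/train.json /data/dev.json --n-iter 10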
+table(["Argument", "Type", "Description"])
+row
+cell #[code lang]
+cell positional
+cell Model language.
+row
+cell #[code output_path]
+cell positional
+cell Directory to store model in. Will be created if it doesn't exist.
+row
+cell #[code train_path]
+cell positional
+cell Location of JSON-formatted training data.
+row
+cell #[code dev_path]
+cell positional
+cell Location of JSON-formatted development data for evaluation.
+row
+cell #[code --base-model], #[code -b]
+cell option
+cell
| Optional name of base model to update. Can be any loadable
| spaCy model.
+row
+cell #[code --pipeline], #[code -p]
+tag-new("2.1.0")
+cell option
+cell
| Comma-separated names of pipeline components to train. Defaults
| to #[code 'tagger,parser,ner'].
+row
+cell #[code --vectors], #[code -v]
+cell option
+cell Model to load vectors from.
+row
+cell #[code --n-iter], #[code -n]
+cell option
+cell Number of iterations (default: #[code 30]).
+row
+cell #[code --n-examples], #[code -ns]
+cell option
+cell Number of examples to use (defaults to #[code 0] for all examples).
+row
+cell #[code --use-gpu], #[code -g]
+cell option
+cell
| Whether to use GPU. Can be either #[code 0], #[code 1] or
| #[code -1].
+row
+cell #[code --version], #[code -V]
+cell option
+cell
| Model version. Will be written out to the model's
| #[code meta.json] after training.
+row
+cell #[code --meta-path], #[code -m]
+tag-new(2)
+cell option
+cell
| Optional path to model
| #[+a("/usage/training#models-generating") #[code meta.json]].
| All relevant properties like #[code lang], #[code pipeline] and
| #[code spacy_version] will be overwritten.
+row
+cell #[code --init-tok2vec], #[code -t2v]
+tag-new("2.1.0")
+cell option
+cell
| Path to pretrained weights for the token-to-vector parts of the
| models. See #[code spacy pretrain]. Experimental.
+row
+cell #[code --parser-multitasks], #[code -pt]
+cell option
+cell
| Side objectives for parser CNN, e.g. #[code 'dep'] or
| #[code 'dep,tag']
+row
+cell #[code --entity-multitasks], #[code -et]
+cell option
+cell
| Side objectives for NER CNN, e.g. #[code 'dep'] or
| #[code 'dep,tag']
+row
+cell #[code --noise-level], #[code -nl]
+cell option
+cell Float indicating the amount of corruption for data augmentation.
+row
+cell #[code --gold-preproc], #[code -G]
+cell flag
+cell Use gold preprocessing.
+row
+cell #[code --learn-tokens], #[code -T]
+cell flag
+cell
| Make parser learn gold-standard tokenization by merging
| subtokens. Typically used for languages like Chinese.
+row
+cell #[code --verbose], #[code -VV]
+tag-new("2.0.13")
+cell flag
+cell Show more detailed messages during training.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell creates
+cell model, pickle
+cell A spaCy model on each epoch.
+h(4, "train-hyperparams") Environment variables for hyperparameters
+tag-new(2)
p
| spaCy lets you set hyperparameters for training via environment variables.
| This is useful, because it keeps the command simple and allows you to
| #[+a("https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias/17537#17537") create an alias]
| for your custom #[code train] command while still being able to easily
| tweak the hyperparameters. For example:
+code(false, "bash", "$").
parser_hidden_depth=2 parser_maxout_pieces=1 spacy train [...]
+code("Usage with alias", "bash", "$").
alias train-parser="spacy train en /output /data /train /dev -n 1000"
parser_maxout_pieces=1 train-parser
+table(["Name", "Description", "Default"])
+row
+cell #[code dropout_from]
+cell Initial dropout rate.
+cell #[code 0.2]
+row
+cell #[code dropout_to]
+cell Final dropout rate.
+cell #[code 0.2]
+row
+cell #[code dropout_decay]
+cell Rate of dropout change.
+cell #[code 0.0]
+row
+cell #[code batch_from]
+cell Initial batch size.
+cell #[code 1]
+row
+cell #[code batch_to]
+cell Final batch size.
+cell #[code 64]
+row
+cell #[code batch_compound]
+cell Rate of batch size acceleration.
+cell #[code 1.001]
+row
+cell #[code token_vector_width]
+cell Width of embedding tables and convolutional layers.
+cell #[code 128]
+row
+cell #[code embed_size]
+cell Number of rows in embedding tables.
+cell #[code 7500]
//- +row
//- +cell #[code parser_maxout_pieces]
//- +cell Number of pieces in the parser's and NER's first maxout layer.
//- +cell #[code 2]
//- +row
//- +cell #[code parser_hidden_depth]
//- +cell Number of hidden layers in the parser and NER.
//- +cell #[code 1]
+row
+cell #[code hidden_width]
+cell Size of the parser's and NER's hidden layers.
+cell #[code 128]
//- +row
//- +cell #[code history_feats]
//- +cell Number of previous action ID features for parser and NER.
//- +cell #[code 128]
//- +row
//- +cell #[code history_width]
//- +cell Number of embedding dimensions for each action ID.
//- +cell #[code 128]
+row
+cell #[code learn_rate]
+cell Learning rate.
+cell #[code 0.001]
+row
+cell #[code optimizer_B1]
+cell Momentum for the Adam solver.
+cell #[code 0.9]
+row
+cell #[code optimizer_B2]
+cell Adagrad-momentum for the Adam solver.
+cell #[code 0.999]
+row
+cell #[code optimizer_eps]
+cell Epsilon value for the Adam solver.
+cell #[code 1e-08]
+row
+cell #[code L2_penalty]
+cell L2 regularisation penalty.
+cell #[code 1e-06]
+row
+cell #[code grad_norm_clip]
+cell Gradient L2 norm constraint.
+cell #[code 1.0]
+h(3, "vocab") Vocab
+tag-new(2)
p
| Compile a vocabulary from a
| #[+a("/api/annotation#vocab-jsonl") lexicon JSONL] file and optional
| word vectors. Will save out a valid spaCy model that you can load via
| #[+api("spacy#load") #[code spacy.load]] or package using the
| #[+api("cli#package") #[code package]] command.
+code(false, "bash", "$").
python -m spacy vocab [lang] [output_dir] [lexemes_loc] [vectors_loc]
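//- A usage sketch; the lexeme and vector file paths are hypothetical.
+aside-code("Example", "bash").
python -m spacy vocab en /output /data/lexemes.jsonl /data/vectors.npz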
+table(["Argument", "Type", "Description"])
+row
+cell #[code lang]
+cell positional
+cell
| Model language
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
| e.g. #[code en].
+row
+cell #[code output_dir]
+cell positional
+cell Model output directory. Will be created if it doesn't exist.
+row
+cell #[code lexemes_loc]
+cell positional
+cell
| Location of lexical data in spaCy's
| #[+a("/api/annotation#vocab-jsonl") JSONL format].
+row
+cell #[code vectors_loc]
+cell positional
+cell Optional location of vectors data as numpy #[code .npz] file.
+row("foot")
+cell creates
+cell model
+cell A spaCy model containing the vocab and vectors.
+h(3, "init-model") Init Model
+tag-new(2)
p
| Create a new model directory from raw data, like word frequencies, Brown
| clusters and word vectors. This command is similar to the
| #[code spacy model] command in v1.x.
+code(false, "bash", "$", false, false, true).
python -m spacy init-model [lang] [output_dir] [freqs_loc] [--clusters-loc] [--vectors-loc] [--prune-vectors]
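//- A usage sketch; the frequencies and vectors file paths are hypothetical.
+aside-code("Example", "bash").
python -m spacy init-model en /output /data/freqs.tsv --vectors-loc /data/vectors.zip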
+table(["Argument", "Type", "Description"])
+row
+cell #[code lang]
+cell positional
+cell
| Model language
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
| e.g. #[code en].
+row
+cell #[code output_dir]
+cell positional
+cell Model output directory. Will be created if it doesn't exist.
+row
+cell #[code freqs_loc]
+cell positional
+cell
| Location of word frequencies file. Should be a tab-separated
| file with three columns: frequency, document frequency and
| frequency count.
+row
+cell #[code --clusters-loc], #[code -c]
+cell option
+cell
| Optional location of clusters file. Should be a tab-separated
| file with three columns: cluster, word and frequency.
+row
+cell #[code --vectors-loc], #[code -v]
+cell option
+cell
| Optional location of vectors file. Should be a tab-separated
| file in Word2Vec format where the first column contains the word
| and the remaining columns the values. File can be provided in
| #[code .txt] format or as a zipped text file in #[code .zip] or
| #[code .tar.gz] format.
+row
+cell #[code --prune-vectors], #[code -V]
+cell option
+cell
| Number of vectors to prune the vocabulary to. Defaults to
| #[code -1] for no pruning.
+row("foot")
+cell creates
+cell model
+cell A spaCy model containing the vocab and vectors.
+h(3, "evaluate") Evaluate
+tag-new(2)
p
| Evaluate a model's accuracy and speed on JSON-formatted annotated data.
| Will print the results and optionally export
| #[+a("/usage/visualizers") displaCy visualizations] of a sample set of
| parses to #[code .html] files. Visualizations for the dependency parse
| and NER will be exported as separate files if the respective component
| is present in the model's pipeline.
+code(false, "bash", "$", false, false, true).
python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] [--gpu-id] [--gold-preproc]
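//- A usage sketch; the model name and data paths are hypothetical.
+aside-code("Example", "bash").
python -m spacy evaluate en_core_web_sm /data/dev.json --displacy-path /output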
+table(["Argument", "Type", "Description"])
+row
+cell #[code model]
+cell positional
+cell
| Model to evaluate. Can be a package or shortcut link name, or a
| path to a model data directory.
+row
+cell #[code data_path]
+cell positional
+cell Location of JSON-formatted evaluation data.
+row
+cell #[code --displacy-path], #[code -dp]
+cell option
+cell
| Directory to output rendered parses as HTML. If not set, no
| visualizations will be generated.
+row
+cell #[code --displacy-limit], #[code -dl]
+cell option
+cell
| Number of parses to generate per file. Defaults to #[code 25].
| Keep in mind that a significantly higher number might cause the
| #[code .html] files to render slowly.
+row
+cell #[code --gpu-id], #[code -g]
+cell option
+cell GPU to use, if any. Defaults to #[code -1] for CPU.
+row
+cell #[code --gold-preproc], #[code -G]
+cell flag
+cell Use gold preprocessing.
+row("foot")
+cell prints / creates
+cell #[code stdout], HTML
+cell Training results and optional displaCy visualizations.
+h(3, "package") Package
p
| Generate a #[+a("/usage/training#models-generating") model Python package]
| from an existing model data directory. All data files are copied over.
| If the path to a #[code meta.json] is supplied, or a #[code meta.json] is
| found in the input directory, this file is used. Otherwise, the data can
| be entered directly from the command line. After packaging, you can run
| #[code python setup.py sdist] from the newly created directory to turn
| your model into an installable archive file.
+code(false, "bash", "$", false, false, true).
python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--force]
+aside-code("Example", "bash").
python -m spacy package /input /output
cd /output/en_model-0.0.0
python setup.py sdist
pip install dist/en_model-0.0.0.tar.gz
+table(["Argument", "Type", "Description"])
+row
+cell #[code input_dir]
+cell positional
+cell Path to directory containing model data.
+row
+cell #[code output_dir]
+cell positional
+cell Directory to create package folder in.
+row
+cell #[code --meta-path], #[code -m]
+cell option
+cell #[+tag-new(2)] Path to #[code meta.json] file (optional).
+row
+cell #[code --create-meta], #[code -c]
+cell flag
+cell
| #[+tag-new(2)] Create a #[code meta.json] file on the command
| line, even if one already exists in the directory. If an
| existing file is found, its entries will be shown as the defaults
| in the command line prompt.
+row
+cell #[code --force], #[code -f]
+cell flag
+cell Force overwriting of existing folder in output directory.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+row("foot")
+cell creates
+cell directory
+cell A Python package containing the spaCy model.

View File

@ -1,39 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES
include ../_includes/_mixins
+section("doc")
+h(2, "doc", "spacy/tokens/doc.pxd") Doc
+tag cdef class
include _cython/_doc
+section("token")
+h(2, "token", "spacy/tokens/token.pxd") Token
+tag cdef class
include _cython/_token
+section("span")
+h(2, "span", "spacy/tokens/span.pxd") Span
+tag cdef class
include _cython/_span
+section("lexeme")
+h(2, "lexeme", "spacy/lexeme.pxd") Lexeme
+tag cdef class
include _cython/_lexeme
+section("vocab")
+h(2, "vocab", "spacy/vocab.pxd") Vocab
+tag cdef class
include _cython/_vocab
+section("stringstore")
+h(2, "stringstore", "spacy/strings.pxd") StringStore
+tag cdef class
include _cython/_stringstore

View File

@ -1,15 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS
include ../_includes/_mixins
+section("tokenc")
+h(2, "tokenc", "spacy/structs.pxd") TokenC
+tag C struct
include _cython/_tokenc
+section("lexemec")
+h(2, "lexemec", "spacy/structs.pxd") LexemeC
+tag C struct
include _cython/_lexemec

View File

@ -1,176 +0,0 @@
//- 💫 DOCS > API > CYTHON > ARCHITECTURE
include ../_includes/_mixins
+section("overview")
+aside("What's Cython?")
| #[+a("http://cython.org/") Cython] is a language for writing
| C extensions for Python. Most Python code is also valid Cython, but
| you can add type declarations to get efficient memory-managed code
| just like C or C++.
p
| This section documents spaCy's C-level data structures and
| interfaces, intended for use from Cython. Some of the attributes are
| primarily for internal use, and all C-level functions and methods are
| designed for speed over safety: if you make a mistake and access an
| array out-of-bounds, the program may crash abruptly.
p
| With Cython there are four ways of declaring complex data types.
| Unfortunately we use all four in different places, as they all have
| different utility:
+table(["Declaration", "Description", "Example"])
+row
+cell #[code class]
+cell A normal Python class.
+cell #[+api("language") #[code Language]]
+row
+cell #[code cdef class]
+cell
| A Python extension type. Differs from a normal Python class
| in that its attributes can be defined on the underlying
| struct. Can have C-level objects as attributes (notably
| structs and pointers), and can have methods which have
| C-level objects as arguments or return types.
+cell #[+api("cython-classes#lexeme") #[code Lexeme]]
+row
+cell #[code cdef struct]
+cell
| A struct is just a collection of variables, sort of like a
| named tuple, except the memory is contiguous. Structs can't
| have methods, only attributes.
+cell #[+api("cython-structs#lexemec") #[code LexemeC]]
+row
+cell #[code cdef cppclass]
+cell
| A C++ class. Like a struct, this can be allocated on the
| stack, but can have methods, a constructor and a destructor.
| Differs from #[code cdef class] in that it can be created and
| destroyed without acquiring the Python global interpreter
| lock. This style is the most obscure.
+cell #[+src(gh("spacy", "spacy/syntax/_state.pxd")) #[code StateC]]
p
| The most important classes in spaCy are defined as #[code cdef class]
| objects. The underlying data for these objects is usually gathered
| into a struct, which is usually named #[code c]. For instance, the
| #[+api("cython-classses#lexeme") #[code Lexeme]] class holds a
| #[+api("cython-structs#lexemec") #[code LexemeC]] struct, at
| #[code Lexeme.c]. This lets you shed the Python container, and pass
| a pointer to the underlying data into C-level functions.
+section("conventions")
+h(2, "conventions") Conventions
p
| spaCy's core data structures are implemented as
| #[+a("http://cython.org/") Cython] #[code cdef] classes. Memory is
| managed through the #[+a(gh("cymem")) #[code cymem]]
| #[code cymem.Pool] class, which allows you
| to allocate memory which will be freed when the #[code Pool] object
| is garbage collected. This means you usually don't have to worry
| about freeing memory. You just have to decide which Python object
| owns the memory, and make it own the #[code Pool]. When that object
| goes out of scope, the memory will be freed. You do have to take
| care that no pointers outlive the object that owns them — but this
| is generally quite easy.
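//- A minimal sketch of the ownership pattern described above, using
//- cymem's #[code Pool.alloc]; the size is arbitrary.
+code.
from cymem.cymem cimport Pool
cdef Pool mem = Pool()
# the allocated ints are freed when `mem` is garbage collected,
# so no manual free is needed
cdef int* data = &lt;int*&gt;mem.alloc(16, sizeof(int))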
p
| All Cython modules should have the #[code # cython: infer_types=True]
| compiler directive at the top of the file. This makes the code much
| cleaner, as it avoids the need for many type declarations. If
| possible, you should prefer to declare your functions #[code nogil],
| even if you don't especially care about multi-threading. The reason
| is that #[code nogil] functions help the Cython compiler reason about
| your code quite a lot — you're telling the compiler that no Python
| dynamics are possible. This lets many errors be raised, and ensures
| your function will run at C speed.
p
| Cython gives you many choices of sequences: you could have a Python
| list, a numpy array, a memory view, a C++ vector, or a pointer.
| Pointers are preferred, because they are fastest, have the most
| explicit semantics, and let the compiler check your code more
| strictly. C++ vectors are also great — but you should only use them
| internally in functions. It's less friendly to accept a vector as an
| argument, because that asks the user to do much more work. Here's
| how to get a pointer from a numpy array, memory view or vector:
+code.
cdef void get_pointers(np.ndarray[int, mode='c'] numpy_array, vector[int] cpp_vector, int[::1] memory_view) nogil:
pointer1 = &lt;int*&gt;numpy_array.data
pointer2 = cpp_vector.data()
pointer3 = &memory_view[0]
p
| Both C arrays and C++ vectors reassure the compiler that no Python
| operations are possible on your variable. This is a big advantage:
| it lets the Cython compiler raise many more errors for you.
p
| When getting a pointer from a numpy array or memoryview, take care
| that the data is actually stored in C-contiguous order — otherwise
| you'll get a pointer to nonsense. The type-declarations in the code
| above should generate runtime errors if buffers with incorrect
| memory layouts are passed in. To iterate over the array, the
| following style is preferred:
+code.
cdef int c_total(const int* int_array, int length) nogil:
total = 0
for item in int_array[:length]:
total += item
return total
p
| If this is confusing, consider that the compiler couldn't deal with
| #[code for item in int_array:] — there's no length attached to a raw
| pointer, so how could we figure out where to stop? The length is
| provided in the slice notation as a solution to this. Note that we
| don't have to declare the type of #[code item] in the code above —
| the compiler can easily infer it. This gives us tidy code that looks
| quite like Python, but is exactly as fast as C — because we've made
| sure the compilation to C is trivial.
p
| Your functions cannot be declared #[code nogil] if they need to
| create Python objects or call Python functions. This is perfectly
| okay — you shouldn't torture your code just to get #[code nogil]
| functions. However, if your function isn't #[code nogil], you should
| compile your module with #[code cython -a --cplus my_module.pyx] and
| open the resulting #[code my_module.html] file in a browser. This
| will let you see how Cython is compiling your code. Calls into the
| Python run-time will be in bright yellow. This lets you easily see
| whether Cython is able to correctly type your code, or whether there
| are unexpected problems.
p
| Working in Cython is very rewarding once you're over the initial
| learning curve. As with C and C++, the first way you write something
| in Cython will often be the performance-optimal approach. In
| contrast, Python optimisation generally requires a lot of
| experimentation. Is it faster to have an #[code if item in my_dict]
| check, or to use #[code .get()]? What about
| #[code try]/#[code except]? Does this numpy operation create a copy?
| There's no way to guess the answers to these questions, and you'll
| usually be dissatisfied with your results — so there's no way to
| know when to stop this process. In the worst case, you'll make a
| mess that invites the next reader to try their luck too. This is
| like one of those
| #[+a("http://www.wemjournal.org/article/S1080-6032%2809%2970088-2/abstract") volcanic gas-traps],
| where the rescuers keep passing out from low oxygen, causing
| another rescuer to follow — only to succumb themselves. In short,
| just say no to optimizing your Python. If it's not fast enough the
| first time, just switch to Cython.
+infobox("Resources")
+list.o-no-block
+item #[+a("http://docs.cython.org/en/latest/") Official Cython documentation] (cython.org)
+item #[+a("https://explosion.ai/blog/writing-c-in-cython", true) Writing C in Cython] (explosion.ai)
+item #[+a("https://explosion.ai/blog/multithreading-with-cython") Multi-threading spaCys parser and named entity recogniser] (explosion.ai)

View File

@ -1,6 +0,0 @@
//- 💫 DOCS > API > DEPENDENCYPARSER
include ../_includes/_mixins
//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "DependencyParser", short: "parser", pipeline_id: "parser" })

View File

@ -1,827 +0,0 @@
//- 💫 DOCS > API > DOC
include ../_includes/_mixins
p
| A #[code Doc] is a sequence of #[+api("token") #[code Token]] objects.
| Access sentences and named entities, export annotations to numpy arrays,
| losslessly serialize to compressed binary strings. The #[code Doc] object
| holds an array of #[code TokenC] structs. The Python-level #[code Token]
| and #[+api("span") #[code Span]] objects are views of this array, i.e.
| they don't own the data themselves.
+aside-code("Example").
# Construction 1
doc = nlp(u'Some text')
# Construction 2
from spacy.tokens import Doc
doc = Doc(nlp.vocab, words=[u'hello', u'world', u'!'],
spaces=[True, False, False])
+h(2, "init") Doc.__init__
+tag method
p
| Construct a #[code Doc] object. The most common way to get a #[code Doc]
| object is via the #[code nlp] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A storage container for lexical types.
+row
+cell #[code words]
+cell -
+cell A list of strings to add to the container.
+row
+cell #[code spaces]
+cell -
+cell
| A list of boolean values indicating whether each word has a
| subsequent space. Must have the same length as #[code words], if
| specified. Defaults to a sequence of #[code True].
+row("foot")
+cell returns
+cell #[code Doc]
+cell The newly constructed object.
+h(2, "getitem") Doc.__getitem__
+tag method
p
| Get a #[+api("token") #[code Token]] object at position #[code i], where
| #[code i] is an integer. Negative indexing is supported, and follows the
| usual Python semantics, i.e. #[code doc[-2]] is #[code doc[len(doc) - 2]].
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
assert doc[0].text == 'Give'
assert doc[-1].text == '.'
span = doc[1:3]
assert span.text == 'it back'
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The index of the token.
+row("foot")
+cell returns
+cell #[code Token]
+cell The token at #[code doc[i]].
p
| Get a #[+api("span") #[code Span]] object, starting at position
| #[code start] (token index) and ending at position #[code end] (token
| index).
p
| For instance, #[code doc[2:5]] produces a span consisting of tokens 2, 3
| and 4. Stepped slices (e.g. #[code doc[start : end : step]]) are not
| supported, as #[code Span] objects must be contiguous (cannot have gaps).
| You can use negative indices and open-ended ranges, which have their
| normal Python semantics.
+table(["Name", "Type", "Description"])
+row
+cell #[code start_end]
+cell tuple
+cell The slice of the document to get.
+row("foot")
+cell returns
+cell #[code Span]
+cell The span at #[code doc[start : end]].
+h(2, "iter") Doc.__iter__
+tag method
p
| Iterate over #[code Token] objects, from which the annotations can be
| easily accessed.
+aside-code("Example").
doc = nlp(u'Give it back')
assert [t.text for t in doc] == [u'Give', u'it', u'back']
p
| This is the main way of accessing #[+api("token") #[code Token]] objects,
| which are the main way annotations are accessed from Python. If
| faster-than-Python speeds are required, you can instead access the
| annotations as a numpy array, or access the underlying C data directly
| from Cython.
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A #[code Token] object.
+h(2, "len") Doc.__len__
+tag method
p Get the number of tokens in the document.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
assert len(doc) == 7
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of tokens in the document.
+h(2, "set_extension") Doc.set_extension
+tag classmethod
+tag-new(2)
p
| Define a custom attribute on the #[code Doc] which becomes available via
| #[code Doc._]. For details, see the documentation on
| #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].
+aside-code("Example").
from spacy.tokens import Doc
city_getter = lambda doc: any(city in doc.text for city in ('New York', 'Paris', 'Berlin'))
Doc.set_extension('has_city', getter=city_getter)
doc = nlp(u'I like New York')
assert doc._.has_city
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell
| Name of the attribute to set by the extension. For example,
| #[code 'my_attr'] will be available as #[code doc._.my_attr].
+row
+cell #[code default]
+cell -
+cell
| Optional default value of the attribute if no getter or method
| is defined.
+row
+cell #[code method]
+cell callable
+cell
| Set a custom method on the object, for example
| #[code doc._.compare(other_doc)].
+row
+cell #[code getter]
+cell callable
+cell
| Getter function that takes the object and returns an attribute
| value. Is called when the user accesses the #[code ._] attribute.
+row
+cell #[code setter]
+cell callable
+cell
| Setter function that takes the #[code Doc] and a value, and
| modifies the object. Is called when the user writes to the
| #[code Doc._] attribute.
+h(2, "get_extension") Doc.get_extension
+tag classmethod
+tag-new(2)
p
| Look up a previously registered extension by name. Returns a 4-tuple
| #[code.u-break (default, method, getter, setter)] if the extension is
| registered. Raises a #[code KeyError] otherwise.
+aside-code("Example").
from spacy.tokens import Doc
Doc.set_extension('has_city', default=False)
extension = Doc.get_extension('has_city')
assert extension == (False, None, None, None)
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| extension.
+h(2, "has_extension") Doc.has_extension
+tag classmethod
+tag-new(2)
p Check whether an extension has been registered on the #[code Doc] class.
+aside-code("Example").
from spacy.tokens import Doc
Doc.set_extension('has_city', default=False)
assert Doc.has_extension('has_city')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the extension has been registered.
+h(2, "remove_extension") Doc.remove_extension
+tag classmethod
+tag-new("2.0.12")
p Remove a previously registered extension.
+aside-code("Example").
from spacy.tokens import Doc
Doc.set_extension('has_city', default=False)
removed = Doc.remove_extension('has_city')
assert not Doc.has_extension('has_city')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| removed extension.
+h(2, "char_span") Doc.char_span
+tag method
+tag-new(2)
p
| Create a #[code Span] object from the slice #[code doc.text[start : end]].
| Returns #[code None] if the character indices don't map to a valid span.
+aside-code("Example").
doc = nlp(u'I like New York')
span = doc.char_span(7, 15, label=u'GPE')
assert span.text == 'New York'
+table(["Name", "Type", "Description"])
+row
+cell #[code start]
+cell int
+cell The index of the first character of the span.
+row
+cell #[code end]
+cell int
+cell The index of the first character after the span.
+row
+cell #[code label]
+cell uint64 / unicode
+cell A label to attach to the Span, e.g. for named entities.
+row
+cell #[code vector]
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A meaning representation of the span.
+row("foot")
+cell returns
+cell #[code Span]
+cell The newly constructed object or #[code None].
+h(2, "similarity") Doc.similarity
+tag method
+tag-model("vectors")
p
| Make a semantic similarity estimate. The default estimate is cosine
| similarity using an average of word vectors.
+aside-code("Example").
apples = nlp(u'I like apples')
oranges = nlp(u'I like oranges')
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
+table(["Name", "Type", "Description"])
+row
+cell #[code other]
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+row("foot")
+cell returns
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "count_by") Doc.count_by
+tag method
p
| Count the frequencies of a given attribute. Produces a dict of
| #[code {attr (int): count (int)}] frequencies, keyed by the values
| of the given attribute ID.
+aside-code("Example").
from spacy.attrs import ORTH
doc = nlp(u'apple apple orange banana')
assert doc.count_by(ORTH) == {7024: 1, 119552: 1, 2087: 2}
doc.to_array([ORTH])
# array([[11880], [11880], [7561], [12800]])
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_id]
+cell int
+cell The attribute ID
+row("foot")
+cell returns
+cell dict
+cell A dictionary mapping attributes to integer counts.
+h(2, "get_lca_matrix") Doc.get_lca_matrix
+tag method
p
| Calculates the lowest common ancestor matrix for a given #[code Doc].
| Returns LCA matrix containing the integer index of the ancestor, or
| #[code -1] if no common ancestor is found, e.g. if span excludes a
| necessary ancestor.
+aside-code("Example").
doc = nlp(u"This is a test")
matrix = doc.get_lca_matrix()
# array([[0, 1, 1, 1], [1, 1, 1, 1], [1, 1, 2, 3], [1, 1, 3, 3]], dtype=int32)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
+cell The lowest common ancestor matrix of the #[code Doc].
+h(2, "to_array") Doc.to_array
+tag method
p
| Export given token attributes to a numpy #[code ndarray].
| If #[code attr_ids] is a sequence of #[code M] attributes,
| the output array will be of shape #[code (N, M)], where #[code N]
| is the length of the #[code Doc] (in tokens). If #[code attr_ids] is
| a single attribute, the output shape will be #[code (N,)]. You can
| specify attributes by integer ID (e.g. #[code spacy.attrs.LEMMA])
| or string name (e.g. 'LEMMA' or 'lemma'). The values will be 64-bit
| integers.
+aside-code("Example").
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
doc = nlp(text)
# All strings mapped to integers, for easy export to numpy
np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
np_array = doc.to_array("POS")
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_ids]
+cell list or int or string
+cell
| A list of attributes (int IDs or string names) or
| a single attribute (int ID or string name)
+row("foot")
+cell returns
+cell
| #[code.u-break numpy.ndarray[ndim=2, dtype='uint64']] or
| #[code.u-break numpy.ndarray[ndim=1, dtype='uint64']]
+cell
| The exported attributes as a 2D numpy array, with one row per
| token and one column per attribute (when #[code attr_ids] is a
| list), or as a 1D numpy array, with one item per attribute (when
| #[code attr_ids] is a single value).
+h(2, "from_array") Doc.from_array
+tag method
p
| Load attributes from a numpy array. Write to a #[code Doc] object, from
| an #[code (M, N)] array of attributes.
+aside-code("Example").
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc
doc = nlp("Hello world!")
np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
doc2 = Doc(doc.vocab, words=[t.text for t in doc])
doc2.from_array([LOWER, POS, ENT_TYPE, IS_ALPHA], np_array)
assert doc[0].pos_ == doc2[0].pos_
+table(["Name", "Type", "Description"])
+row
+cell #[code attrs]
+cell ints
+cell A list of attribute ID ints.
+row
+cell #[code array]
+cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
+cell The attribute values to load.
+row("foot")
+cell returns
+cell #[code Doc]
+cell Itself.
+h(2, "to_disk") Doc.to_disk
+tag method
+tag-new(2)
p Save the current state to a directory.
+aside-code("Example").
doc.to_disk('/path/to/doc')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+h(2, "from_disk") Doc.from_disk
+tag method
+tag-new(2)
p Loads state from a directory. Modifies the object in place and returns it.
+aside-code("Example").
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row("foot")
+cell returns
+cell #[code Doc]
+cell The modified #[code Doc] object.
+h(2, "to_bytes") Doc.to_bytes
+tag method
p Serialize, i.e. export the document contents to a binary string.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
doc_bytes = doc.to_bytes()
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bytes
+cell
| A losslessly serialized copy of the #[code Doc], including all
| annotations.
+h(2, "from_bytes") Doc.from_bytes
+tag method
p Deserialize, i.e. import the document contents from a binary string.
+aside-code("Example").
from spacy.tokens import Doc
text = u'Give it back! He pleaded.'
doc = nlp(text)
bytes = doc.to_bytes()
doc2 = Doc(doc.vocab).from_bytes(bytes)
assert doc.text == doc2.text
+table(["Name", "Type", "Description"])
+row
+cell #[code data]
+cell bytes
+cell The string to load from.
+row("foot")
+cell returns
+cell #[code Doc]
+cell The #[code Doc] object.
+h(2, "merge") Doc.merge
+tag method
p
| Retokenize the document, such that the span at
| #[code doc.text[start_idx : end_idx]] is merged into a single token. If
| #[code start_idx] and #[code end_idx] do not mark start and end token
| boundaries, the document remains unchanged.
+aside-code("Example").
doc = nlp(u'Los Angeles start.')
doc.merge(0, len('Los Angeles'), 'NNP', 'Los Angeles', 'GPE')
assert [t.text for t in doc] == [u'Los Angeles', u'start', u'.']
+table(["Name", "Type", "Description"])
+row
+cell #[code start_idx]
+cell int
+cell The character index of the start of the slice to merge.
+row
+cell #[code end_idx]
+cell int
+cell The character index after the end of the slice to merge.
+row
+cell #[code **attributes]
+cell -
+cell
| Attributes to assign to the merged token. By default,
| attributes are inherited from the syntactic root token of
| the span.
+row("foot")
+cell returns
+cell #[code Token]
+cell
| The newly merged token, or #[code None] if the start and end
| indices did not fall at token boundaries.
+h(2, "print_tree") Doc.print_tree
+tag method
+tag-model("parse")
p
| Returns the parse trees in JSON (dict) format. Especially useful for
| web applications.
+aside-code("Example").
doc = nlp(u'Alice ate the pizza.')
trees = doc.print_tree()
# {'modifiers': [
# {'modifiers': [], 'NE': 'PERSON', 'word': 'Alice', 'arc': 'nsubj', 'POS_coarse': 'PROPN', 'POS_fine': 'NNP', 'lemma': 'Alice'},
# {'modifiers': [{'modifiers': [], 'NE': '', 'word': 'the', 'arc': 'det', 'POS_coarse': 'DET', 'POS_fine': 'DT', 'lemma': 'the'}], 'NE': '', 'word': 'pizza', 'arc': 'dobj', 'POS_coarse': 'NOUN', 'POS_fine': 'NN', 'lemma': 'pizza'},
# {'modifiers': [], 'NE': '', 'word': '.', 'arc': 'punct', 'POS_coarse': 'PUNCT', 'POS_fine': '.', 'lemma': '.'}
# ], 'NE': '', 'word': 'ate', 'arc': 'ROOT', 'POS_coarse': 'VERB', 'POS_fine': 'VBD', 'lemma': 'eat'}
+table(["Name", "Type", "Description"])
+row
+cell #[code light]
+cell bool
+cell Don't include lemmas or entities.
+row
+cell #[code flat]
+cell bool
+cell Don't include arcs or modifiers.
+row("foot")
+cell returns
+cell dict
+cell Parse tree as dict.
+h(2, "ents") Doc.ents
+tag property
+tag-model("NER")
p
| Iterate over the entities in the document. Yields named-entity
| #[code Span] objects, if the entity recognizer has been applied to the
| document.
+aside-code("Example").
doc = nlp(u'Mr. Best flew to New York on Saturday morning.')
ents = list(doc.ents)
assert ents[0].label == 346
assert ents[0].label_ == 'PERSON'
assert ents[0].text == 'Mr. Best'
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Span]
+cell Entities in the document.
+h(2, "noun_chunks") Doc.noun_chunks
+tag property
+tag-model("parse")
p
| Iterate over the base noun phrases in the document. Yields base
| noun-phrase #[code Span] objects, if the document has been syntactically
| parsed. A base noun phrase, or "NP chunk", is a noun phrase that does not
| permit other NPs to be nested within it, so no NP-level coordination, no
| prepositional phrases, and no relative clauses.
+aside-code("Example").
doc = nlp(u'A phrase with another phrase occurs.')
chunks = list(doc.noun_chunks)
assert chunks[0].text == "A phrase"
assert chunks[1].text == "another phrase"
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Span]
+cell Noun chunks in the document.
+h(2, "sents") Doc.sents
+tag property
+tag-model("parse")
p
| Iterate over the sentences in the document. Sentence spans have no label.
| To improve accuracy on informal texts, spaCy calculates sentence boundaries
| from the syntactic dependency parse. If the parser is disabled,
| the #[code sents] iterator will be unavailable.
+aside-code("Example").
doc = nlp(u"This is a sentence. Here's another...")
sents = list(doc.sents)
assert len(sents) == 2
assert [s.root.text for s in sents] == ["is", "'s"]
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Span]
+cell Sentences in the document.
+h(2, "has_vector") Doc.has_vector
+tag property
+tag-model("vectors")
p
| A boolean value indicating whether a word vector is associated with the
| object.
+aside-code("Example").
doc = nlp(u'I like apples')
assert doc.has_vector
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the document has a vector data attached.
+h(2, "vector") Doc.vector
+tag property
+tag-model("vectors")
p
| A real-valued meaning representation. Defaults to an average of the
| token vectors.
+aside-code("Example").
doc = nlp(u'I like apples')
assert doc.vector.dtype == 'float32'
assert doc.vector.shape == (300,)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the document's semantics.
+h(2, "vector_norm") Doc.vector_norm
+tag property
+tag-model("vectors")
p
| The L2 norm of the document's vector representation.
+aside-code("Example").
doc1 = nlp(u'I like apples')
doc2 = nlp(u'I like oranges')
doc1.vector_norm # 4.54232424414368
doc2.vector_norm # 3.304373298575751
assert doc1.vector_norm != doc2.vector_norm
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell float
+cell The L2 norm of the vector representation.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code text]
+cell unicode
+cell A unicode representation of the document text.
+row
+cell #[code text_with_ws]
+cell unicode
+cell
| An alias of #[code Doc.text], provided for duck-type compatibility
| with #[code Span] and #[code Token].
+row
+cell #[code mem]
+cell #[code Pool]
+cell The document's local memory heap, for all C data it owns.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The store of lexical types.
+row
+cell #[code tensor] #[+tag-new(2)]
+cell object
+cell Container for dense vector representations.
+row
+cell #[code cats] #[+tag-new(2)]
+cell dict
+cell
| Maps either a label to a score for categories applied to the whole
| document, or #[code (start_char, end_char, label)] to a score for
| categories applied to spans. #[code start_char] and #[code end_char]
| should be character offsets, label can be either a string or an
| integer ID, and score should be a float.
+row
+cell #[code user_data]
+cell -
+cell A generic storage area, for user custom data.
+row
+cell #[code is_tagged]
+cell bool
+cell
| A flag indicating that the document has been part-of-speech
| tagged.
+row
+cell #[code is_parsed]
+cell bool
+cell A flag indicating that the document has been syntactically parsed.
+row
+cell #[code is_sentenced]
+cell bool
+cell
| A flag indicating that sentence boundaries have been applied to
| the document.
+row
+cell #[code sentiment]
+cell float
+cell The document's positivity/negativity score, if available.
+row
+cell #[code user_hooks]
+cell dict
+cell
| A dictionary that allows customisation of the #[code Doc]'s
| properties.
+row
+cell #[code user_token_hooks]
+cell dict
+cell
| A dictionary that allows customisation of properties of
| #[code Token] children.
+row
+cell #[code user_span_hooks]
+cell dict
+cell
| A dictionary that allows customisation of properties of
| #[code Span] children.
+row
+cell #[code _]
+cell #[code Underscore]
+cell
| User space for adding custom
| #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].


@ -1,6 +0,0 @@
//- 💫 DOCS > API > ENTITYRECOGNIZER
include ../_includes/_mixins
//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "EntityRecognizer", short: "ner", pipeline_id: "ner" })


@ -1,35 +0,0 @@
//- 💫 DOCS > API > GOLDCORPUS
include ../_includes/_mixins
p
| This class manages annotations for tagging, dependency parsing and NER.
+h(2, "init") GoldCorpus.__init__
+tag method
p Create a #[code GoldCorpus].
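p
| A minimal usage sketch (the file paths below are placeholders for your
| own training and development data):
+aside-code("Example").
from spacy.gold import GoldCorpus
# placeholder paths, not bundled data
corpus = GoldCorpus('/path/to/train.json', '/path/to/dev.json')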
+table(["Name", "Type", "Description"])
+row
+cell #[code train]
+cell unicode or #[code Path] or iterable
+cell
| Training data, as a path (file or directory) or iterable. If an
| iterable, each item should be a #[code (text, paragraphs)]
| tuple, where each paragraph is a tuple
| #[code.u-break (sentences, brackets)], and each sentence is a
| tuple #[code.u-break (ids, words, tags, heads, ner)]. See the
| implementation of
| #[+src(gh("spacy", "spacy/gold.pyx")) #[code gold.read_json_file]]
| for further details.
+row
+cell #[code dev]
+cell unicode or #[code Path] or iterable
+cell Development data, as a path (file or directory) or iterable.
+row("foot")
+cell returns
+cell #[code GoldCorpus]
+cell The newly constructed object.


@ -1,203 +0,0 @@
//- 💫 DOCS > API > GOLDPARSE
include ../_includes/_mixins
p Collection for training annotations.
+h(2, "init") GoldParse.__init__
+tag method
p Create a #[code GoldParse].
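p
| A minimal sketch of creating gold annotations from entity offsets (the
| text and offsets are illustrative):
+aside-code("Example").
from spacy.gold import GoldParse
doc = nlp.make_doc(u'Facebook released React')  # illustrative text
gold = GoldParse(doc, entities=[(0, 8, u'ORG')])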
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document the annotations refer to.
+row
+cell #[code words]
+cell iterable
+cell A sequence of unicode word strings.
+row
+cell #[code tags]
+cell iterable
+cell A sequence of strings, representing tag annotations.
+row
+cell #[code heads]
+cell iterable
+cell A sequence of integers, representing syntactic head offsets.
+row
+cell #[code deps]
+cell iterable
+cell A sequence of strings, representing the syntactic relation types.
+row
+cell #[code entities]
+cell iterable
+cell A sequence of named entity annotations, either as BILUO tag strings, or as #[code (start_char, end_char, label)] tuples, representing the entity positions.
+row("foot")
+cell returns
+cell #[code GoldParse]
+cell The newly constructed object.
+h(2, "len") GoldParse.__len__
+tag method
p Get the number of gold-standard tokens.
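p
| For example, if no gold tokenization is provided, the tokens are taken
| from the document itself, so the lengths match (a minimal sketch):
+aside-code("Example").
doc = nlp(u'London is big')
gold = GoldParse(doc)  # tokens are taken from the doc
assert len(gold) == 3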
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of gold-standard tokens.
+h(2, "is_projective") GoldParse.is_projective
+tag property
p
| Whether the provided syntactic annotations form a projective dependency
| tree.
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether annotations form projective tree.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code tags]
+cell list
+cell The part-of-speech tag annotations.
+row
+cell #[code heads]
+cell list
+cell The syntactic head annotations.
+row
+cell #[code labels]
+cell list
+cell The syntactic relation-type annotations.
+row
+cell #[code ents]
+cell list
+cell The named entity annotations.
+row
+cell #[code cand_to_gold]
+cell list
+cell The alignment from candidate tokenization to gold tokenization.
+row
+cell #[code gold_to_cand]
+cell list
+cell The alignment from gold tokenization to candidate tokenization.
+row
+cell #[code cats] #[+tag-new(2)]
+cell list
+cell
| Entries in the list should be either a label, or a
| #[code (start, end, label)] triple. The tuple form is used for
| categories applied to spans of the document.
+h(2, "util") Utilities
+h(3, "biluo_tags_from_offsets") gold.biluo_tags_from_offsets
+tag function
p
| Encode labelled spans into per-token tags, using the
| #[+a("/api/annotation#biluo") BILUO scheme] (Begin/In/Last/Unit/Out).
p
| Returns a list of unicode strings, describing the tags. Each tag string
| will be of the form of either #[code "-"], #[code "O"] or
| #[code "{action}-{label}"], where action is one of #[code "B"],
| #[code "I"], #[code "L"], #[code "U"]. The string #[code "-"]
| is used where the entity offsets don't align with the tokenization in the
| #[code Doc] object. The training algorithm will view these as missing
| values. #[code O] denotes a non-entity token. #[code B] denotes the
| beginning of a multi-token entity, #[code I] the inside of an entity
| of three or more tokens, and #[code L] the end of an entity of two or
| more tokens. #[code U] denotes a single-token entity.
+aside-code("Example").
from spacy.gold import biluo_tags_from_offsets
doc = nlp(u'I like London.')
entities = [(7, 13, 'LOC')]
tags = biluo_tags_from_offsets(doc, entities)
assert tags == ['O', 'O', 'U-LOC', 'O']
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell
| The document that the entity offsets refer to. The output tags
| will refer to the token boundaries within the document.
+row
+cell #[code entities]
+cell iterable
+cell
| A sequence of #[code (start, end, label)] triples. #[code start]
| and #[code end] should be character-offset integers denoting the
| slice into the original string.
+row("foot")
+cell returns
+cell list
+cell
| Unicode strings, describing the
| #[+a("/api/annotation#biluo") BILUO] tags.
+h(3, "offsets_from_biluo_tags") gold.offsets_from_biluo_tags
p
| Encode per-token tags following the
| #[+a("/api/annotation#biluo") BILUO scheme] into entity offsets.
+aside-code("Example").
from spacy.gold import offsets_from_biluo_tags
doc = nlp('I like London.')
tags = ['O', 'O', 'U-LOC', 'O']
entities = offsets_from_biluo_tags(doc, tags)
assert entities == [(7, 13, 'LOC')]
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document that the BILUO tags refer to.
+row
+cell #[code tags]
+cell iterable
+cell
| A sequence of #[+a("/api/annotation#biluo") BILUO] tags with
| each tag describing one token. Each tag string will be of the
| form of either #[code "-"], #[code "O"] or
| #[code "{action}-{label}"], where action is one of #[code "B"],
| #[code "I"], #[code "L"], #[code "U"].
+row("foot")
+cell returns
+cell list
+cell
| A sequence of #[code (start, end, label)] triples. #[code start]
| and #[code end] will be character-offset integers denoting the
| slice into the original string.


@ -1,157 +0,0 @@
//- 💫 DOCS > API > ARCHITECTURE
include ../_includes/_mixins
+section("basics")
include ../usage/_spacy-101/_architecture
+section("nn-model")
+h(2, "nn-model") Neural network model architecture
p
| spaCy's statistical models have been custom-designed to give a
| high-performance mix of speed and accuracy. The current architecture
| hasn't been published yet, but in the meantime we prepared a video that
| explains how the models work, with particular focus on NER.
+youtube("sqDHBH9IjRU")
p
| The parsing model is a blend of recent results. The two recent
| inspirations have been the work of Eliyahu Kiperwasser and Yoav Goldberg at
| Bar Ilan#[+fn(1)], and the SyntaxNet team from Google. The foundation of
| the parser is still based on the work of Joakim Nivre#[+fn(2)], who
| introduced the transition-based framework#[+fn(3)], the arc-eager
| transition system, and the imitation learning objective. The model is
| implemented using #[+a(gh("thinc")) Thinc], spaCy's machine learning
| library. We first predict context-sensitive vectors for each word in the
| input:
+code.
(embed_lower | embed_prefix | embed_suffix | embed_shape)
&gt;&gt; Maxout(token_width)
&gt;&gt; convolution ** 4
p
| This convolutional layer is shared between the tagger, parser and NER,
| and will also be shared by the future neural lemmatizer. Because the
| parser shares these layers with the tagger, the parser does not require
| tag features. I got this trick from David Weiss's "Stack-propagation"
| paper#[+fn(4)].
p
| To boost the representation, the tagger actually predicts a "super tag"
| with POS, morphology and dependency label#[+fn(5)]. The tagger predicts
| these supertags by adding a softmax layer onto the convolutional layer,
| so we're teaching the convolutional layer to give us a representation
| that's one affine transform away from this informative lexical information.
| This is obviously good for the parser (which backprops to the
| convolutions too). The parser model makes a state vector by concatenating
| the vector representations for its context tokens. The current context
| tokens:
+table
+row
+cell #[code S0], #[code S1], #[code S2]
+cell Top three words on the stack.
+row
+cell #[code B0], #[code B1]
+cell First two words of the buffer.
+row
+cell
| #[code S0L1], #[code S1L1], #[code S2L1], #[code B0L1],
| #[code B1L1]#[br]
| #[code S0L2], #[code S1L2], #[code S2L2], #[code B0L2],
| #[code B1L2]
+cell
| Leftmost and second leftmost children of #[code S0], #[code S1],
| #[code S2], #[code B0] and #[code B1].
+row
+cell
| #[code S0R1], #[code S1R1], #[code S2R1], #[code B0R1],
| #[code B1R1]#[br]
| #[code S0R2], #[code S1R2], #[code S2R2], #[code B0R2],
| #[code B1R2]
+cell
| Rightmost and second rightmost children of #[code S0], #[code S1],
| #[code S2], #[code B0] and #[code B1].
p
| This makes the state vector quite long: #[code 13*T], where #[code T] is
| the token vector width (128 is working well). Fortunately, there's a way
| to structure the computation to save some expense (and make it more
| GPU-friendly).
p
| The parser typically visits #[code 2*N] states for a sentence of length
| #[code N] (although it may visit more, if it back-tracks with a
| non-monotonic transition#[+fn(6)]). A naive implementation would require
| #[code 2*N (B, 13*T) @ (13*T, H)] matrix multiplications for a batch of
| size #[code B]. We can instead perform one #[code (B*N, T) @ (T, 13*H)]
| multiplication, to pre-compute the hidden weights for each positional
| feature with respect to the words in the batch. (Note that our token
| vectors come from the CNN — so we can't play this trick over the
| vocabulary. That's how Stanford's NN parser#[+fn(7)] works — and why its
| model is so big.)
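p
| The trick can be sketched in a few lines of numpy. The sizes below are
| illustrative, and the weights are laid out as a single
| #[code (T, 13*H)] block, so positional feature #[code i] reads its own
| #[code H]-wide slice of the pre-computed rows:
+code.
import numpy as np
B, N, T, H = 32, 20, 128, 64  # illustrative sizes: batch, words, token width, hidden width
tokens = np.random.rand(B * N, T)  # stand-in for the CNN's context-sensitive vectors
W = np.random.rand(T, 13 * H)  # one weight block per positional feature
cached = np.dot(tokens, W)  # the single (B*N, T) @ (T, 13*H) multiplication
# at each parse state, the hidden layer is a sum of 13 cached slices
feats = np.random.randint(0, B * N, size=13)  # stand-in indices of the context tokens
hidden = sum(cached[feats[i], i * H:(i + 1) * H] for i in range(13))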
p
| This pre-computation strategy allows a nice compromise between
| GPU-friendliness and implementation simplicity. The CNN and the wide
| lower layer are computed on the GPU, and then the precomputed hidden
| weights are moved to the CPU, before we start the transition-based
| parsing process. This makes a lot of things much easier. We don't have to
| worry about variable-length batch sizes, and we don't have to implement
| the dynamic oracle in CUDA to train.
p
| Currently the parser's loss function is multilabel log loss#[+fn(8)], as
| the dynamic oracle allows multiple states to be 0 cost. The gradient of
| this objective is defined as follows, where #[code gZ] is the sum of the
| exponentiated scores assigned to the gold (zero-cost) classes:
+code.
(exp(score) / Z) - (exp(score) / gZ)
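p
| As a rough numpy sketch, the per-class gradient this objective implies
| can be computed as follows (the scores and gold mask are made up):
+code.
import numpy as np
scores = np.array([2.0, 1.0, 0.5])  # one score per transition (illustrative)
gold = np.array([True, False, True])  # zero-cost transitions under the dynamic oracle
e = np.exp(scores)
Z, gZ = e.sum(), e[gold].sum()
d_scores = e / Z - np.where(gold, e / gZ, 0.0)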
+bibliography
+item
| #[+a("https://www.semanticscholar.org/paper/Simple-and-Accurate-Dependency-Parsing-Using-Bidir-Kiperwasser-Goldberg/3cf31ecb2724b5088783d7c96a5fc0d5604cbf41") Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations]
br
| Eliyahu Kiperwasser, Yoav Goldberg. (2016)
+item
| #[+a("https://www.semanticscholar.org/paper/A-Dynamic-Oracle-for-Arc-Eager-Dependency-Parsing-Goldberg-Nivre/22697256ec19ecc3e14fcfc63624a44cf9c22df4") A Dynamic Oracle for Arc-Eager Dependency Parsing]
br
| Yoav Goldberg, Joakim Nivre (2012)
+item
| #[+a("https://explosion.ai/blog/parsing-english-in-python") Parsing English in 500 Lines of Python]
br
| Matthew Honnibal (2013)
+item
| #[+a("https://www.semanticscholar.org/paper/Stack-propagation-Improved-Representation-Learning-Zhang-Weiss/0c133f79b23e8c680891d2e49a66f0e3d37f1466") Stack-propagation: Improved Representation Learning for Syntax]
br
| Yuan Zhang, David Weiss (2016)
+item
| #[+a("https://www.semanticscholar.org/paper/Deep-multi-task-learning-with-low-level-tasks-supe-S%C3%B8gaard-Goldberg/03ad06583c9721855ccd82c3d969a01360218d86") Deep multi-task learning with low level tasks supervised at lower layers]
br
| Anders Søgaard, Yoav Goldberg (2016)
+item
| #[+a("https://www.semanticscholar.org/paper/An-Improved-Non-monotonic-Transition-System-for-De-Honnibal-Johnson/4094cee47ade13b77b5ab4d2e6cb9dd2b8a2917c") An Improved Non-monotonic Transition System for Dependency Parsing]
br
| Matthew Honnibal, Mark Johnson (2015)
+item
| #[+a("http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf") A Fast and Accurate Dependency Parser using Neural Networks]
br
| Danqi Chen, Christopher D. Manning (2014)
+item
| #[+a("https://www.semanticscholar.org/paper/Parsing-the-Wall-Street-Journal-using-a-Lexical-Fu-Riezler-King/0ad07862a91cd59b7eb5de38267e47725a62b8b2") Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques]
br
| Stefan Riezler et al. (2002)


@ -1,702 +0,0 @@
//- 💫 DOCS > API > LANGUAGE
include ../_includes/_mixins
p
| Usually you'll load this once per process as #[code nlp] and pass the
| instance around your application. The #[code Language] class is created
| when you call #[+api("spacy#load") #[code spacy.load()]] and contains
| the shared vocabulary and #[+a("/usage/adding-languages") language data],
| optional model data loaded from a #[+a("/models") model package] or
| a path, and a #[+a("/usage/processing-pipelines") processing pipeline]
| containing components like the tagger or parser that are called on a
| document in order. You can also add your own processing pipeline
| components that take a #[code Doc] object, modify it and return it.
+h(2, "init") Language.__init__
+tag method
p Initialise a #[code Language] object.
+aside-code("Example").
from spacy.vocab import Vocab
from spacy.language import Language
nlp = Language(Vocab())
from spacy.lang.en import English
nlp = English()
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell
| A #[code Vocab] object. If #[code True], a vocab is created via
| #[code Language.Defaults.create_vocab].
+row
+cell #[code make_doc]
+cell callable
+cell
| A function that takes text and returns a #[code Doc] object.
| Usually a #[code Tokenizer].
+row
+cell #[code meta]
+cell dict
+cell
| Custom meta data for the #[code Language] class. Is written to by
| models to add model meta data.
+row("foot")
+cell returns
+cell #[code Language]
+cell The newly constructed object.
+h(2, "call") Language.__call__
+tag method
p
| Apply the pipeline to some text. The text can span multiple sentences,
| and can contain arbitrary whitespace. Alignment into the original string
| is preserved.
+aside-code("Example").
doc = nlp(u'An example sentence. Another sentence.')
assert (doc[0].text, doc[0].head.tag_) == ('An', 'NN')
+table(["Name", "Type", "Description"])
+row
+cell #[code text]
+cell unicode
+cell The text to be processed.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Doc]
+cell A container for accessing the annotations.
+infobox("Changed in v2.0", "⚠️")
| Pipeline components to prevent from being loaded can now be added as
| a list to #[code disable], instead of specifying one keyword argument
| per component.
+code-wrapper
+code-new doc = nlp(u"I don't want parsed", disable=['parser'])
+code-old doc = nlp(u"I don't want parsed", parse=False)
+h(2, "pipe") Language.pipe
+tag method
p
| Process texts as a stream, and yield #[code Doc] objects in order.
| Supports GIL-free multi-threading.
+infobox("Important note for spaCy v2.0.x", "⚠️")
| By default, multiple threads will be launched for matrix multiplication,
| which may be inefficient on multi-core machines. Setting
| #[code OPENBLAS_NUM_THREADS=1] should fix this problem. spaCy v2.1.x
| will be switching to single-thread by default.
+aside-code("Example").
texts = [u'One document.', u'...', u'Lots of documents']
for doc in nlp.pipe(texts, batch_size=50, n_threads=4):
assert doc.is_parsed
+table(["Name", "Type", "Description"])
+row
+cell #[code texts]
+cell -
+cell A sequence of unicode objects.
+row
+cell #[code as_tuples]
+cell bool
+cell
| If set to #[code True], inputs should be a sequence of
| #[code (text, context)] tuples. Output will then be a sequence of
| #[code (doc, context)] tuples. Defaults to #[code False].
+row
+cell #[code n_threads]
+cell int
+cell
| The number of worker threads to use. If #[code -1], OpenMP will
| decide how many to use at run time. Default is #[code 2].
+row
+cell #[code batch_size]
+cell int
+cell The number of texts to buffer.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell yields
+cell #[code Doc]
+cell Documents in the order of the original text.
+h(2, "update") Language.update
+tag method
p Update the models in the pipeline.
+aside-code("Example").
for raw_text, entity_offsets in train_data:
doc = nlp.make_doc(raw_text)
gold = GoldParse(doc, entities=entity_offsets)
nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell
| A batch of #[code Doc] objects or unicode. If unicode, a
| #[code Doc] object will be created from the text.
+row
+cell #[code golds]
+cell iterable
+cell
| A batch of #[code GoldParse] objects or dictionaries.
| Dictionaries will be used to create
| #[+api("goldparse") #[code GoldParse]] objects. For the available
| keys and their usage, see
| #[+api("goldparse#init") #[code GoldParse.__init__]].
+row
+cell #[code drop]
+cell float
+cell The dropout rate.
+row
+cell #[code sgd]
+cell callable
+cell An optimizer.
+row("foot")
+cell returns
+cell dict
+cell Results from the update.
+h(2, "begin_training") Language.begin_training
+tag method
p
| Allocate models, pre-process training data and acquire an optimizer.
+aside-code("Example").
optimizer = nlp.begin_training(gold_tuples)
+table(["Name", "Type", "Description"])
+row
+cell #[code gold_tuples]
+cell iterable
+cell Gold-standard training data.
+row
+cell #[code **cfg]
+cell -
+cell Config parameters.
+row("foot")
+cell returns
+cell callable
+cell An optimizer.
+h(2, "use_params") Language.use_params
+tag contextmanager
+tag method
p
| Replace weights of models in the pipeline with those provided in the
| params dictionary. Can be used as a context manager, in which case models
| go back to their original weights after the block.
+aside-code("Example").
with nlp.use_params(optimizer.averages):
nlp.to_disk('/tmp/checkpoint')
+table(["Name", "Type", "Description"])
+row
+cell #[code params]
+cell dict
+cell A dictionary of parameters keyed by model ID.
+row
+cell #[code **cfg]
+cell -
+cell Config parameters.
+h(2, "preprocess_gold") Language.preprocess_gold
+tag method
p
| Can be called before training to pre-process gold data. By default, it
| handles nonprojectivity and adds missing tags to the tag map.
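p
| A minimal sketch, assuming #[code docs] and #[code golds] are existing,
| aligned lists of #[code Doc] and #[code GoldParse] objects:
+aside-code("Example").
# docs and golds are assumed to exist
train_data = list(nlp.preprocess_gold(zip(docs, golds)))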
+table(["Name", "Type", "Description"])
+row
+cell #[code docs_golds]
+cell iterable
+cell Tuples of #[code Doc] and #[code GoldParse] objects.
+row("foot")
+cell yields
+cell tuple
+cell Tuples of #[code Doc] and #[code GoldParse] objects.
+h(2, "create_pipe") Language.create_pipe
+tag method
+tag-new(2)
p Create a pipeline component from a factory.
+aside-code("Example").
parser = nlp.create_pipe('parser')
nlp.add_pipe(parser)
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell
| Factory name to look up in
| #[+api("language#class-attributes") #[code Language.factories]].
+row
+cell #[code config]
+cell dict
+cell Configuration parameters to initialise component.
+row("foot")
+cell returns
+cell callable
+cell The pipeline component.
+h(2, "add_pipe") Language.add_pipe
+tag method
+tag-new(2)
p
| Add a component to the processing pipeline. Valid components are
| callables that take a #[code Doc] object, modify it and return it. Only
| one of #[code before], #[code after], #[code first] or #[code last] can
| be set. Default behaviour is #[code last=True].
+aside-code("Example").
def component(doc):
# modify Doc and return it
return doc
nlp.add_pipe(component, before='ner')
nlp.add_pipe(component, name='custom_name', last=True)
+table(["Name", "Type", "Description"])
+row
+cell #[code component]
+cell callable
+cell The pipeline component.
+row
+cell #[code name]
+cell unicode
+cell
| Name of pipeline component. Overwrites existing
| #[code component.name] attribute if available. If no #[code name]
| is set and the component exposes no name attribute,
| #[code component.__name__] is used. An error is raised if the
| name already exists in the pipeline.
+row
+cell #[code before]
+cell unicode
+cell Component name to insert component directly before.
+row
+cell #[code after]
+cell unicode
+cell Component name to insert component directly after.
+row
+cell #[code first]
+cell bool
+cell Insert component first / not first in the pipeline.
+row
+cell #[code last]
+cell bool
+cell Insert component last / not last in the pipeline.
+h(2, "has_pipe") Language.has_pipe
+tag method
+tag-new(2)
p
| Check whether a component is present in the pipeline. Equivalent to
| #[code name in nlp.pipe_names].
+aside-code("Example").
nlp.add_pipe(lambda doc: doc, name='component')
assert 'component' in nlp.pipe_names
assert nlp.has_pipe('component')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the pipeline component to check.
+row("foot")
+cell returns
+cell bool
+cell Whether a component of that name exists in the pipeline.
+h(2, "get_pipe") Language.get_pipe
+tag method
+tag-new(2)
p Get a pipeline component for a given component name.
+aside-code("Example").
parser = nlp.get_pipe('parser')
custom_component = nlp.get_pipe('custom_component')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the pipeline component to get.
+row("foot")
+cell returns
+cell callable
+cell The pipeline component.
+h(2, "replace_pipe") Language.replace_pipe
+tag method
+tag-new(2)
p Replace a component in the pipeline.
+aside-code("Example").
nlp.replace_pipe('parser', my_custom_parser)
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the component to replace.
+row
+cell #[code component]
+cell callable
+cell The pipeline component to insert.
+h(2, "rename_pipe") Language.rename_pipe
+tag method
+tag-new(2)
p
| Rename a component in the pipeline. Useful to create custom names for
| pre-defined and pre-loaded components. To change the default name of
| a component added to the pipeline, you can also use the #[code name]
| argument on #[+api("language#add_pipe") #[code add_pipe]].
+aside-code("Example").
nlp.rename_pipe('parser', 'spacy_parser')
+table(["Name", "Type", "Description"])
+row
+cell #[code old_name]
+cell unicode
+cell Name of the component to rename.
+row
+cell #[code new_name]
+cell unicode
+cell New name of the component.
+h(2, "remove_pipe") Language.remove_pipe
+tag method
+tag-new(2)
p
| Remove a component from the pipeline. Returns the removed component name
| and component function.
+aside-code("Example").
name, component = nlp.remove_pipe('parser')
assert name == 'parser'
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the component to remove.
+row("foot")
+cell returns
+cell tuple
+cell A #[code (name, component)] tuple of the removed component.
+h(2, "disable_pipes") Language.disable_pipes
+tag contextmanager
+tag-new(2)
p
| Disable one or more pipeline components. If used as a context manager,
| the pipeline will be restored to the initial state at the end of the
| block. Otherwise, a #[code DisabledPipes] object is returned, that has a
| #[code .restore()] method you can use to undo your changes.
+aside-code("Example").
with nlp.disable_pipes('tagger', 'parser'):
optimizer = nlp.begin_training(gold_tuples)
disabled = nlp.disable_pipes('tagger', 'parser')
optimizer = nlp.begin_training(gold_tuples)
disabled.restore()
+table(["Name", "Type", "Description"])
+row
+cell #[code *disabled]
+cell unicode
+cell Names of pipeline components to disable.
+row("foot")
+cell returns
+cell #[code DisabledPipes]
+cell
| The disabled pipes that can be restored by calling the object's
| #[code .restore()] method.
+h(2, "to_disk") Language.to_disk
+tag method
+tag-new(2)
p
| Save the current state to a directory. If a model is loaded, this will
| #[strong include the model].
+aside-code("Example").
nlp.to_disk('/path/to/models')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable]
| and prevent from being saved.
+h(2, "from_disk") Language.from_disk
+tag method
+tag-new(2)
p
| Loads state from a directory. Modifies the object in place and returns
| it. If the saved #[code Language] object contains a model, the
| model will be loaded. Note that this method is commonly used via the
| subclasses like #[code English] or #[code German] to make
| language-specific functionality like the
| #[+a("/usage/adding-languages#lex-attrs") lexical attribute getters]
| available to the loaded object.
+aside-code("Example").
from spacy.language import Language
nlp = Language().from_disk('/path/to/model')
# using language-specific subclass
from spacy.lang.en import English
nlp = English().from_disk('/path/to/en_model')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell The modified #[code Language] object.
+infobox("Changed in v2.0", "⚠️")
| As of spaCy v2.0, the #[code save_to_directory] method has been
| renamed to #[code to_disk], to improve consistency across classes.
| Pipeline components to prevent from being loaded can now be added as
| a list to #[code disable], instead of specifying one keyword argument
| per component.
+code-wrapper
+code-new nlp = English().from_disk('/path/to/en_model', disable=['tagger', 'ner'])
+code-old nlp = spacy.load('en', tagger=False, entity=False)
+h(2, "to_bytes") Language.to_bytes
+tag method
p Serialize the current state to a binary string.
+aside-code("Example").
nlp_bytes = nlp.to_bytes()
+table(["Name", "Type", "Description"])
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable]
| and prevent from being serialized.
+row("foot")
+cell returns
+cell bytes
+cell The serialized form of the #[code Language] object.
+h(2, "from_bytes") Language.from_bytes
+tag method
p
| Load state from a binary string. Note that this method is commonly used
| via the subclasses like #[code English] or #[code German] to make
| language-specific functionality like the
| #[+a("/usage/adding-languages#lex-attrs") lexical attribute getters]
| available to the loaded object.
+aside-code("Example").
from spacy.lang.en import English
nlp_bytes = nlp.to_bytes()
nlp2 = English()
nlp2.from_bytes(nlp_bytes)
+table(["Name", "Type", "Description"])
+row
+cell #[code bytes_data]
+cell bytes
+cell The data to load from.
+row
+cell #[code disable]
+cell list
+cell
| Names of pipeline components to
| #[+a("/usage/processing-pipelines#disabling") disable].
+row("foot")
+cell returns
+cell #[code Language]
+cell The #[code Language] object.
+infobox("Changed in v2.0", "⚠️")
| Pipeline components to prevent from being loaded can now be added as
| a list to #[code disable], instead of specifying one keyword argument
| per component.
+code-wrapper
+code-new nlp = English().from_bytes(bytes, disable=['tagger', 'ner'])
+code-old nlp = English().from_bytes(bytes, tagger=False, entity=False)
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A container for the lexical types.
+row
+cell #[code tokenizer]
+cell #[code Tokenizer]
+cell The tokenizer.
+row
+cell #[code make_doc]
+cell #[code lambda text: Doc]
+cell Create a #[code Doc] object from unicode text.
+row
+cell #[code pipeline]
+cell list
+cell
| List of #[code (name, component)] tuples describing the current
| processing pipeline, in order.
+row
+cell #[code pipe_names]
+tag-new(2)
+cell list
+cell List of pipeline component names, in order.
+row
+cell #[code meta]
+cell dict
+cell
| Custom meta data for the Language class. If a model is loaded,
| contains meta data of the model.
+row
+cell #[code path]
+tag-new(2)
+cell #[code Path]
+cell
| Path to the model data directory, if a model is loaded. Otherwise
| #[code None].
+h(2, "class-attributes") Class attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code Defaults]
+cell class
+cell
| Settings, data and factory methods for creating the
| #[code nlp] object and processing pipeline.
+row
+cell #[code lang]
+cell unicode
+cell
| Two-letter language ID, i.e.
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code].
+row
+cell #[code factories]
+tag-new(2)
+cell dict
+cell
| Factories that create pre-defined pipeline components, e.g. the
| tagger, parser or entity recognizer, keyed by their component
| name.

View File

@ -1,160 +0,0 @@
//- 💫 DOCS > API > LEMMATIZER
include ../_includes/_mixins
p
| The #[code Lemmatizer] supports simple part-of-speech-sensitive suffix
| rules and lookup tables.
+h(2, "init") Lemmatizer.__init__
+tag method
p Create a #[code Lemmatizer].
+aside-code("Example").
from spacy.lemmatizer import Lemmatizer
lemmatizer = Lemmatizer()
+table(["Name", "Type", "Description"])
+row
+cell #[code index]
+cell dict / #[code None]
+cell Inventory of lemmas in the language.
+row
+cell #[code exceptions]
+cell dict / #[code None]
+cell Mapping of string forms to lemmas that bypass the #[code rules].
+row
+cell #[code rules]
+cell dict / #[code None]
+cell List of suffix rewrite rules.
+row
+cell #[code lookup]
+cell dict / #[code None]
+cell Lookup table mapping strings to their lemmas.
+row("foot")
+cell returns
+cell #[code Lemmatizer]
+cell The newly created object.
+h(2, "call") Lemmatizer.__call__
+tag method
p Lemmatize a string.
+aside-code("Example").
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
assert lemmas == [u'duck']
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to lemmatize, e.g. the token text.
+row
+cell #[code univ_pos]
+cell unicode / int
+cell The token's universal part-of-speech tag.
+row
+cell #[code morphology]
+cell dict / #[code None]
+cell
| Morphological features following the
| #[+a("http://universaldependencies.org/") Universal Dependencies]
| scheme.
+row("foot")
+cell returns
+cell list
+cell The available lemmas for the string.
+h(2, "lookup") Lemmatizer.lookup
+tag method
+tag-new(2)
p
| Look up a lemma in the lookup table, if available. If no lemma is found,
| the original string is returned. Languages can provide a
| #[+a("/usage/adding-languages#lemmatizer") lookup table] via the
| #[code lemma_lookup] variable, set on the individual #[code Language]
| class.
+aside-code("Example").
lookup = {u'going': u'go'}
lemmatizer = Lemmatizer(lookup=lookup)
assert lemmatizer.lookup(u'going') == u'go'
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to look up.
+row("foot")
+cell returns
+cell unicode
+cell The lemma if the string was found, otherwise the original string.
+h(2, "is_base_form") Lemmatizer.is_base_form
+tag method
p
| Check whether we're dealing with an uninflected paradigm, so we can
| avoid lemmatization entirely.
+aside-code("Example").
pos = 'verb'
morph = {'VerbForm': 'inf'}
is_base_form = lemmatizer.is_base_form(pos, morph)
assert is_base_form == True
+table(["Name", "Type", "Description"])
+row
+cell #[code univ_pos]
+cell unicode / int
+cell The token's universal part-of-speech tag.
+row
+cell #[code morphology]
+cell dict
+cell The token's morphological features.
+row("foot")
+cell returns
+cell bool
+cell
| Whether the token's part-of-speech tag and morphological features
| describe a base form.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code index]
+cell dict / #[code None]
+cell Inventory of lemmas in the language.
+row
+cell #[code exc]
+cell dict / #[code None]
+cell Mapping of string forms to lemmas that bypass the #[code rules].
+row
+cell #[code rules]
+cell dict / #[code None]
+cell List of suffix rewrite rules.
+row
+cell #[code lookup_table]
+tag-new(2)
+cell dict / #[code None]
+cell The lemma lookup table, if available.


@ -1,384 +0,0 @@
//- 💫 DOCS > API > LEXEME
include ../_includes/_mixins
p
| An entry in the vocabulary. A #[code Lexeme] has no string context; it's
| a word type, as opposed to a word token. It therefore has no
| part-of-speech tag, dependency parse, or lemma (if lemmatization depends
| on the part-of-speech tag).
+h(2, "init") Lexeme.__init__
+tag method
p Create a #[code Lexeme] object.
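p
| In practice you'll usually get a #[code Lexeme] by indexing into the
| vocab rather than constructing one directly (a minimal sketch):
+aside-code("Example").
from spacy.lexeme import Lexeme
apple = nlp.vocab[u'apple']  # the usual way: index into the vocab
lexeme = Lexeme(nlp.vocab, nlp.vocab.strings[u'apple'])
assert lexeme.text == apple.text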
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The parent vocabulary.
+row
+cell #[code orth]
+cell int
+cell The orth id of the lexeme.
+row("foot")
+cell returns
+cell #[code Lexeme]
+cell The newly constructed object.
+h(2, "set_flag") Lexeme.set_flag
+tag method
p Change the value of a boolean flag.
+aside-code("Example").
COOL_FLAG = nlp.vocab.add_flag(lambda text: False)
nlp.vocab[u'spaCy'].set_flag(COOL_FLAG, True)
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to set.
+row
+cell #[code value]
+cell bool
+cell The new value of the flag.
+h(2, "check_flag") Lexeme.check_flag
+tag method
p Check the value of a boolean flag.
+aside-code("Example").
is_my_library = lambda text: text in ['spaCy', 'Thinc']
MY_LIBRARY = nlp.vocab.add_flag(is_my_library)
assert nlp.vocab[u'spaCy'].check_flag(MY_LIBRARY) == True
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to query.
+row("foot")
+cell returns
+cell bool
+cell The value of the flag.
+h(2, "similarity") Lexeme.similarity
+tag method
+tag-model("vectors")
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
+aside-code("Example").
apple = nlp.vocab[u'apple']
orange = nlp.vocab[u'orange']
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
assert apple_orange == orange_apple
+table(["Name", "Type", "Description"])
+row
+cell other
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+row("foot")
+cell returns
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "has_vector") Lexeme.has_vector
+tag property
+tag-model("vectors")
p
| A boolean value indicating whether a word vector is associated with the
| lexeme.
+aside-code("Example").
apple = nlp.vocab[u'apple']
assert apple.has_vector
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the lexeme has a vector data attached.
+h(2, "vector") Lexeme.vector
+tag property
+tag-model("vectors")
p A real-valued meaning representation.
+aside-code("Example").
apple = nlp.vocab[u'apple']
assert apple.vector.dtype == 'float32'
assert apple.vector.shape == (300,)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the lexeme's semantics.
+h(2, "vector_norm") Lexeme.vector_norm
+tag property
+tag-model("vectors")
p The L2 norm of the lexeme's vector representation.
+aside-code("Example").
apple = nlp.vocab[u'apple']
pasta = nlp.vocab[u'pasta']
apple.vector_norm # 7.1346845626831055
pasta.vector_norm # 7.759851932525635
assert apple.vector_norm != pasta.vector_norm
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell float
+cell The L2 norm of the vector representation.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The lexeme's vocabulary.
+row
+cell #[code text]
+cell unicode
+cell Verbatim text content.
+row
+cell #[code orth]
+cell int
+cell ID of the verbatim text content.
+row
+cell #[code orth_]
+cell unicode
+cell
| Verbatim text content (identical to #[code Lexeme.text]). Exists
| mostly for consistency with the other attributes.
+row
+cell #[code lex_id]
+cell int
+cell ID of the lexeme's lexical type.
+row
+cell #[code rank]
+cell int
+cell
| Sequential ID of the lexeme's lexical type, used to index into
| tables, e.g. for word vectors.
+row
+cell #[code flags]
+cell int
+cell Container of the lexeme's binary flags.
+row
+cell #[code norm]
+cell int
+cell The lexeme's norm, i.e. a normalised form of the lexeme text.
+row
+cell #[code norm_]
+cell unicode
+cell The lexeme's norm, i.e. a normalised form of the lexeme text.
+row
+cell #[code lower]
+cell int
+cell Lowercase form of the word.
+row
+cell #[code lower_]
+cell unicode
+cell Lowercase form of the word.
+row
+cell #[code shape]
+cell int
+cell Transform of the word's string, to show orthographic features.
+row
+cell #[code shape_]
+cell unicode
+cell Transform of the word's string, to show orthographic features.
+row
+cell #[code prefix]
+cell int
+cell
| Length-N substring from the start of the word. Defaults to
| #[code N=1].
+row
+cell #[code prefix_]
+cell unicode
+cell
| Length-N substring from the start of the word. Defaults to
| #[code N=1].
+row
+cell #[code suffix]
+cell int
+cell
| Length-N substring from the end of the word. Defaults to
| #[code N=3].
+row
+cell #[code suffix_]
+cell unicode
+cell
| Length-N substring from the end of the word. Defaults to
| #[code N=3].
+row
+cell #[code is_alpha]
+cell bool
+cell
| Does the lexeme consist of alphabetic characters? Equivalent to
| #[code lexeme.text.isalpha()].
+row
+cell #[code is_ascii]
+cell bool
+cell
| Does the lexeme consist of ASCII characters? Equivalent to
| #[code all(ord(c) &lt; 128 for c in lexeme.text)].
+row
+cell #[code is_digit]
+cell bool
+cell
| Does the lexeme consist of digits? Equivalent to
| #[code lexeme.text.isdigit()].
+row
+cell #[code is_lower]
+cell bool
+cell
| Is the lexeme in lowercase? Equivalent to
| #[code lexeme.text.islower()].
+row
+cell #[code is_upper]
+cell bool
+cell
| Is the lexeme in uppercase? Equivalent to
| #[code lexeme.text.isupper()].
+row
+cell #[code is_title]
+cell bool
+cell
| Is the lexeme in titlecase? Equivalent to
| #[code lexeme.text.istitle()].
+row
+cell #[code is_punct]
+cell bool
+cell Is the lexeme punctuation?
+row
+cell #[code is_left_punct]
+cell bool
+cell Is the lexeme a left punctuation mark, e.g. #[code (]?
+row
+cell #[code is_right_punct]
+cell bool
+cell Is the lexeme a right punctuation mark, e.g. #[code )]?
+row
+cell #[code is_space]
+cell bool
+cell
| Does the lexeme consist of whitespace characters? Equivalent to
| #[code lexeme.text.isspace()].
+row
+cell #[code is_bracket]
+cell bool
+cell Is the lexeme a bracket?
+row
+cell #[code is_quote]
+cell bool
+cell Is the lexeme a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the lexeme a currency symbol?
+row
+cell #[code like_url]
+cell bool
+cell Does the lexeme resemble a URL?
+row
+cell #[code like_num]
+cell bool
+cell Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc.
+row
+cell #[code like_email]
+cell bool
+cell Does the lexeme resemble an email address?
+row
+cell #[code is_oov]
+cell bool
+cell Is the lexeme out-of-vocabulary?
+row
+cell #[code is_stop]
+cell bool
+cell Is the lexeme part of a "stop list"?
+row
+cell #[code lang]
+cell int
+cell Language of the parent vocabulary.
+row
+cell #[code lang_]
+cell unicode
+cell Language of the parent vocabulary.
+row
+cell #[code prob]
+cell float
+cell Smoothed log probability estimate of the lexeme's type.
+row
+cell #[code cluster]
+cell int
+cell Brown cluster ID.
+row
+cell #[code sentiment]
+cell float
+cell
| A scalar value indicating the positivity or negativity of the
| lexeme.
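p
| A short sketch reading a few of the attributes above (this assumes an
| English pipeline is loaded as #[code nlp]):
+aside-code("Example").
apple = nlp.vocab[u'apple']
assert apple.is_alpha and apple.is_lower
assert apple.prefix_ == u'a'    # length-1 substring from the start
assert apple.suffix_ == u'ple'  # length-3 substring from the end
assert apple.shape_ == u'xxxx'  # orthographic shape, repeats capped at 4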


@ -1,281 +0,0 @@
//- 💫 DOCS > API > MATCHER
include ../_includes/_mixins
+infobox("Changed in v2.0", "⚠️")
| As of spaCy 2.0, #[code Matcher.add_pattern] and #[code Matcher.add_entity]
| are deprecated and have been replaced with a simpler
| #[+api("matcher#add") #[code Matcher.add]] that lets you add a list of
| patterns and a callback for a given match ID. #[code Matcher.get_entity]
| is now called #[+api("matcher#get") #[code matcher.get]].
| #[code Matcher.load] (not useful, as it didn't allow specifying callbacks),
| and #[code Matcher.has_entity] (now redundant) have been removed. The
| concept of "acceptor functions" has also been retired this logic can
| now be handled in the callback functions.
+h(2, "init") Matcher.__init__
+tag method
p Create the rule-based #[code Matcher].
+aside-code("Example").
from spacy.matcher import Matcher
patterns = {'HelloWorld': [{'LOWER': 'hello'}, {'LOWER': 'world'}]}
matcher = Matcher(nlp.vocab, patterns)
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell
| The vocabulary object, which must be shared with the documents
| the matcher will operate on.
+row
+cell #[code patterns]
+cell dict
+cell Patterns to add to the matcher, keyed by ID.
+row("foot")
+cell returns
+cell #[code Matcher]
+cell The newly constructed object.
+h(2, "call") Matcher.__call__
+tag method
p Find all token sequences matching the supplied patterns on the #[code Doc].
+aside-code("Example").
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
pattern = [{'LOWER': "hello"}, {'LOWER': "world"}]
matcher.add("HelloWorld", None, pattern)
doc = nlp(u'hello world!')
matches = matcher(doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to match over.
+row("foot")
+cell returns
+cell list
+cell
| A list of #[code (match_id, start, end)] tuples, describing the
| matches. A match tuple describes a span #[code doc[start:end]].
| The #[code match_id] is the ID of the added match pattern.
+infobox("Important note")
| By default, the matcher #[strong does not perform any action] on matches,
| like tagging matched phrases with entity types. Instead, actions need to
| be specified when #[strong adding patterns or entities], by
| passing in a callback function as the #[code on_match] argument on
| #[+api("matcher#add") #[code add]]. This allows you to define custom
| actions per pattern within the same matcher. For example, you might only
| want to merge some entity types, and set custom flags for other matched
| patterns. For more details and examples, see the usage guide on
| #[+a("/usage/linguistic-features#rule-based-matching") rule-based matching].
+h(2, "pipe") Matcher.pipe
+tag method
p Match a stream of documents, yielding them in turn.
+aside-code("Example").
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
for doc in matcher.pipe(docs, batch_size=50, n_threads=4):
    pass
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell A stream of documents.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel, if the #[code Matcher] implementation supports
| multi-threading.
+row
+cell #[code return_matches]
+tag-new(2.1)
+cell bool
+cell
| Yield the match lists along with the docs, making results
| #[code (doc, matches)] tuples.
+row
+cell #[code as_tuples]
+tag-new(2.1)
+cell bool
+cell
| Interpret the input stream as #[code (doc, context)] tuples, and
| yield #[code (result, context)] tuples out. If both
| #[code return_matches] and #[code as_tuples] are #[code True],
| the output will be a sequence of
| #[code ((doc, matches), context)] tuples.
+row("foot")
+cell yields
+cell #[code Doc]
+cell Documents, in order.
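p
| For example, with #[code return_matches] set, the matcher yields
| #[code (doc, matches)] tuples instead of only the docs (a sketch,
| continuing the example above):
+aside-code("Example").
for doc, matches in matcher.pipe(docs, return_matches=True):
    print(len(matches))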
+h(2, "len") Matcher.__len__
+tag method
+tag-new(2)
p
| Get the number of rules added to the matcher. Note that this only returns
| the number of rules (identical with the number of IDs), not the number
| of individual patterns.
+aside-code("Example").
matcher = Matcher(nlp.vocab)
assert len(matcher) == 0
matcher.add('Rule', None, [{'ORTH': 'test'}])
assert len(matcher) == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of rules.
+h(2, "contains") Matcher.__contains__
+tag method
+tag-new(2)
p Check whether the matcher contains rules for a match ID.
+aside-code("Example").
matcher = Matcher(nlp.vocab)
assert 'Rule' not in matcher
matcher.add('Rule', None, [{'ORTH': 'test'}])
assert 'Rule' in matcher
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell unicode
+cell The match ID.
+row("foot")
+cell returns
+cell int
+cell Whether the matcher contains rules for this match ID.
+h(2, "add") Matcher.add
+tag method
+tag-new(2)
p
| Add a rule to the matcher, consisting of an ID key, one or more patterns, and
| a callback function to act on the matches. The callback function will
| receive the arguments #[code matcher], #[code doc], #[code i] and
| #[code matches]. If a pattern already exists for the given ID, the
| patterns will be extended, and the existing #[code on_match] callback
| will be overwritten.
+aside-code("Example").
def on_match(matcher, doc, id, matches):
    print('Matched!', matches)
matcher = Matcher(nlp.vocab)
matcher.add('HelloWorld', on_match, [{'LOWER': 'hello'}, {'LOWER': 'world'}])
matcher.add('GoogleMaps', on_match, [{'ORTH': 'Google'}, {'ORTH': 'Maps'}])
doc = nlp(u'HELLO WORLD on Google Maps.')
matches = matcher(doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code match_id]
+cell unicode
+cell An ID for the thing you're matching.
+row
+cell #[code on_match]
+cell callable or #[code None]
+cell
| Callback function to act on matches. Takes the arguments
| #[code matcher], #[code doc], #[code i] and #[code matches].
+row
+cell #[code *patterns]
+cell list
+cell
| Match pattern. A pattern consists of a list of dicts, where each
| dict describes a token.
+infobox("Changed in v2.0", "⚠️")
| As of spaCy 2.0, #[code Matcher.add_pattern] and #[code Matcher.add_entity]
| are deprecated and have been replaced with a simpler
| #[+api("matcher#add") #[code Matcher.add]] that lets you add a list of
| patterns and a callback for a given match ID.
+code-wrapper
+code-new.
matcher.add('GoogleNow', merge_phrases, [{ORTH: 'Google'}, {ORTH: 'Now'}])
+code-old.
matcher.add_entity('GoogleNow', on_match=merge_phrases)
matcher.add_pattern('GoogleNow', [{ORTH: 'Google'}, {ORTH: 'Now'}])
+h(2, "remove") Matcher.remove
+tag method
+tag-new(2)
p
| Remove a rule from the matcher. A #[code KeyError] is raised if the match
| ID does not exist.
+aside-code("Example").
matcher.add('Rule', None, [{'ORTH': 'test'}])
assert 'Rule' in matcher
matcher.remove('Rule')
assert 'Rule' not in matcher
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell unicode
+cell The ID of the match rule.
+h(2, "get") Matcher.get
+tag method
+tag-new(2)
p
| Retrieve the pattern stored for a key. Returns the rule as an
| #[code (on_match, patterns)] tuple containing the callback and available
| patterns.
+aside-code("Example").
pattern = [{'ORTH': 'test'}]
matcher.add('Rule', None, pattern)
on_match, patterns = matcher.get('Rule')
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell unicode
+cell The ID of the match rule.
+row("foot")
+cell returns
+cell tuple
+cell The rule, as an #[code (on_match, patterns)] tuple.


@ -1,181 +0,0 @@
//- 💫 DOCS > API > PHRASEMATCHER
include ../_includes/_mixins
p
| The #[code PhraseMatcher] lets you efficiently match large terminology
| lists. While the #[+api("matcher") #[code Matcher]] lets you match
| sequences based on lists of token descriptions, the #[code PhraseMatcher]
| accepts match patterns in the form of #[code Doc] objects.
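p
| Because the patterns are #[code Doc] objects, you can create them with
| the tokenizer alone instead of running the full pipeline, which is much
| cheaper for large terminology lists. A sketch, where #[code terms] is a
| hypothetical list of strings:
+aside-code("Example").
terms = [u'Barack Obama', u'Angela Merkel']
patterns = [nlp.make_doc(term) for term in terms]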
+h(2, "init") PhraseMatcher.__init__
+tag method
p Create the rule-based #[code PhraseMatcher].
+aside-code("Example").
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab, max_length=6)
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell
| The vocabulary object, which must be shared with the documents
| the matcher will operate on.
+row
+cell #[code max_length]
+cell int
+cell Maximum length of a phrase pattern to add.
+row("foot")
+cell returns
+cell #[code PhraseMatcher]
+cell The newly constructed object.
+h(2, "call") PhraseMatcher.__call__
+tag method
p Find all token sequences matching the supplied patterns on the #[code Doc].
+aside-code("Example").
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
doc = nlp(u"Barack Obama lifts America one last time in emotional farewell")
matches = matcher(doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to match over.
+row("foot")
+cell returns
+cell list
+cell
| A list of #[code (match_id, start, end)] tuples, describing the
| matches. A match tuple describes a span #[code doc[start:end]].
| The #[code match_id] is the ID of the added match pattern.
+h(2, "pipe") PhraseMatcher.pipe
+tag method
p Match a stream of documents, yielding them in turn.
+aside-code("Example").
from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50, n_threads=4):
    pass
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell A stream of documents.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel, if the #[code PhraseMatcher] implementation supports
| multi-threading.
+row("foot")
+cell yields
+cell #[code Doc]
+cell Documents, in order.
+h(2, "len") PhraseMatcher.__len__
+tag method
p
| Get the number of rules added to the matcher. Note that this only returns
| the number of rules (identical with the number of IDs), not the number
| of individual patterns.
+aside-code("Example").
matcher = PhraseMatcher(nlp.vocab)
assert len(matcher) == 0
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
assert len(matcher) == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of rules.
+h(2, "contains") PhraseMatcher.__contains__
+tag method
p Check whether the matcher contains rules for a match ID.
+aside-code("Example").
matcher = PhraseMatcher(nlp.vocab)
assert 'OBAMA' not in matcher
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
assert 'OBAMA' in matcher
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell unicode
+cell The match ID.
+row("foot")
+cell returns
+cell int
+cell Whether the matcher contains rules for this match ID.
+h(2, "add") PhraseMatcher.add
+tag method
p
| Add a rule to the matcher, consisting of an ID key, one or more patterns, and
| a callback function to act on the matches. The callback function will
| receive the arguments #[code matcher], #[code doc], #[code i] and
| #[code matches]. If a pattern already exists for the given ID, the
| patterns will be extended, and the existing #[code on_match] callback
| will be overwritten.
+aside-code("Example").
def on_match(matcher, doc, id, matches):
    print('Matched!', matches)
matcher = PhraseMatcher(nlp.vocab)
matcher.add('OBAMA', on_match, nlp(u"Barack Obama"))
matcher.add('HEALTH', on_match, nlp(u"health care reform"),
nlp(u"healthcare reform"))
doc = nlp(u"Barack Obama urges Congress to find courage to defend his healthcare reforms")
matches = matcher(doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code match_id]
+cell unicode
+cell An ID for the thing you're matching.
+row
+cell #[code on_match]
+cell callable or #[code None]
+cell
| Callback function to act on matches. Takes the arguments
| #[code matcher], #[code doc], #[code i] and #[code matches].
+row
+cell #[code *docs]
+cell list
+cell
| #[code Doc] objects of the phrases to match.


@ -1,449 +0,0 @@
//- 💫 DOCS > API > PIPE
include ../_includes/_mixins
//- This page can be used as a template for all other classes that inherit
//- from `Pipe`.
if subclass
+infobox
| This class is a subclass of #[+api("pipe") #[code Pipe]] and
| follows the same API. The pipeline component is available in the
| #[+a("/usage/processing-pipelines") processing pipeline] via the ID
| #[code "#{pipeline_id}"].
else
p
| This class is not instantiated directly. Components inherit from it,
| and it defines the interface that components should follow to
| function as components in a spaCy analysis pipeline.
- CLASSNAME = subclass || 'Pipe'
- VARNAME = short || CLASSNAME.toLowerCase()
+h(2, "model") #{CLASSNAME}.Model
+tag classmethod
p
| Initialise a model for the pipe. The model should implement the
| #[code thinc.neural.Model] API. Wrappers are under development for
| most major machine learning libraries.
+table(["Name", "Type", "Description"])
+row
+cell #[code **kwargs]
+cell -
+cell Parameters for initialising the model.
+row("foot")
+cell returns
+cell object
+cell The initialised model.
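p
| A minimal sketch (which keyword arguments are accepted depends on the
| component; #[code token_vector_width] here is purely illustrative):
+aside-code("Example").
model = #{CLASSNAME}.Model(token_vector_width=128)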
+h(2, "init") #{CLASSNAME}.__init__
+tag method
p Create a new pipeline instance.
+aside-code("Example").
from spacy.pipeline import #{CLASSNAME}
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.from_disk('/path/to/model')
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The shared vocabulary.
+row
+cell #[code model]
+cell #[code thinc.neural.Model] or #[code True]
+cell
| The model powering the pipeline component. If no model is
| supplied, the model is created when you call
| #[code begin_training], #[code from_disk] or #[code from_bytes].
+row
+cell #[code **cfg]
+cell -
+cell Configuration parameters.
+row("foot")
+cell returns
+cell #[code=CLASSNAME]
+cell The newly constructed object.
+h(2, "call") #{CLASSNAME}.__call__
+tag method
p
| Apply the pipe to one document. The document is modified in place, and
| returned. Both #[code #{CLASSNAME}.__call__] and
| #[code #{CLASSNAME}.pipe] should delegate to the
| #[code #{CLASSNAME}.predict] and #[code #{CLASSNAME}.set_annotations]
| methods.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
doc = nlp(u"This is a sentence.")
processed = #{VARNAME}(doc)
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to process.
+row("foot")
+cell returns
+cell #[code Doc]
+cell The processed document.
+h(2, "pipe") #{CLASSNAME}.pipe
+tag method
p
| Apply the pipe to a stream of documents. Both
| #[code #{CLASSNAME}.__call__] and #[code #{CLASSNAME}.pipe] should
| delegate to the #[code #{CLASSNAME}.predict] and
| #[code #{CLASSNAME}.set_annotations] methods.
+aside-code("Example").
texts = [u'One doc', u'...', u'Lots of docs']
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
for doc in #{VARNAME}.pipe(texts, batch_size=50):
    pass
+table(["Name", "Type", "Description"])
+row
+cell #[code stream]
+cell iterable
+cell A stream of documents.
+row
+cell #[code batch_size]
+cell int
+cell The number of texts to buffer. Defaults to #[code 128].
+row
+cell #[code n_threads]
+cell int
+cell
| The number of worker threads to use. If #[code -1], OpenMP will
| decide how many to use at run time. Default is #[code -1].
+row("foot")
+cell yields
+cell #[code Doc]
+cell Processed documents in the order of the original text.
+h(2, "predict") #{CLASSNAME}.predict
+tag method
p
| Apply the pipeline's model to a batch of docs, without modifying them.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
scores = #{VARNAME}.predict([doc1, doc2])
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell The documents to predict.
+row("foot")
+cell returns
+cell -
+cell Scores from the model.
+h(2, "set_annotations") #{CLASSNAME}.set_annotations
+tag method
p
| Modify a batch of documents, using pre-computed scores.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
scores = #{VARNAME}.predict([doc1, doc2])
#{VARNAME}.set_annotations([doc1, doc2], scores)
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell The documents to modify.
+row
+cell #[code scores]
+cell -
+cell The scores to set, produced by #[code #{CLASSNAME}.predict].
+h(2, "update") #{CLASSNAME}.update
+tag method
p
| Learn from a batch of documents and gold-standard information, updating
| the pipe's model. Delegates to #[code #{CLASSNAME}.predict] and
| #[code #{CLASSNAME}.get_loss].
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
losses = {}
optimizer = nlp.begin_training()
#{VARNAME}.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell A batch of documents to learn from.
+row
+cell #[code golds]
+cell iterable
+cell The gold-standard data. Must have the same length as #[code docs].
+row
+cell #[code drop]
+cell float
+cell The dropout rate.
+row
+cell #[code sgd]
+cell callable
+cell
| The optimizer. Should take two arguments #[code weights] and
| #[code gradient], and an optional ID.
+row
+cell #[code losses]
+cell dict
+cell
| Optional record of the loss during training. The value keyed by
| the model's name is updated.
+h(2, "get_loss") #{CLASSNAME}.get_loss
+tag method
p
| Find the loss and gradient of loss for the batch of documents and their
| predicted scores.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
scores = #{VARNAME}.predict([doc1, doc2])
loss, d_loss = #{VARNAME}.get_loss([doc1, doc2], [gold1, gold2], scores)
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell iterable
+cell The batch of documents.
+row
+cell #[code golds]
+cell iterable
+cell The gold-standard data. Must have the same length as #[code docs].
+row
+cell #[code scores]
+cell -
+cell Scores representing the model's predictions.
+row("foot")
+cell returns
+cell tuple
+cell The loss and the gradient, i.e. #[code (loss, gradient)].
+h(2, "begin_training") #{CLASSNAME}.begin_training
+tag method
p
| Initialise the pipe for training, using data examples if available. If no
| model has been initialised yet, the model is added.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
nlp.pipeline.append(#{VARNAME})
optimizer = #{VARNAME}.begin_training(pipeline=nlp.pipeline)
+table(["Name", "Type", "Description"])
+row
+cell #[code gold_tuples]
+cell iterable
+cell
| Optional gold-standard annotations from which to construct
| #[+api("goldparse") #[code GoldParse]] objects.
+row
+cell #[code pipeline]
+cell list
+cell
| Optional list of #[+api("pipe") #[code Pipe]] components that
| this component is part of.
+row
+cell #[code sgd]
+cell callable
+cell
| An optional optimizer. Should take two arguments #[code weights]
| and #[code gradient], and an optional ID. Will be created via
| #[+api(CLASSNAME.toLowerCase() + "#create_optimizer") #[code create_optimizer]]
| if not set.
+row("foot")
+cell returns
+cell callable
+cell An optimizer.
+h(2, "create_optimizer") #{CLASSNAME}.create_optimizer
+tag method
p
| Create an optimizer for the pipeline component.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
optimizer = #{VARNAME}.create_optimizer()
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell callable
+cell The optimizer.
+h(2, "use_params") #{CLASSNAME}.use_params
+tag method
+tag contextmanager
p Modify the pipe's model, to use the given parameter values.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
optimizer = nlp.begin_training()
with #{VARNAME}.use_params(optimizer.averages):
    #{VARNAME}.to_disk('/best_model')
+table(["Name", "Type", "Description"])
+row
+cell #[code params]
+cell -
+cell
| The parameter values to use in the model. At the end of the
| context, the original parameters are restored.
+h(2, "add_label") #{CLASSNAME}.add_label
+tag method
p Add a new label to the pipe.
if CLASSNAME == "Tagger"
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.add_label('MY_LABEL', {POS: 'NOUN'})
else
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.add_label('MY_LABEL')
+table(["Name", "Type", "Description"])
+row
+cell #[code label]
+cell unicode
+cell The label to add.
if CLASSNAME == "Tagger"
+row
+cell #[code values]
+cell dict
+cell
| Optional values to map to the label, e.g. a tag map
| dictionary.
+h(2, "to_disk") #{CLASSNAME}.to_disk
+tag method
p Serialize the pipe to disk.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.to_disk('/path/to/#{VARNAME}')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+h(2, "from_disk") #{CLASSNAME}.from_disk
+tag method
p Load the pipe from disk. Modifies the object in place and returns it.
+aside-code("Example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.from_disk('/path/to/#{VARNAME}')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row("foot")
+cell returns
+cell #[code=CLASSNAME]
+cell The modified #[code=CLASSNAME] object.
+h(2, "to_bytes") #{CLASSNAME}.to_bytes
+tag method
+aside-code("example").
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}_bytes = #{VARNAME}.to_bytes()
p Serialize the pipe to a bytestring.
+table(["Name", "Type", "Description"])
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being serialized.
+row("foot")
+cell returns
+cell bytes
+cell The serialized form of the #[code=CLASSNAME] object.
+h(2, "from_bytes") #{CLASSNAME}.from_bytes
+tag method
p Load the pipe from a bytestring. Modifies the object in place and returns it.
+aside-code("Example").
#{VARNAME}_bytes = #{VARNAME}.to_bytes()
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
#{VARNAME}.from_bytes(#{VARNAME}_bytes)
+table(["Name", "Type", "Description"])
+row
+cell #[code bytes_data]
+cell bytes
+cell The data to load from.
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being loaded.
+row("foot")
+cell returns
+cell #[code=CLASSNAME]
+cell The #[code=CLASSNAME] object.


@ -1,655 +0,0 @@
//- 💫 DOCS > API > SPAN
include ../_includes/_mixins
p A slice from a #[+api("doc") #[code Doc]] object.
+h(2, "init") Span.__init__
+tag method
p Create a #[code Span] object from the slice #[code doc[start : end]].
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
span = doc[1:4]
assert [t.text for t in span] == [u'it', u'back', u'!']
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code start]
+cell int
+cell The index of the first token of the span.
+row
+cell #[code end]
+cell int
+cell The index of the first token after the span.
+row
+cell #[code label]
+cell int
+cell A label to attach to the span, e.g. for named entities.
+row
+cell #[code vector]
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A meaning representation of the span.
+row("foot")
+cell returns
+cell #[code Span]
+cell The newly constructed object.
+h(2, "getitem") Span.__getitem__
+tag method
p Get a #[code Token] object.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
span = doc[1:4]
assert span[1].text == 'back'
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The index of the token within the span.
+row("foot")
+cell returns
+cell #[code Token]
+cell The token at #[code span[i]].
p Get a #[code Span] object.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
span = doc[1:4]
assert span[1:3].text == 'back!'
+table(["Name", "Type", "Description"])
+row
+cell #[code start_end]
+cell tuple
+cell The slice of the span to get.
+row("foot")
+cell returns
+cell #[code Span]
+cell The span at #[code span[start : end]].
+h(2, "iter") Span.__iter__
+tag method
p Iterate over #[code Token] objects.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
span = doc[1:4]
assert [t.text for t in span] == ['it', 'back', '!']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A #[code Token] object.
+h(2, "len") Span.__len__
+tag method
p Get the number of tokens in the span.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
span = doc[1:4]
assert len(span) == 3
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of tokens in the span.
+h(2, "set_extension") Span.set_extension
+tag classmethod
+tag-new(2)
p
| Define a custom attribute on the #[code Span] which becomes available via
| #[code Span._]. For details, see the documentation on
| #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].
+aside-code("Example").
from spacy.tokens import Span
city_getter = lambda span: any(city in span.text for city in ('New York', 'Paris', 'Berlin'))
Span.set_extension('has_city', getter=city_getter)
doc = nlp(u'I like New York in Autumn')
assert doc[1:4]._.has_city
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell
| Name of the attribute to set by the extension. For example,
| #[code 'my_attr'] will be available as #[code span._.my_attr].
+row
+cell #[code default]
+cell -
+cell
| Optional default value of the attribute if no getter or method
| is defined.
+row
+cell #[code method]
+cell callable
+cell
| Set a custom method on the object, for example
| #[code span._.compare(other_span)].
+row
+cell #[code getter]
+cell callable
+cell
| Getter function that takes the object and returns an attribute
| value. Is called when the user accesses the #[code ._] attribute.
+row
+cell #[code setter]
+cell callable
+cell
| Setter function that takes the #[code Span] and a value, and
| modifies the object. Is called when the user writes to the
| #[code Span._] attribute.
+h(2, "get_extension") Span.get_extension
+tag classmethod
+tag-new(2)
p
| Look up a previously registered extension by name. Returns a 4-tuple
| #[code.u-break (default, method, getter, setter)] if the extension is
| registered. Raises a #[code KeyError] otherwise.
+aside-code("Example").
from spacy.tokens import Span
Span.set_extension('is_city', default=False)
extension = Span.get_extension('is_city')
assert extension == (False, None, None, None)
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| extension.
+h(2, "has_extension") Span.has_extension
+tag classmethod
+tag-new(2)
p Check whether an extension has been registered on the #[code Span] class.
+aside-code("Example").
from spacy.tokens import Span
Span.set_extension('is_city', default=False)
assert Span.has_extension('is_city')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the extension has been registered.
+h(2, "remove_extension") Span.remove_extension
+tag classmethod
+tag-new("2.0.12")
p Remove a previously registered extension.
+aside-code("Example").
from spacy.tokens import Span
Span.set_extension('is_city', default=False)
removed = Span.remove_extension('is_city')
assert not Span.has_extension('is_city')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| removed extension.
+h(2, "similarity") Span.similarity
+tag method
+tag-model("vectors")
p
| Make a semantic similarity estimate. The default estimate is cosine
| similarity using an average of word vectors.
+aside-code("Example").
doc = nlp(u'green apples and red oranges')
green_apples = doc[:2]
red_oranges = doc[3:]
apples_oranges = green_apples.similarity(red_oranges)
oranges_apples = red_oranges.similarity(green_apples)
assert apples_oranges == oranges_apples
+table(["Name", "Type", "Description"])
+row
+cell #[code other]
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+row("foot")
+cell returns
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "get_lca_matrix") Span.get_lca_matrix
+tag method
p
| Calculates the lowest common ancestor matrix for a given #[code Span].
| Returns an LCA matrix containing the integer index of the ancestor, or
| #[code -1] if no common ancestor is found, e.g. if the span excludes a
| necessary ancestor.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn')
span = doc[1:4]
matrix = span.get_lca_matrix()
# array([[0, 0, 0], [0, 1, 2], [0, 2, 2]], dtype=int32)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
+cell The lowest common ancestor matrix of the #[code Span].
+h(2, "to_array") Span.to_array
+tag method
+tag-new(2)
p
| Given a list of #[code M] attribute IDs, export the tokens to a numpy
| #[code ndarray] of shape #[code (N, M)], where #[code N] is the length of
| the span. The values will be 32-bit integers.
+aside-code("Example").
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
doc = nlp(u'I like New York in Autumn.')
span = doc[2:3]
# All strings mapped to integers, for easy export to numpy
np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_ids]
+cell list
+cell A list of attribute ID ints.
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[long, ndim=2]]
+cell
| A feature matrix, with one row per word, and one column per
| attribute indicated in the input #[code attr_ids].
+h(2, "merge") Span.merge
+tag method
p Retokenize the document, such that the span is merged into a single token.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
span = doc[2:4]
span.merge()
assert len(doc) == 6
assert doc[2].text == 'New York'
+table(["Name", "Type", "Description"])
+row
+cell #[code **attributes]
+cell -
+cell
| Attributes to assign to the merged token. By default, attributes
| are inherited from the syntactic root token of the span.
+row("foot")
+cell returns
+cell #[code Token]
+cell The newly merged token.
+h(2, "ents") Span.ents
+tag property
+tag-model("NER")
+tag-new("2.0.12")
p
| Iterate over the entities in the span. Yields named-entity
| #[code Span] objects, if the entity recognizer has been applied to the
| parent document.
+aside-code("Example").
doc = nlp(u'Mr. Best flew to New York on Saturday morning.')
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == 346
assert ents[0].label_ == 'PERSON'
assert ents[0].text == 'Mr. Best'
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Span]
+cell Entities in the document.
+h(2, "as_doc") Span.as_doc
p
| Create a new #[code Doc] object corresponding to the #[code Span], with
| a copy of the data.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
span = doc[2:4]
doc2 = span.as_doc()
assert doc2.text == 'New York'
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code Doc]
+cell A #[code Doc] object of the #[code Span]'s content.
+h(2, "root") Span.root
+tag property
+tag-model("parse")
p
| The token within the span that's highest in the parse tree. If there's a
| tie, the earliest is preferred.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
i, like, new, york, in_, autumn, dot = range(len(doc))
assert doc[new].head.text == 'York'
assert doc[york].head.text == 'like'
new_york = doc[new&#58;york+1]
assert new_york.root.text == 'York'
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code Token]
+cell The root token.
+h(2, "lefts") Span.lefts
+tag property
+tag-model("parse")
p Tokens that are to the left of the span, whose heads are within the span.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
lefts = [t.text for t in doc[3:7].lefts]
assert lefts == [u'New']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A left-child of a token of the span.
+h(2, "rights") Span.rights
+tag property
+tag-model("parse")
p Tokens that are to the right of the span, whose heads are within the span.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
rights = [t.text for t in doc[2:4].rights]
assert rights == [u'in']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A right-child of a token of the span.
+h(2, "n_lefts") Span.n_lefts
+tag property
+tag-model("parse")
p
| The number of tokens that are to the left of the span, whose heads are
| within the span.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
assert doc[3:7].n_lefts == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of left-child tokens.
+h(2, "n_rights") Span.n_rights
+tag property
+tag-model("parse")
p
| The number of tokens that are to the right of the span, whose heads are
| within the span.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
assert doc[2:4].n_rights == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of right-child tokens.
+h(2, "subtree") Span.subtree
+tag property
+tag-model("parse")
p Tokens within the span and tokens which descend from them.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
subtree = [t.text for t in doc[:3].subtree]
assert subtree == [u'Give', u'it', u'back', u'!']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A token within the span, or a descendant from it.
+h(2, "has_vector") Span.has_vector
+tag property
+tag-model("vectors")
p
| A boolean value indicating whether a word vector is associated with the
| object.
+aside-code("Example").
doc = nlp(u'I like apples')
assert doc[1:].has_vector
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the span has a vector data attached.
+h(2, "vector") Span.vector
+tag property
+tag-model("vectors")
p
| A real-valued meaning representation. Defaults to an average of the
| token vectors.
+aside-code("Example").
doc = nlp(u'I like apples')
assert doc[1:].vector.dtype == 'float32'
assert doc[1:].vector.shape == (300,)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the span's semantics.
+h(2, "vector_norm") Span.vector_norm
+tag property
+tag-model("vectors")
p
| The L2 norm of the span's vector representation.
+aside-code("Example").
doc = nlp(u'I like apples')
doc[1:].vector_norm # 4.800883928527915
doc[2:].vector_norm # 6.895897646384268
assert doc[1:].vector_norm != doc[2:].vector_norm
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell float
+cell The L2 norm of the vector representation.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code sent]
+cell #[code Span]
+cell The sentence span that this span is a part of.
+row
+cell #[code start]
+cell int
+cell The token offset for the start of the span.
+row
+cell #[code end]
+cell int
+cell The token offset for the end of the span.
+row
+cell #[code start_char]
+cell int
+cell The character offset for the start of the span.
+row
+cell #[code end_char]
+cell int
+cell The character offset for the end of the span.
+row
+cell #[code text]
+cell unicode
+cell A unicode representation of the span text.
+row
+cell #[code text_with_ws]
+cell unicode
+cell
| The text content of the span with a trailing whitespace character
| if the last token has one.
+row
+cell #[code orth]
+cell int
+cell ID of the verbatim text content.
+row
+cell #[code orth_]
+cell unicode
+cell
| Verbatim text content (identical to #[code Span.text]). Exists
| mostly for consistency with the other attributes.
+row
+cell #[code label]
+cell int
+cell The span's label.
+row
+cell #[code label_]
+cell unicode
+cell The span's label.
+row
+cell #[code lemma_]
+cell unicode
+cell The span's lemma.
+row
+cell #[code ent_id]
+cell int
+cell The hash value of the named entity the span's root token is an instance of.
+row
+cell #[code ent_id_]
+cell unicode
+cell The string ID of the named entity the span's root token is an instance of.
+row
+cell #[code sentiment]
+cell float
+cell
| A scalar value indicating the positivity or negativity of the
| span.
+row
+cell #[code _]
+cell #[code Underscore]
+cell
| User space for adding custom
| #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].


@ -1,239 +0,0 @@
//- 💫 DOCS > API > STRINGSTORE
include ../_includes/_mixins
p
| Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values
| instead of integer IDs. This ensures that strings always map to the
| same ID, even from different #[code StringStores].
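p
| For example, the same string receives the same hash in two independently
| created stores (a sketch):
+aside-code("Example").
from spacy.strings import StringStore
store_one = StringStore([u'apple'])
store_two = StringStore([u'apple', u'orange'])
assert store_one[u'apple'] == store_two[u'apple']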
+h(2, "init") StringStore.__init__
+tag method
p
| Create the #[code StringStore].
+aside-code("Example").
from spacy.strings import StringStore
stringstore = StringStore([u'apple', u'orange'])
+table(["Name", "Type", "Description"])
+row
+cell #[code strings]
+cell iterable
+cell A sequence of unicode strings to add to the store.
+row("foot")
+cell returns
+cell #[code StringStore]
+cell The newly constructed object.
+h(2, "len") StringStore.__len__
+tag method
p Get the number of strings in the store.
+aside-code("Example").
stringstore = StringStore([u'apple', u'orange'])
assert len(stringstore) == 2
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of strings in the store.
+h(2, "getitem") StringStore.__getitem__
+tag method
p Retrieve a string from a given hash, or vice versa.
+aside-code("Example").
stringstore = StringStore([u'apple', u'orange'])
apple_hash = stringstore[u'apple']
assert apple_hash == 8566208034543834098
assert stringstore[apple_hash] == u'apple'
+table(["Name", "Type", "Description"])
+row
+cell #[code string_or_id]
+cell bytes, unicode or uint64
+cell The value to encode.
+row("foot")
+cell returns
+cell unicode or int
+cell The value to be retrieved.
+h(2, "contains") StringStore.__contains__
+tag method
p Check whether a string is in the store.
+aside-code("Example").
stringstore = StringStore([u'apple', u'orange'])
assert u'apple' in stringstore
assert u'cherry' not in stringstore
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the store contains the string.
+h(2, "iter") StringStore.__iter__
+tag method
p
| Iterate over the strings in the store, in order. Note that a newly
| initialised store will always include an empty string #[code ''] at
| position #[code 0].
+aside-code("Example").
stringstore = StringStore([u'apple', u'orange'])
all_strings = [s for s in stringstore]
assert all_strings == [u'apple', u'orange']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell unicode
+cell A string in the store.
+h(2, "add") StringStore.add
+tag method
+tag-new(2)
p Add a string to the #[code StringStore].
+aside-code("Example").
stringstore = StringStore([u'apple', u'orange'])
banana_hash = stringstore.add(u'banana')
assert len(stringstore) == 3
assert banana_hash == 2525716904149915114
assert stringstore[banana_hash] == u'banana'
assert stringstore[u'banana'] == banana_hash
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to add.
+row("foot")
+cell returns
+cell uint64
+cell The string's hash value.
+h(2, "to_disk") StringStore.to_disk
+tag method
+tag-new(2)
p Save the current state to a directory.
+aside-code("Example").
stringstore.to_disk('/path/to/strings')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+h(2, "from_disk") StringStore.from_disk
+tag method
+tag-new(2)
p Load state from a directory. Modifies the object in place and returns it.
+aside-code("Example").
from spacy.strings import StringStore
stringstore = StringStore().from_disk('/path/to/strings')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row("foot")
+cell returns
+cell #[code StringStore]
+cell The modified #[code StringStore] object.
+h(2, "to_bytes") StringStore.to_bytes
+tag method
p Serialize the current state to a binary string.
+aside-code("Example").
store_bytes = stringstore.to_bytes()
+table(["Name", "Type", "Description"])
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being serialized.
+row("foot")
+cell returns
+cell bytes
+cell The serialized form of the #[code StringStore] object.
+h(2, "from_bytes") StringStore.from_bytes
+tag method
p Load state from a binary string.
+aside-code("Example").
from spacy.strings import StringStore
store_bytes = stringstore.to_bytes()
new_store = StringStore().from_bytes(store_bytes)
+table(["Name", "Type", "Description"])
+row
+cell #[code bytes_data]
+cell bytes
+cell The data to load from.
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being loaded.
+row("foot")
+cell returns
+cell #[code StringStore]
+cell The #[code StringStore] object.
+h(2, "util") Utilities
+h(3, "hash_string") strings.hash_string
+tag function
p Get a 64-bit hash for a given string.
+aside-code("Example").
from spacy.strings import hash_string
assert hash_string(u'apple') == 8566208034543834098
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to hash.
+row("foot")
+cell returns
+cell uint64
+cell The hash.


@ -1,6 +0,0 @@
//- 💫 DOCS > API > TAGGER
include ../_includes/_mixins
//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "Tagger", pipeline_id: "tagger" })


@ -1,19 +0,0 @@
//- 💫 DOCS > API > TEXTCATEGORIZER
include ../_includes/_mixins
p
| The model supports classification with multiple, non-mutually exclusive
| labels. You can change the model architecture rather easily, but by
| default, the #[code TextCategorizer] class uses a convolutional
| neural network to assign position-sensitive vectors to each word in the
| document. The #[code TextCategorizer] uses its own CNN model, to
| avoid sharing weights with the other pipeline components. The document
| tensor is then summarized by concatenating max and mean pooling, and a
| multilayer perceptron is used to predict an output vector of length
| #[code nr_class], before a logistic activation is applied elementwise.
| The value of each output neuron is the probability that some class is
| present.
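p
| A minimal usage sketch (the label name is illustrative; training then
| follows the #[code Pipe] API documented below):
+aside-code("Example").
from spacy.pipeline import TextCategorizer
textcat = TextCategorizer(nlp.vocab)
textcat.add_label('POSITIVE')
nlp.pipeline.append(textcat)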
//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "TextCategorizer", short: "textcat", pipeline_id: "textcat" })


@ -1,890 +0,0 @@
//- 💫 DOCS > API > TOKEN
include ../_includes/_mixins
p An individual token — i.e. a word, punctuation symbol, whitespace, etc.
+h(2, "init") Token.__init__
+tag method
p Construct a #[code Token] object.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
token = doc[0]
assert token.text == u'Give'
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A storage container for lexical types.
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code offset]
+cell int
+cell The index of the token within the document.
+row("foot")
+cell returns
+cell #[code Token]
+cell The newly constructed object.
+h(2, "len") Token.__len__
+tag method
p The number of unicode characters in the token, i.e. #[code token.text].
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
token = doc[0]
assert len(token) == 4
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of unicode characters in the token.
+h(2, "set_extension") Token.set_extension
+tag classmethod
+tag-new(2)
p
| Define a custom attribute on the #[code Token] which becomes available
| via #[code Token._]. For details, see the documentation on
| #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].
+aside-code("Example").
from spacy.tokens import Token
fruit_getter = lambda token: token.text in ('apple', 'pear', 'banana')
Token.set_extension('is_fruit', getter=fruit_getter)
doc = nlp(u'I have an apple')
assert doc[3]._.is_fruit
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell
| Name of the attribute to set by the extension. For example,
| #[code 'my_attr'] will be available as #[code token._.my_attr].
+row
+cell #[code default]
+cell -
+cell
| Optional default value of the attribute if no getter or method
| is defined.
+row
+cell #[code method]
+cell callable
+cell
| Set a custom method on the object, for example
| #[code token._.compare(other_token)].
+row
+cell #[code getter]
+cell callable
+cell
| Getter function that takes the object and returns an attribute
| value. Is called when the user accesses the #[code ._] attribute.
+row
+cell #[code setter]
+cell callable
+cell
| Setter function that takes the #[code Token] and a value, and
| modifies the object. Is called when the user writes to the
| #[code Token._] attribute.
+h(2, "get_extension") Token.get_extension
+tag classmethod
+tag-new(2)
p
| Look up a previously registered extension by name. Returns a 4-tuple
| #[code.u-break (default, method, getter, setter)] if the extension is
| registered. Raises a #[code KeyError] otherwise.
+aside-code("Example").
from spacy.tokens import Token
Token.set_extension('is_fruit', default=False)
extension = Token.get_extension('is_fruit')
assert extension == (False, None, None, None)
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| extension.
+h(2, "has_extension") Token.has_extension
+tag classmethod
+tag-new(2)
p Check whether an extension has been registered on the #[code Token] class.
+aside-code("Example").
from spacy.tokens import Token
Token.set_extension('is_fruit', default=False)
assert Token.has_extension('is_fruit')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the extension has been registered.
+h(2, "remove_extension") Token.remove_extension
+tag classmethod
+tag-new("2.0.11")
p Remove a previously registered extension.
+aside-code("Example").
from spacy.tokens import Token
Token.set_extension('is_fruit', default=False)
removed = Token.remove_extension('is_fruit')
assert not Token.has_extension('is_fruit')
+table(["Name", "Type", "Description"])
+row
+cell #[code name]
+cell unicode
+cell Name of the extension.
+row("foot")
+cell returns
+cell tuple
+cell
| A #[code.u-break (default, method, getter, setter)] tuple of the
| removed extension.
+h(2, "check_flag") Token.check_flag
+tag method
p Check the value of a boolean flag.
+aside-code("Example").
from spacy.attrs import IS_TITLE
doc = nlp(u'Give it back! He pleaded.')
token = doc[0]
assert token.check_flag(IS_TITLE) == True
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the flag is set.
+h(2, "similarity") Token.similarity
+tag method
+tag-model("vectors")
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
+aside-code("Example").
apples, _, oranges = nlp(u'apples and oranges')
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
+table(["Name", "Type", "Description"])
+row
+cell other
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+row("foot")
+cell returns
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "nbor") Token.nbor
+tag method
p Get a neighboring token.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
give_nbor = doc[0].nbor()
assert give_nbor.text == u'it'
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The relative position of the token to get. Defaults to #[code 1].
+row("foot")
+cell returns
+cell #[code Token]
+cell The token at position #[code self.doc[self.i+i]].
+h(2, "is_ancestor") Token.is_ancestor
+tag method
+tag-model("parse")
p
| Check whether this token is a parent, grandparent, etc. of another
| in the dependency tree.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
give = doc[0]
it = doc[1]
assert give.is_ancestor(it)
+table(["Name", "Type", "Description"])
+row
+cell descendant
+cell #[code Token]
+cell Another token.
+row("foot")
+cell returns
+cell bool
+cell Whether this token is the ancestor of the descendant.
+h(2, "ancestors") Token.ancestors
+tag property
+tag-model("parse")
p A sequence of this token's syntactic ancestors (parents, grandparents, etc.).
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
it_ancestors = doc[1].ancestors
assert [t.text for t in it_ancestors] == [u'Give']
he_ancestors = doc[4].ancestors
assert [t.text for t in he_ancestors] == [u'pleaded']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell
| A sequence of ancestor tokens such that
| #[code ancestor.is_ancestor(self)].
+h(2, "conjuncts") Token.conjuncts
+tag property
+tag-model("parse")
p A sequence of coordinated tokens, not including the token itself.
+aside-code("Example").
doc = nlp(u'I like apples and oranges')
apples_conjuncts = doc[2].conjuncts
assert [t.text for t in apples_conjuncts] == [u'oranges']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A coordinated token.
+h(2, "children") Token.children
+tag property
+tag-model("parse")
p A sequence of the token's immediate syntactic children.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
give_children = doc[0].children
assert [t.text for t in give_children] == [u'it', u'back', u'!']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A child token such that #[code child.head==self].
+h(2, "lefts") Token.lefts
+tag property
+tag-model("parse")
p
| The leftward immediate children of the word, in the syntactic dependency
| parse.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
lefts = [t.text for t in doc[3].lefts]
assert lefts == [u'New']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A left-child of the token.
+h(2, "rights") Token.rights
+tag property
+tag-model("parse")
p
| The rightward immediate children of the word, in the syntactic
| dependency parse.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
rights = [t.text for t in doc[3].rights]
assert rights == [u'in']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A right-child of the token.
+h(2, "n_lefts") Token.n_lefts
+tag property
+tag-model("parse")
p
| The number of leftward immediate children of the word, in the syntactic
| dependency parse.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
assert doc[3].n_lefts == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of left-child tokens.
+h(2, "n_rights") Token.n_rights
+tag property
+tag-model("parse")
p
| The number of rightward immediate children of the word, in the syntactic
| dependency parse.
+aside-code("Example").
doc = nlp(u'I like New York in Autumn.')
assert doc[3].n_rights == 1
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of right-child tokens.
+h(2, "subtree") Token.subtree
+tag property
+tag-model("parse")
p A sequence containing the token and all the token's syntactic descendants.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
give_subtree = doc[0].subtree
assert [t.text for t in give_subtree] == [u'Give', u'it', u'back', u'!']
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Token]
+cell A descendant token such that #[code self.is_ancestor(token) or token == self].
+h(2, "is_sent_start") Token.is_sent_start
+tag property
+tag-new(2)
p
| A boolean value indicating whether the token starts a sentence.
| #[code None] if unknown.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
assert doc[4].is_sent_start
assert not doc[5].is_sent_start
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the token starts a sentence.
+infobox("Changed in v2.0", "⚠️")
| As of spaCy v2.0, the #[code Token.sent_start] property is deprecated and
| has been replaced with #[code Token.is_sent_start], which returns a
| boolean value instead of a misleading #[code 0] for #[code False] and
| #[code 1] for #[code True]. It also now returns #[code None] if the
| answer is unknown, and fixes a quirk in the old logic that would always
| set the property to #[code 0] for the first word of the document.
+code-wrapper
+code-new assert doc[4].is_sent_start == True
+code-old assert doc[4].sent_start == 1
+h(2, "has_vector") Token.has_vector
+tag property
+tag-model("vectors")
p
| A boolean value indicating whether a word vector is associated with the
| token.
+aside-code("Example").
doc = nlp(u'I like apples')
apples = doc[2]
assert apples.has_vector
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the token has a vector data attached.
+h(2, "vector") Token.vector
+tag property
+tag-model("vectors")
p A real-valued meaning representation.
+aside-code("Example").
doc = nlp(u'I like apples')
apples = doc[2]
assert apples.vector.dtype == 'float32'
assert apples.vector.shape == (300,)
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the token's semantics.
+h(2, "vector_norm") Token.vector_norm
+tag property
+tag-model("vectors")
p The L2 norm of the token's vector representation.
+aside-code("Example").
doc = nlp(u'I like apples and pasta')
apples = doc[2]
pasta = doc[4]
apples.vector_norm # 6.89589786529541
pasta.vector_norm # 7.759851932525635
assert apples.vector_norm != pasta.vector_norm
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell float
+cell The L2 norm of the vector representation.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code sent]
+tag-new("2.0.12")
+cell #[code Span]
+cell The sentence span that this token is a part of.
+row
+cell #[code text]
+cell unicode
+cell Verbatim text content.
+row
+cell #[code text_with_ws]
+cell unicode
+cell Text content, with trailing space character if present.
+row
+cell #[code whitespace_]
+cell unicode
+cell Trailing space character if present.
+row
+cell #[code orth]
+cell int
+cell ID of the verbatim text content.
+row
+cell #[code orth_]
+cell unicode
+cell
| Verbatim text content (identical to #[code Token.text]). Exists
| mostly for consistency with the other attributes.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocab object of the parent #[code Doc].
+row
+cell #[code head]
+cell #[code Token]
+cell The syntactic parent, or "governor", of this token.
+row
+cell #[code left_edge]
+cell #[code Token]
+cell The leftmost token of this token's syntactic descendants.
+row
+cell #[code right_edge]
+cell #[code Token]
+cell The rightmost token of this token's syntactic descendants.
+row
+cell #[code i]
+cell int
+cell The index of the token within the parent document.
+row
+cell #[code ent_type]
+cell int
+cell Named entity type.
+row
+cell #[code ent_type_]
+cell unicode
+cell Named entity type.
+row
+cell #[code ent_iob]
+cell int
+cell
| IOB code of named entity tag. #[code 3]
| means the token begins an entity, #[code 1] means it is inside
| an entity, #[code 2] means it is outside an entity, and
| #[code 0] means no entity tag is set.
+row
+cell #[code ent_iob_]
+cell unicode
+cell
| IOB code of named entity tag. #[code "B"]
| means the token begins an entity, #[code "I"] means it is inside
| an entity, #[code "O"] means it is outside an entity, and
| #[code ""] means no entity tag is set.
+row
+cell #[code ent_id]
+cell int
+cell
| ID of the entity the token is an instance of, if any. Currently
| not used, but potentially for coreference resolution.
+row
+cell #[code ent_id_]
+cell unicode
+cell
| ID of the entity the token is an instance of, if any. Currently
| not used, but potentially for coreference resolution.
+row
+cell #[code lemma]
+cell int
+cell
| Base form of the token, with no inflectional suffixes.
+row
+cell #[code lemma_]
+cell unicode
+cell Base form of the token, with no inflectional suffixes.
+row
+cell #[code norm]
+cell int
+cell
| The token's norm, i.e. a normalised form of the token text.
| Usually set in the language's
| #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions] or
| #[+a("/usage/adding-languages#norm-exceptions") norm exceptions].
+row
+cell #[code norm_]
+cell unicode
+cell
| The token's norm, i.e. a normalised form of the token text.
| Usually set in the language's
| #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions] or
| #[+a("/usage/adding-languages#norm-exceptions") norm exceptions].
+row
+cell #[code lower]
+cell int
+cell Lowercase form of the token.
+row
+cell #[code lower_]
+cell unicode
+cell
| Lowercase form of the token text. Equivalent to
| #[code Token.text.lower()].
+row
+cell #[code shape]
+cell int
+cell
| Transform of the token's string, to show orthographic features.
| For example, "Xxxx" or "dd".
+row
+cell #[code shape_]
+cell unicode
+cell
| Transform of the token's string, to show orthographic features.
| For example, "Xxxx" or "dd".
+row
+cell #[code prefix]
+cell int
+cell
| Hash value of a length-N substring from the start of the
| token. Defaults to #[code N=1].
+row
+cell #[code prefix_]
+cell unicode
+cell
| A length-N substring from the start of the token. Defaults to
| #[code N=1].
+row
+cell #[code suffix]
+cell int
+cell
| Hash value of a length-N substring from the end of the token.
| Defaults to #[code N=3].
+row
+cell #[code suffix_]
+cell unicode
+cell
| Length-N substring from the end of the token. Defaults to
| #[code N=3].
+row
+cell #[code is_alpha]
+cell bool
+cell
| Does the token consist of alphabetic characters? Equivalent to
| #[code token.text.isalpha()].
+row
+cell #[code is_ascii]
+cell bool
+cell
| Does the token consist of ASCII characters? Equivalent to
| #[code all(ord(c) &lt; 128 for c in token.text)].
+row
+cell #[code is_digit]
+cell bool
+cell
| Does the token consist of digits? Equivalent to
| #[code token.text.isdigit()].
+row
+cell #[code is_lower]
+cell bool
+cell
| Is the token in lowercase? Equivalent to
| #[code token.text.islower()].
+row
+cell #[code is_upper]
+cell bool
+cell
| Is the token in uppercase? Equivalent to
| #[code token.text.isupper()].
+row
+cell #[code is_title]
+cell bool
+cell
| Is the token in titlecase? Equivalent to
| #[code token.text.istitle()].
+row
+cell #[code is_punct]
+cell bool
+cell Is the token punctuation?
+row
+cell #[code is_left_punct]
+cell bool
+cell Is the token a left punctuation mark, e.g. #[code (]?
+row
+cell #[code is_right_punct]
+cell bool
+cell Is the token a right punctuation mark, e.g. #[code )]?
+row
+cell #[code is_space]
+cell bool
+cell
| Does the token consist of whitespace characters? Equivalent to
| #[code token.text.isspace()].
+row
+cell #[code is_bracket]
+cell bool
+cell Is the token a bracket?
+row
+cell #[code is_quote]
+cell bool
+cell Is the token a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the token a currency symbol?
+row
+cell #[code like_url]
+cell bool
+cell Does the token resemble a URL?
+row
+cell #[code like_num]
+cell bool
+cell Does the token represent a number? e.g. "10.9", "10", "ten", etc.
+row
+cell #[code like_email]
+cell bool
+cell Does the token resemble an email address?
+row
+cell #[code is_oov]
+cell bool
+cell Is the token out-of-vocabulary?
+row
+cell #[code is_stop]
+cell bool
+cell Is the token part of a "stop list"?
+row
+cell #[code pos]
+cell int
+cell Coarse-grained part-of-speech.
+row
+cell #[code pos_]
+cell unicode
+cell Coarse-grained part-of-speech.
+row
+cell #[code tag]
+cell int
+cell Fine-grained part-of-speech.
+row
+cell #[code tag_]
+cell unicode
+cell Fine-grained part-of-speech.
+row
+cell #[code dep]
+cell int
+cell Syntactic dependency relation.
+row
+cell #[code dep_]
+cell unicode
+cell Syntactic dependency relation.
+row
+cell #[code lang]
+cell int
+cell Language of the parent document's vocabulary.
+row
+cell #[code lang_]
+cell unicode
+cell Language of the parent document's vocabulary.
+row
+cell #[code prob]
+cell float
+cell Smoothed log probability estimate of token's type.
+row
+cell #[code idx]
+cell int
+cell The character offset of the token within the parent document.
+row
+cell #[code sentiment]
+cell float
+cell
| A scalar value indicating the positivity or negativity of the
| token.
+row
+cell #[code lex_id]
+cell int
+cell Sequential ID of the token's lexical type.
+row
+cell #[code rank]
+cell int
+cell
| Sequential ID of the token's lexical type, used to index into
| tables, e.g. for word vectors.
+row
+cell #[code cluster]
+cell int
+cell Brown cluster ID.
+row
+cell #[code _]
+cell #[code Underscore]
+cell
| User space for adding custom
| #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].
@ -1,229 +0,0 @@
//- 💫 DOCS > API > TOKENIZER
include ../_includes/_mixins
p
| Segment text, and create #[code Doc] objects with the discovered segment
| boundaries.
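//- Illustrative sketch, assuming an English model is loaded as nlp and
//- using the tokenizer attached to it:
+aside-code("Example").
doc = nlp.tokenizer(u'Hello world!')
assert [t.text for t in doc] == [u'Hello', u'world', u'!']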
+h(2, "init") Tokenizer.__init__
+tag method
p Create a #[code Tokenizer] to create #[code Doc] objects given unicode text.
+aside-code("Example").
# Construction 1
from spacy.tokenizer import Tokenizer
tokenizer = Tokenizer(nlp.vocab)
# Construction 2
from spacy.lang.en import English
tokenizer = English().Defaults.create_tokenizer(nlp)
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A storage container for lexical types.
+row
+cell #[code rules]
+cell dict
+cell Exceptions and special-cases for the tokenizer.
+row
+cell #[code prefix_search]
+cell callable
+cell
| A function matching the signature of
| #[code re.compile(string).search] to match prefixes.
+row
+cell #[code suffix_search]
+cell callable
+cell
| A function matching the signature of
| #[code re.compile(string).search] to match suffixes.
+row
+cell #[code infix_finditer]
+cell callable
+cell
| A function matching the signature of
| #[code re.compile(string).finditer] to find infixes.
+row
+cell #[code token_match]
+cell callable
+cell A boolean function matching strings to be recognised as tokens.
+row("foot")
+cell returns
+cell #[code Tokenizer]
+cell The newly constructed object.
+h(2, "call") Tokenizer.__call__
+tag method
p Tokenize a string.
+aside-code("Example").
tokens = tokenizer(u'This is a sentence')
assert len(tokens) == 4
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to tokenize.
+row("foot")
+cell returns
+cell #[code Doc]
+cell A container for linguistic annotations.
+h(2, "pipe") Tokenizer.pipe
+tag method
p Tokenize a stream of texts.
+aside-code("Example").
texts = [u'One document.', u'...', u'Lots of documents']
for doc in tokenizer.pipe(texts, batch_size=50):
pass
+table(["Name", "Type", "Description"])
+row
+cell #[code texts]
+cell -
+cell A sequence of unicode texts.
+row
+cell #[code batch_size]
+cell int
+cell The number of texts to accumulate in an internal buffer.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads to use, if the implementation supports
| multi-threading. The default tokenizer is single-threaded.
+row("foot")
+cell yields
+cell #[code Doc]
+cell A sequence of Doc objects, in order.
+h(2, "find_infix") Tokenizer.find_infix
+tag method
p Find internal split points of the string.
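//- Illustrative sketch (an assumption, not from the original page): the
//- default English infix rules split on a hyphen between letters.
+aside-code("Example").
matches = nlp.tokenizer.find_infix(u'well-known')
assert matches[0].start() == 4
assert matches[0].end() == 5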
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to split.
+row("foot")
+cell returns
+cell list
+cell
| A list of #[code re.MatchObject] objects that have #[code .start()]
| and #[code .end()] methods, denoting the placement of internal
| segment separators, e.g. hyphens.
+h(2, "find_prefix") Tokenizer.find_prefix
+tag method
p
| Find the length of a prefix that should be segmented from the string, or
| #[code None] if no prefix rules match.
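//- Illustrative sketch (an assumption, not from the original page): the
//- default English prefix rules treat an opening parenthesis as a prefix.
+aside-code("Example").
assert nlp.tokenizer.find_prefix(u'(hello') == 1
assert nlp.tokenizer.find_prefix(u'hello') is None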
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to segment.
+row("foot")
+cell returns
+cell int
+cell The length of the prefix if present, otherwise #[code None].
+h(2, "find_suffix") Tokenizer.find_suffix
+tag method
p
| Find the length of a suffix that should be segmented from the string, or
| #[code None] if no suffix rules match.
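//- Illustrative sketch (an assumption, not from the original page): the
//- default English suffix rules treat a trailing exclamation mark as a
//- suffix.
+aside-code("Example").
assert nlp.tokenizer.find_suffix(u'hello!') == 1
assert nlp.tokenizer.find_suffix(u'hello') is None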
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to segment.
+row("foot")
+cell returns
+cell int / #[code None]
+cell The length of the suffix if present, otherwise #[code None].
+h(2, "add_special_case") Tokenizer.add_special_case
+tag method
p
| Add a special-case tokenization rule. This mechanism is also used to add
| custom tokenizer exceptions to the language data. See the usage guide
| on #[+a("/usage/adding-languages#tokenizer-exceptions") adding languages]
| for more details and examples.
+aside-code("Example").
from spacy.attrs import ORTH, LEMMA
case = [{"don't": [{ORTH: "do"}, {ORTH: "n't", LEMMA: "not"}]}]
tokenizer.add_special_case(case)
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to specially tokenize.
+row
+cell #[code token_attrs]
+cell iterable
+cell
| A sequence of dicts, where each dict describes a token and its
| attributes. The #[code ORTH] fields of the attributes must
| exactly match the string when they are concatenated.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocab object of the parent #[code Doc].
+row
+cell #[code prefix_search]
+cell -
+cell
| A function to find segment boundaries from the start of a
| string. Returns the length of the segment, or #[code None].
+row
+cell #[code suffix_search]
+cell -
+cell
| A function to find segment boundaries from the end of a string.
| Returns the length of the segment, or #[code None].
+row
+cell #[code infix_finditer]
+cell -
+cell
| A function to find internal segment separators, e.g. hyphens.
| Returns a (possibly empty) list of #[code re.MatchObject]
| objects.
@ -1,20 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL
include ../_includes/_mixins
+section("spacy")
//-+h(2, "spacy") spaCy
//- spacy/__init__.py
include _top-level/_spacy
+section("displacy")
+h(2, "displacy", "spacy/displacy") displaCy
include _top-level/_displacy
+section("util")
+h(2, "util", "spacy/util.py") Utility functions
include _top-level/_util
+section("compat")
+h(2, "compat", "spacy/compaty.py") Compatibility functions
include _top-level/_compat
@ -1,476 +0,0 @@
//- 💫 DOCS > API > VECTORS
include ../_includes/_mixins
p
| Vectors data is kept in the #[code Vectors.data] attribute, which should
| be an instance of #[code numpy.ndarray] (for CPU vectors) or
| #[code cupy.ndarray] (for GPU vectors). Multiple keys can be mapped to
| the same vector, and not all of the rows in the table need to be
| assigned, so #[code vectors.n_keys] may be greater or smaller than
| #[code vectors.shape[0]].
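//- Illustrative sketch of several keys sharing one row; assumes numpy is
//- imported.
+aside-code("Example").
from spacy.vectors import Vectors
data = numpy.zeros((4, 300), dtype='f')
vectors = Vectors(data=data, keys=[u'cat', u'dog'])
vectors.add(u'kitten', row=0)  # map a second key to row 0
assert vectors.n_keys == 3     # three keys, two assigned rows
assert vectors.shape[0] == 4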
+h(2, "init") Vectors.__init__
+tag method
p
| Create a new vector store. You can set the vector values and keys
| directly on initialisation, or supply a #[code shape] keyword argument
| to create an empty table you can add vectors to later.
+aside-code("Example").
from spacy.vectors import Vectors
empty_vectors = Vectors(shape=(10000, 300))
data = numpy.zeros((3, 300), dtype='f')
keys = [u'cat', u'dog', u'rat']
vectors = Vectors(data=data, keys=keys)
+table(["Name", "Type", "Description"])
+row
+cell #[code data]
+cell #[code.u-break ndarray[ndim=2, dtype='float32']]
+cell The vector data.
+row
+cell #[code keys]
+cell iterable
+cell A sequence of keys aligned with the data.
+row
+cell #[code shape]
+cell tuple
+cell
| Size of the table as #[code (n_entries, n_columns)], the number
| of entries and number of columns. Not required if you're
| initialising the object with #[code data] and #[code keys].
+row("foot")
+cell returns
+cell #[code Vectors]
+cell The newly created object.
+h(2, "getitem") Vectors.__getitem__
+tag method
p
| Get a vector by key. If the key is not found in the table, a
| #[code KeyError] is raised.
+aside-code("Example").
cat_id = nlp.vocab.strings[u'cat']
cat_vector = nlp.vocab.vectors[cat_id]
assert (cat_vector == nlp.vocab[u'cat'].vector).all()
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell int
+cell The key to get the vector for.
+row
+cell returns
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
+cell The vector for the key.
+h(2, "setitem") Vectors.__setitem__
+tag method
p
| Set a vector for the given key.
+aside-code("Example").
cat_id = nlp.vocab.strings[u'cat']
vector = numpy.random.uniform(-1, 1, (300,))
nlp.vocab.vectors[cat_id] = vector
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell int
+cell The key to set the vector for.
+row
+cell #[code vector]
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
+cell The vector to set.
+h(2, "iter") Vectors.__iter__
+tag method
p Iterate over the keys in the table.
+aside-code("Example").
for key in nlp.vocab.vectors:
print(key, nlp.vocab.strings[key])
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell int
+cell A key in the table.
+h(2, "len") Vectors.__len__
+tag method
p Return the number of vectors in the table.
+aside-code("Example").
vectors = Vectors(shape=(3, 300))
assert len(vectors) == 3
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of vectors in the table.
+h(2, "contains") Vectors.__contains__
+tag method
p
| Check whether a key has been mapped to a vector entry in the table.
+aside-code("Example").
cat_id = nlp.vocab.strings[u'cat']
nlp.vocab.vectors.add(cat_id, vector=numpy.random.uniform(-1, 1, (300,)))
assert cat_id in nlp.vocab.vectors
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell int
+cell The key to check.
+row("foot")
+cell returns
+cell bool
+cell Whether the key has a vector entry.
+h(2, "add") Vectors.add
+tag method
p
| Add a key to the table, optionally setting a vector value as well. Keys
| can be mapped to an existing vector by setting #[code row], or a new
| vector can be added. When adding unicode keys, keep in mind that the
| #[code Vectors] class itself has no
| #[+api("stringstore") #[code StringStore]], so you have to store the
| hash-to-string mapping separately. If you need to manage the strings,
| you should use the #[code Vectors] via the
| #[+api("vocab") #[code Vocab]] class, e.g. #[code vocab.vectors].
+aside-code("Example").
vector = numpy.random.uniform(-1, 1, (300,))
cat_id = nlp.vocab.strings[u'cat']
nlp.vocab.vectors.add(cat_id, vector=vector)
nlp.vocab.vectors.add(u'dog', row=0)
+table(["Name", "Type", "Description"])
+row
+cell #[code key]
+cell unicode / int
+cell The key to add.
+row
+cell #[code vector]
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
+cell An optional vector to add for the key.
+row
+cell #[code row]
+cell int
+cell An optional row number of a vector to map the key to.
+row("foot")
+cell returns
+cell int
+cell The row the vector was added to.
+h(2, "resize") Vectors.resize
+tag method
p
| Resize the underlying vectors array. If #[code inplace=True], the memory
| is reallocated. This may cause other references to the data to become
| invalid, so only use #[code inplace=True] if you're sure that's what you
| want. If the number of vectors is reduced, keys mapped to rows that have
| been deleted are removed. These removed items are returned as a list of
| #[code (key, row)] tuples.
+aside-code("Example").
removed = nlp.vocab.vectors.resize((10000, 300))
+table(["Name", "Type", "Description"])
+row
+cell #[code shape]
+cell tuple
+cell
| A #[code (rows, dims)] tuple describing the number of rows and
| dimensions.
+row
+cell #[code inplace]
+cell bool
+cell Reallocate the memory.
+row("foot")
+cell returns
+cell list
+cell The removed items as a list of #[code (key, row)] tuples.
+h(2, "keys") Vectors.keys
+tag method
p A sequence of the keys in the table.
+aside-code("Example").
for key in nlp.vocab.vectors.keys():
print(key, nlp.vocab.strings[key])
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell iterable
+cell The keys.
+h(2, "values") Vectors.values
+tag method
p
| Iterate over vectors that have been assigned to at least one key. Note
| that some vectors may be unassigned, so the number of vectors returned
| may be less than the length of the vectors table.
+aside-code("Example").
for vector in nlp.vocab.vectors.values():
print(vector)
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
+cell A vector in the table.
+h(2, "items") Vectors.items
+tag method
p Iterate over #[code (key, vector)] pairs, in order.
+aside-code("Example").
for key, vector in nlp.vocab.vectors.items():
print(key, nlp.vocab.strings[key], vector)
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell tuple
+cell #[code (key, vector)] pairs, in order.
+h(2, "shape") Vectors.shape
+tag property
p
| Get #[code (rows, dims)] tuples of number of rows and number of
| dimensions in the vector table.
+aside-code("Example").
vectors = Vectors(shape=(1, 300))
vectors.add(u'cat', numpy.random.uniform(-1, 1, (300,)))
rows, dims = vectors.shape
assert rows == 1
assert dims == 300
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell tuple
+cell A #[code (rows, dims)] pair.
+h(2, "size") Vectors.size
+tag property
p The vector size, i.e. #[code rows * dims].
+aside-code("Example").
vectors = Vectors(shape=(500, 300))
assert vectors.size == 150000
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The vector size.
+h(2, "is_full") Vectors.is_full
+tag property
p
| Whether the vectors table is full and no slots are available for new
| keys. If a table is full, it can be resized using
| #[+api("vectors#resize") #[code Vectors.resize]].
+aside-code("Example").
vectors = Vectors(shape=(1, 300))
vectors.add(u'cat', numpy.random.uniform(-1, 1, (300,)))
assert vectors.is_full
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell bool
+cell Whether the vectors table is full.
+h(2, "n_keys") Vectors.n_keys
+tag property
p
| Get the number of keys in the table. Note that this is the number of
| #[em all] keys, not just unique vectors. If several keys are mapped
| to the same vector, they will be counted individually.
+aside-code("Example").
vectors = Vectors(shape=(10, 300))
assert len(vectors) == 10
assert vectors.n_keys == 0
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of all keys in the table.
+h(2, "from_glove") Vectors.from_glove
+tag method
p
| Load #[+a("https://nlp.stanford.edu/projects/glove/") GloVe] vectors from
| a directory. Assumes binary format, that the vocab is in a
| #[code vocab.txt], and that vectors are named
| #[code vectors.{size}.[fd].bin], e.g. #[code vectors.128.f.bin] for 128d
| float32 vectors, #[code vectors.300.d.bin] for 300d float64 (double)
| vectors, etc. By default GloVe outputs 64-bit vectors.
+aside-code("Example").
vectors = Vectors()
vectors.from_glove('/path/to/glove_vectors')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode / #[code Path]
+cell The path to load the GloVe vectors from.
+h(2, "to_disk") Vectors.to_disk
+tag method
p Save the current state to a directory.
+aside-code("Example").
vectors.to_disk('/path/to/vectors')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode / #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being saved.
+h(2, "from_disk") Vectors.from_disk
+tag method
p Loads state from a directory. Modifies the object in place and returns it.
+aside-code("Example").
vectors = Vectors()
vectors.from_disk('/path/to/vectors')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode / #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row("foot")
+cell returns
+cell #[code Vectors]
+cell The modified #[code Vectors] object.
+h(2, "to_bytes") Vectors.to_bytes
+tag method
p Serialize the current state to a binary string.
+aside-code("Example").
vectors_bytes = vectors.to_bytes()
+table(["Name", "Type", "Description"])
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being serialized.
+row("foot")
+cell returns
+cell bytes
+cell The serialized form of the #[code Vectors] object.
+h(2, "from_bytes") Vectors.from_bytes
+tag method
p Load state from a binary string.
+aside-code("Example").
from spacy.vectors import Vectors
vectors_bytes = vectors.to_bytes()
new_vectors = Vectors()
new_vectors.from_bytes(vectors_bytes)
+table(["Name", "Type", "Description"])
+row
+cell #[code data]
+cell bytes
+cell The data to load from.
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being loaded.
+row("foot")
+cell returns
+cell #[code Vectors]
+cell The #[code Vectors] object.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code data]
+cell #[code.u-break ndarray[ndim=2, dtype='float32']]
+cell
| Stored vectors data. #[code numpy] is used for CPU vectors,
| #[code cupy] for GPU vectors.
+row
+cell #[code key2row]
+cell dict
+cell
| Dictionary mapping word hashes to rows in the
| #[code Vectors.data] table.
+row
+cell #[code keys]
+cell #[code.u-break ndarray[ndim=1, dtype='uint64']]
+cell
| Array keeping the keys in order, such that
| #[code keys[vectors.key2row[key]] == key]
@ -1,411 +0,0 @@
//- 💫 DOCS > API > VOCAB
include ../_includes/_mixins
p
| The #[code Vocab] object provides a lookup table that allows you to
| access #[+api("lexeme") #[code Lexeme]] objects, as well as the
| #[+api("stringstore") #[code StringStore]]. It also owns underlying
| C-data that is shared between #[code Doc] objects.
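//- Illustrative sketch, assuming an English model is loaded as nlp:
//- the vocab maps between lexemes, strings and hashes.
+aside-code("Example").
apple_lexeme = nlp.vocab[u'apple']
assert apple_lexeme.orth_ == u'apple'
assert apple_lexeme.orth == nlp.vocab.strings[u'apple']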
+h(2, "init") Vocab.__init__
+tag method
p Create the vocabulary.
+aside-code("Example").
from spacy.vocab import Vocab
vocab = Vocab(strings=[u'hello', u'world'])
+table(["Name", "Type", "Description"])
+row
+cell #[code lex_attr_getters]
+cell dict
+cell
| A dictionary mapping attribute IDs to functions to compute them.
| Defaults to #[code None].
+row
+cell #[code tag_map]
+cell dict
+cell
| A dictionary mapping fine-grained tags to coarse-grained
| parts-of-speech, and optionally morphological attributes.
+row
+cell #[code lemmatizer]
+cell object
+cell A lemmatizer. Defaults to #[code None].
+row
+cell #[code strings]
+cell #[code StringStore] or list
+cell
| A #[+api("stringstore") #[code StringStore]] that maps
| strings to hash values, and vice versa, or a list of strings.
+row("foot")
+cell returns
+cell #[code Vocab]
+cell The newly constructed object.
+h(2, "len") Vocab.__len__
+tag method
p Get the current number of lexemes in the vocabulary.
+aside-code("Example").
doc = nlp(u'This is a sentence.')
assert len(nlp.vocab) > 0
+table(["Name", "Type", "Description"])
+row("foot")
+cell returns
+cell int
+cell The number of lexemes in the vocabulary.
+h(2, "getitem") Vocab.__getitem__
+tag method
p
| Retrieve a lexeme, given an int ID or a unicode string. If a previously
| unseen unicode string is given, a new lexeme is created and stored.
+aside-code("Example").
apple = nlp.vocab.strings['apple']
assert nlp.vocab[apple] == nlp.vocab[u'apple']
+table(["Name", "Type", "Description"])
+row
+cell #[code id_or_string]
+cell int / unicode
+cell The hash value of a word, or its unicode string.
+row("foot")
+cell returns
+cell #[code Lexeme]
+cell The lexeme indicated by the given ID.
+h(2, "iter") Vocab.__iter__
+tag method
p Iterate over the lexemes in the vocabulary.
+aside-code("Example").
stop_words = (lex for lex in nlp.vocab if lex.is_stop)
+table(["Name", "Type", "Description"])
+row("foot")
+cell yields
+cell #[code Lexeme]
+cell An entry in the vocabulary.
+h(2, "contains") Vocab.__contains__
+tag method
p
| Check whether the string has an entry in the vocabulary. To get the ID
| for a given string, you need to look it up in
| #[+api("vocab#attributes") #[code vocab.strings]].
+aside-code("Example").
apple = nlp.vocab.strings['apple']
oov = nlp.vocab.strings['dskfodkfos']
assert apple in nlp.vocab
assert oov not in nlp.vocab
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The ID string.
+row("foot")
+cell returns
+cell bool
+cell Whether the string has an entry in the vocabulary.
+h(2, "add_flag") Vocab.add_flag
+tag method
p
| Set a new boolean flag to words in the vocabulary. The #[code flag_getter]
| function will be called over the words currently in the vocab, and then
| applied to new words as they occur. You'll then be able to access the flag
| value on each token, using #[code token.check_flag(flag_id)].
+aside-code("Example").
def is_my_product(text):
products = [u'spaCy', u'Thinc', u'displaCy']
return text in products
MY_PRODUCT = nlp.vocab.add_flag(is_my_product)
doc = nlp(u'I like spaCy')
assert doc[2].check_flag(MY_PRODUCT) == True
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_getter]
+cell callable
+cell A function #[code f(unicode) -> bool], to get the flag value.
+row
+cell #[code flag_id]
+cell int
+cell
| An integer between 1 and 63 (inclusive), specifying the bit at
| which the flag will be stored. If #[code -1], the lowest
| available bit will be chosen.
+row("foot")
+cell returns
+cell int
+cell The integer ID by which the flag value can be checked.
+h(2, "reset_vectors") Vocab.reset_vectors
+tag method
+tag-new(2)
p
| Drop the current vector table. Because all vectors must be the same
| width, you have to call this to change the size of the vectors. Only
| one of the #[code width] and #[code shape] keyword arguments can be
| specified.
+aside-code("Example").
nlp.vocab.reset_vectors(width=300)
+table(["Name", "Type", "Description"])
+row
+cell #[code width]
+cell int
+cell The new width (keyword argument only).
+row
+cell #[code shape]
+cell tuple
+cell The new shape (keyword argument only).
+h(2, "prune_vectors") Vocab.prune_vectors
+tag method
+tag-new(2)
p
| Reduce the current vector table to #[code nr_row] unique entries. Words
| mapped to the discarded vectors will be remapped to the closest vector
| among those remaining. For example, suppose the original table had
| vectors for the words:
| #[code.u-break ['sat', 'cat', 'feline', 'reclined']]. If we prune the
| vector table to two rows, we would discard the vectors for "feline"
| and "reclined". These words would then be remapped to the closest
| remaining vector so "feline" would have the same vector as "cat",
| and "reclined" would have the same vector as "sat". The similarities are
| judged by cosine. The original vectors may be large, so the cosines are
| calculated in minibatches, to reduce memory usage.
+aside-code("Example").
nlp.vocab.prune_vectors(10000)
assert len(nlp.vocab.vectors) &lt;= 10000
+table(["Name", "Type", "Description"])
+row
+cell #[code nr_row]
+cell int
+cell The number of rows to keep in the vector table.
+row
+cell #[code batch_size]
+cell int
+cell
| Batch size to use when calculating the similarities. Larger batch
| sizes might be faster, while temporarily requiring more memory.
+row("foot")
+cell returns
+cell dict
+cell
| A dictionary keyed by removed words mapped to
| #[code (string, score)] tuples, where #[code string] is the entry
| the removed word was mapped to, and #[code score] the similarity
| score between the two words.
+h(2, "get_vector") Vocab.get_vector
+tag method
+tag-new(2)
p
| Retrieve a vector for a word in the vocabulary. Words can be looked up
| by string or hash value. If no vectors data is loaded, a
| #[code ValueError] is raised.
+aside-code("Example").
nlp.vocab.get_vector(u'apple')
+table(["Name", "Type", "Description"])
+row
+cell #[code orth]
+cell int / unicode
+cell The hash value of a word, or its unicode string.
+row("foot")
+cell returns
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell
| A word vector. Size and shape are determined by the
| #[code Vocab.vectors] instance.
+h(2, "set_vector") Vocab.set_vector
+tag method
+tag-new(2)
p
| Set a vector for a word in the vocabulary. Words can be referenced
| by string or hash value.
+aside-code("Example").
nlp.vocab.set_vector(u'apple', array([...]))
+table(["Name", "Type", "Description"])
+row
+cell #[code orth]
+cell int / unicode
+cell The hash value of a word, or its unicode string.
+row
+cell #[code vector]
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
+cell The vector to set.
+h(2, "has_vector") Vocab.has_vector
+tag method
+tag-new(2)
p
| Check whether a word has a vector. Returns #[code False] if no vectors
| are loaded. Words can be looked up by string or hash value.
+aside-code("Example").
if nlp.vocab.has_vector(u'apple'):
vector = nlp.vocab.get_vector(u'apple')
+table(["Name", "Type", "Description"])
+row
+cell #[code orth]
+cell int / unicode
+cell The hash value of a word, or its unicode string.
+row("foot")
+cell returns
+cell bool
+cell Whether the word has a vector.
+h(2, "to_disk") Vocab.to_disk
+tag method
+tag-new(2)
p Save the current state to a directory.
+aside-code("Example").
nlp.vocab.to_disk('/path/to/vocab')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory, which will be created if it doesn't exist.
| Paths may be either strings or #[code Path]-like objects.
+h(2, "from_disk") Vocab.from_disk
+tag method
+tag-new(2)
p Loads state from a directory. Modifies the object in place and returns it.
+aside-code("Example").
from spacy.vocab import Vocab
vocab = Vocab().from_disk('/path/to/vocab')
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell unicode or #[code Path]
+cell
| A path to a directory. Paths may be either strings or
| #[code Path]-like objects.
+row("foot")
+cell returns
+cell #[code Vocab]
+cell The modified #[code Vocab] object.
+h(2, "to_bytes") Vocab.to_bytes
+tag method
p Serialize the current state to a binary string.
+aside-code("Example").
vocab_bytes = nlp.vocab.to_bytes()
+table(["Name", "Type", "Description"])
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being serialized.
+row("foot")
+cell returns
+cell bytes
+cell The serialized form of the #[code Vocab] object.
+h(2, "from_bytes") Vocab.from_bytes
+tag method
p Load state from a binary string.
+aside-code("Example").
from spacy.vocab import Vocab
vocab_bytes = nlp.vocab.to_bytes()
vocab = Vocab()
vocab.from_bytes(vocab_bytes)
+table(["Name", "Type", "Description"])
+row
+cell #[code bytes_data]
+cell bytes
+cell The data to load from.
+row
+cell #[code **exclude]
+cell -
+cell Named attributes to prevent from being loaded.
+row("foot")
+cell returns
+cell #[code Vocab]
+cell The #[code Vocab] object.
+h(2, "attributes") Attributes
+aside-code("Example").
apple_id = nlp.vocab.strings['apple']
assert type(apple_id) == int
PERSON = nlp.vocab.strings['PERSON']
assert type(PERSON) == int
+table(["Name", "Type", "Description"])
+row
+cell #[code strings]
+cell #[code StringStore]
+cell A table managing the string-to-int mapping.
+row
+cell #[code vectors]
+tag-new(2)
+cell #[code Vectors]
+cell A table associating word IDs to word vectors.
+row
+cell #[code vectors_length]
+cell int
+cell Number of dimensions for each word vector.
@ -1,28 +0,0 @@
//- 💫 CSS > BASE > ANIMATIONS
//- Fade in
@keyframes fadeIn
from
opacity: 0
to
opacity: 1
//- Element slides in from the top
@keyframes slideInDown
from
transform: translate3d(0, -100%, 0)
visibility: visible
to
transform: translate3d(0, 0, 0)
//- Element rotates
@keyframes rotate
to
transform: rotate(360deg)
@ -1,27 +0,0 @@
//- 💫 CSS > BASE > FONTS
// HK Grotesk
@font-face
font-family: "HK Grotesk"
font-style: normal
font-weight: 500
src: url("/assets/fonts/hkgrotesk-semibold.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-semibold.woff") format("woff")
@font-face
font-family: "HK Grotesk"
font-style: italic
font-weight: 500
src: url("/assets/fonts/hkgrotesk-semibolditalic.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-semibolditalic.woff") format("woff")
@font-face
font-family: "HK Grotesk"
font-style: normal
font-weight: 600
src: url("/assets/fonts/hkgrotesk-bold.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-bold.woff") format("woff")
@font-face
font-family: "HK Grotesk"
font-style: italic
font-weight: 600
src: url("/assets/fonts/hkgrotesk-bolditalic.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-bolditalic.woff") format("woff")
@ -1,59 +0,0 @@
//- 💫 CSS > BASE > GRID
//- Grid container
.o-grid
display: flex
flex-wrap: wrap
@include breakpoint(min, sm)
flex-direction: row
align-items: stretch
justify-content: space-between
&.o-grid--center
align-items: center
justify-content: center
&.o-grid--vcenter
align-items: center
&.o-grid--space
justify-content: space-between
&.o-grid--nowrap
flex-wrap: nowrap
//- Grid column
.o-grid__col
$grid-gutter: 2rem
margin-top: $grid-gutter
min-width: 0 // hack to prevent overflow
@include breakpoint(min, lg)
display: flex
flex: 0 0 100%
flex-direction: column
flex-wrap: wrap
@each $mode, $count in $grid
&.o-grid__col--#{$mode}
$percentage: calc(#{100% / $count} - #{$grid-gutter})
flex: 0 0 $percentage
max-width: $percentage
@include breakpoint(max, md)
flex: 0 0 100%
flex-flow: column wrap
&.o-grid__col--no-gutter
margin-top: 0
// Fix overflow issue in old browsers
& > *
flex-shrink: 1
max-width: 100%
@ -1,43 +0,0 @@
//- 💫 CSS > BASE > LAYOUT
//- HTML
html
font-size: $type-base
//- Body
body
animation: fadeIn 0.25s ease
background: $color-back
color: $color-front
//- Paragraphs
p
@extend .o-block, .u-text
p:empty
margin-bottom: 0
//- Links
main p a,
main table a,
main > *:not(footer) li a,
main aside a
@extend .u-link
a:focus
outline: 1px dotted $color-theme
//- Selection
::selection
background: $color-theme
color: $color-back
text-shadow: none
@ -1,249 +0,0 @@
//- 💫 CSS > BASE > OBJECTS
//- Main container
.o-main
padding: $nav-height 0 0 0
max-width: 100%
min-height: 100vh
@include breakpoint(min, md)
&.o-main--sidebar
margin-left: $sidebar-width
&.o-main--aside
margin-right: $aside-width
position: relative
&:after
@include position(absolute, top, left, 0, 100%)
@include size($aside-width, 100%)
content: ""
display: block
background: $pattern
z-index: -1
min-height: 100vh
//- Content container
.o-content
padding: 3rem 7.5rem
margin: 0 auto
width: $content-width
max-width: 100%
@include breakpoint(max, sm)
padding: 3rem
//- Footer
.o-footer
position: relative
padding: 2.5rem 0
overflow: auto
background: $color-subtle-light
.o-main &
border-top-left-radius: $border-radius
//- Blocks
.o-section
width: 100%
max-width: 100%
&:not(:last-child)
margin-bottom: 7rem
padding-bottom: 4rem
border-bottom: 1px dotted $color-subtle
&.o-section--small
overflow: auto
&:not(:last-child)
margin-bottom: 3.5rem
padding-bottom: 2rem
.o-block
margin-bottom: 4rem
.o-block-small
margin-bottom: 2rem
.o-no-block.o-no-block
margin-bottom: 0
.o-card
background: $color-back
border-radius: $border-radius
box-shadow: $box-shadow
//- Accordion
.o-accordion
&:not(:last-child)
margin-bottom: 2rem
.o-accordion__content
margin-top: 3rem
.o-accordion__button
font: inherit
border-radius: $border-radius
width: 100%
padding: 1.5rem 2rem
background: $color-subtle-light
&[aria-expanded="true"]
border-bottom: 3px solid $color-subtle
border-bottom-left-radius: 0
border-bottom-right-radius: 0
.o-accordion__hide
display: none
&:focus:not([aria-expanded="true"])
background: $color-subtle
.o-accordion__icon
@include size(2.5rem)
background: $color-theme
color: $color-back
border-radius: 50%
padding: 0.35rem
pointer-events: none
//- Box
.o-box
background: $color-subtle-light
padding: 2rem
border-radius: $border-radius
.o-box__logos
padding-bottom: 1rem
//- Icons
.o-icon
vertical-align: middle
&.o-icon--inline
margin: 0 0.5rem 0 0.1rem
&.o-icon--tag
vertical-align: bottom
height: 100%
position: relative
top: 1px
.o-emoji
margin-right: 0.75rem
vertical-align: text-bottom
.o-badge
border-radius: 1em
.o-thumb
@include size(100px)
overflow: hidden
border-radius: 50%
&.o-thumb--small
@include size(35px)
//- SVG
.o-svg
height: auto
//- Inline List
.o-inline-list > *
display: inline
&:not(:last-child)
margin-right: 3rem
//- Logo
.o-logo
@include size($logo-width, $logo-height)
fill: currentColor
vertical-align: middle
margin: 0 0.5rem
//- Embeds
.o-chart
max-width: 100%
.cp_embed_iframe
border: 1px solid $color-subtle
border-radius: $border-radius
//- Responsive Video embeds
.o-video
position: relative
height: 0
@each $ratio1, $ratio2 in (16, 9), (4, 3)
&.o-video--#{$ratio1}x#{$ratio2}
padding-bottom: (100% * $ratio2 / $ratio1)
.o-video__iframe
@include position(absolute, top, left, 0, 0)
@include size(100%)
border-radius: var(--border-radius)
//- Form fields
.o-field
background: $color-back
padding: 0 0.25em
border-radius: 2em
border: 1px solid $color-subtle
margin-bottom: 0.25rem
.o-field__input,
.o-field__button
padding: 0 0.35em
.o-field__input
width: 100%
.o-field__select
background: transparent
color: $color-dark
height: 1.4em
border: none
text-align-last: center
width: 100%
//- Abbreviations
.o-abbr
+breakpoint(min, md)
cursor: help
border-bottom: 2px dotted $color-theme
padding-bottom: 3px
+breakpoint(max, sm)
&[data-tooltip]:before
content: none
&:after
content: " (" attr(aria-label) ")"
color: $color-subtle-dark
@ -1,103 +0,0 @@
//- 💫 CSS > BASE > RESET
*, *:before, *:after
box-sizing: border-box
padding: 0
margin: 0
border: 0
outline: 0
html
font-family: sans-serif
text-rendering: optimizeSpeed
-ms-text-size-adjust: 100%
-webkit-text-size-adjust: 100%
-webkit-font-smoothing: antialiased
-moz-osx-font-smoothing: grayscale
body
margin: 0
article, aside, details, figcaption, figure, footer, header, main, menu, nav,
section, summary, progress
display: block
a
background-color: transparent
color: inherit
text-decoration: none
&:active,
&:hover
outline: 0
abbr[title]
border-bottom: none
text-decoration: underline
text-decoration: underline dotted
b, strong
font-weight: inherit
font-weight: bolder
small
font-size: 80%
sub, sup
position: relative
font-size: 65%
line-height: 0
vertical-align: baseline
sup
top: -0.5em
sub
bottom: -0.15em
img
border: 0
height: auto
max-width: 100%
svg
max-width: 100%
color-interpolation-filters: sRGB
fill: currentColor
&:not(:root)
overflow: hidden
hr
box-sizing: content-box
overflow: visible
height: 0
pre
overflow: auto
code, pre
font-family: monospace, monospace
font-size: 1em
table
text-align: left
width: 100%
max-width: 100%
border-collapse: collapse
td, th
vertical-align: top
ul, ol
list-style: none
input, button
appearance: none
background: transparent
button
cursor: pointer
progress
appearance: none
@ -1,267 +0,0 @@
//- 💫 CSS > BASE > UTILITIES
//- Text
.u-text,
.u-text-small,
.u-text-tiny
font-family: $font-primary
.u-text
font-size: 1.35rem
line-height: 1.5
.u-text-small
font-size: 1.3rem
line-height: 1.375
.u-text-tiny
font-size: 1.1rem
line-height: 1.375
//- Labels & Tags
.u-text-label
font: normal 600 1.4rem/#{1.5} $font-secondary
text-transform: uppercase
&.u-text-label--light,
&.u-text-label--dark
display: inline-block
border-radius: 1em
padding: 0 1rem 0.15rem
&.u-text-label--dark
background: $color-dark
box-shadow: inset 1px 1px 1px rgba($color-front, 0.25)
color: $color-back
margin: 1.5rem 0 0 2rem
&.u-text-label--light
background: $color-back
color: $color-theme
margin-bottom: 1rem
.u-text-tag
display: inline-block
font: 600 1.1rem/#{1} $font-secondary
background: $color-theme
color: $color-back
padding: 2px 6px 4px
border-radius: 1em
text-transform: uppercase
vertical-align: middle
&.u-text-tag--spaced
margin-left: 0.75em
margin-right: 0.5em
//- Headings
.u-heading
margin-bottom: 1em
@include breakpoint(max, md)
word-wrap: break-word
&:not(:first-child)
padding-top: 3.5rem
&.u-heading--title:after
content: ""
display: block
width: 10%
min-width: 6rem
height: 6px
background: $color-theme
margin-top: 3rem
.u-heading-0
font: normal 600 7rem/#{1} $font-secondary
@include breakpoint(max, sm)
font-size: 6rem
@each $level, $size in $headings
.u-heading-#{$level}
font: normal 500 #{$size}rem/#{1.1} $font-secondary
.u-heading__teaser
margin-top: 2rem
font-weight: normal
//- Links
.u-link
color: $color-theme
border-bottom: 1px solid
transition: color 0.2s ease
&:hover
color: $color-theme-dark
.u-hand
cursor: pointer
.u-hide-link.u-hide-link
border: none
color: inherit
&:hover
color: inherit
.u-permalink
position: relative
&:before
content: "\00b6"
font-size: 0.9em
font-weight: normal
color: $color-subtle
@include position(absolute, top, left, 0.15em, -2.85rem)
opacity: 0
transition: opacity 0.2s ease
&:hover:before
opacity: 1
&:active:before
color: $color-theme
&:target
display: inline-block
&:before
bottom: 0.15em
top: initial
[id]:target
padding-top: $nav-height * 1.25
//- Layout
.u-width-full
width: 100%
.u-float-left
float: left
margin-right: 1rem
.u-float-right
float: right
margin-left: 1rem
.u-text-center
text-align: center
.u-text-right
text-align: right
.u-padding
padding: 5rem
.u-padding-small
padding: 0.5em 0.75em
.u-padding-medium
padding: 1.8rem
.u-padding-top
padding-top: 2rem
.u-inline-block
display: inline-block
.u-flex-full
flex: 1
.u-nowrap
white-space: nowrap
.u-wrap
white-space: pre-wrap
.u-break.u-break
word-wrap: break-word
white-space: initial
&.u-break--all
word-break: break-all
.u-no-border
border: none
.u-border
border: 1px solid $color-subtle
border-radius: 2px
.u-border-dotted
border-bottom: 1px dotted $color-subtle
@each $name, $color in (theme: $color-theme, dark: $color-dark, subtle: $color-subtle-dark, light: $color-back, red: $color-red, green: $color-green, yellow: $color-yellow)
.u-color-#{$name}
color: $color
.u-grayscale
filter: grayscale(100%)
transition: filter 0.15s ease
user-select: none
&:hover
filter: none
.u-pattern
background: $pattern
//- Loaders
.u-loading,
[data-loading]
$spinner-size: 75px
$spinner-bar: 8px
min-height: $spinner-size * 2
position: relative
& > *
opacity: 0.35
&:before
@include position(absolute, top, left, 0, 0)
@include size($spinner-size)
right: 0
bottom: 0
margin: auto
content: ""
border: $spinner-bar solid $color-subtle
border-right: $spinner-bar solid $color-theme
border-radius: 50%
animation: rotate 1s linear infinite
z-index: 10
//- Hidden elements
.u-hidden,
[v-cloak]
display: none !important
@each $breakpoint in (xs, sm, md)
.u-hidden-#{$breakpoint}.u-hidden-#{$breakpoint}
@include breakpoint(max, $breakpoint)
display: none
//- Transitions
.u-fade-enter-active
transition: opacity 0.5s
.u-fade-enter
opacity: 0
@ -1,43 +0,0 @@
//- 💫 CSS > COMPONENTS > ASIDES
//- Aside container
.c-aside
position: relative
//- Aside content
.c-aside__content
background: $color-front
border-top-left-radius: $border-radius
border-bottom-left-radius: $border-radius
z-index: 10
@include breakpoint(min, md)
@include position(absolute, top, left, -3rem, calc(100% + 5.5rem))
width: calc(#{$aside-width} + 2rem)
// Banner effect
&:after
$triangle-size: 2rem
@include position(absolute, bottom, left, -$triangle-size / 2, $border-radius / 2)
@include size(0)
border-color: transparent
border-style: solid
border-top-color: $color-dark
border-width: $triangle-size / 2 0 0 calc(#{$triangle-size} - #{$border-radius / 2})
content: ""
@include breakpoint(max, sm)
display: block
margin: 2rem 0
//- Aside text
.c-aside__text
color: $color-back
padding: 1.5rem 2.5rem 3rem 2rem
@ -1,52 +0,0 @@
//- 💫 CSS > COMPONENTS > BUTTONS
.c-button
display: inline-block
font-weight: bold
padding: 0.8em 1.1em 1em
margin-bottom: 1px
border: 2px solid $color-theme
border-radius: 2em
text-align: center
transition: background-color, color 0.25s ease
&:hover
border-color: $color-theme-dark
&.c-button--small
font-size: 1.1rem
padding: 0.65rem 1.1rem 0.825rem
&.c-button--primary
background: $color-theme
color: $color-back
&:hover
background: $color-theme-dark
&.c-button--secondary
background: $color-back
color: $color-theme
&:hover
color: $color-theme-dark
&.c-button--secondary-light
background: transparent
color: $color-back
border-color: $color-back
.c-icon-button
@include size(35px)
background: $color-subtle-light
color: $color-subtle-dark
border-radius: 50%
padding: 0.5rem
transition: color 0.2s ease
&:hover
color: $color-theme
&.c-icon-button--right
float: right
margin-left: 3rem
@ -1,105 +0,0 @@
//- 💫 CSS > COMPONENTS > CHAT
.c-chat
@include position(fixed, top, left, 0, 60%)
bottom: 0
right: 0
display: flex
flex-flow: column nowrap
background: $color-back
transition: transform 0.3s cubic-bezier(0.16, 0.22, 0.22, 1.7)
box-shadow: -0.25rem 0 1rem 0 rgba($color-front, 0.25)
z-index: 100
@include breakpoint(min, md)
left: calc(100% - #{$aside-width} - #{$aside-padding})
@include breakpoint(max, sm)
left: 50%
@include breakpoint(max, xs)
left: 0
&.is-collapsed:not(.is-loading)
transform: translateX(110%)
&:before
@include position(absolute, top, left, 1.25rem, 2rem)
content: attr(data-title)
font: bold 1.4rem $font-secondary
text-transform: uppercase
color: $color-back
&:after
@include position(absolute, top, left, 0, 100%)
content: ""
z-index: -1
bottom: 0
right: -100%
background: $color-back
& > iframe
width: 100%
flex: 1 1 calc(100% - #{$nav-height})
border: 0
.gitter-chat-embed-loading-wrapper
@include position(absolute, top, left, 0, 0)
right: 0
bottom: 0
display: none
justify-content: center
align-items: center
.is-loading &
display: flex
.gitter-chat-embed-action-bar,
.gitter-chat-embed-action-bar-item
display: flex
.gitter-chat-embed-action-bar
align-items: center
justify-content: flex-end
background: $color-theme
padding: 0 1rem 0 2rem
flex: 0 0 $nav-height
.gitter-chat-embed-action-bar-item
@include size(40px)
padding: 0
opacity: 0.75
background-position: 50%
background-repeat: no-repeat
background-size: 22px 22px
border: 0
cursor: pointer
transition: all 0.2s ease
&:focus,
&:hover
opacity: 1
&.gitter-chat-embed-action-bar-item-pop-out
background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyMCIgaGVpZ2h0PSIyMCIgdmlld0JveD0iMCAwIDIwIDIwIj48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMTYgMmgtOC4wMjFjLTEuMDk5IDAtMS45NzkgMC44OC0xLjk3OSAxLjk4djguMDIwYzAgMS4xIDAuOSAyIDIgMmg4YzEuMSAwIDItMC45IDItMnYtOGMwLTEuMS0wLjktMi0yLTJ6TTE2IDEyaC04di04aDh2OHpNNCAxMGgtMnY2YzAgMS4xIDAuOSAyIDIgMmg2di0yaC02di02eiI+PC9wYXRoPjwvc3ZnPg==)
margin-right: -4px
&.gitter-chat-embed-action-bar-item-collapse-chat
background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0Ij48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMTguOTg0IDYuNDIybC01LjU3OCA1LjU3OCA1LjU3OCA1LjU3OC0xLjQwNiAxLjQwNi01LjU3OC01LjU3OC01LjU3OCA1LjU3OC0xLjQwNi0xLjQwNiA1LjU3OC01LjU3OC01LjU3OC01LjU3OCAxLjQwNi0xLjQwNiA1LjU3OCA1LjU3OCA1LjU3OC01LjU3OHoiPjwvcGF0aD48L3N2Zz4=)
.c-chat__button
@include position(fixed, bottom, right, 1.5rem, 1.5rem)
z-index: 5
color: $color-back
background: $color-front
border-radius: 1em
padding: 0.5rem 1.15rem 0.35rem
opacity: 0.7
transition: opacity 0.2s ease
&:hover
opacity: 1
.gitter-open-chat-button
display: none
@ -1,202 +0,0 @@
//- 💫 CSS > COMPONENTS > CODE
//- Code block
.c-code-block,
.juniper-cell
background: $color-front
color: darken($color-back, 20)
padding: 0.75em 0
border-radius: $border-radius
overflow: auto
width: 100%
max-width: 100%
white-space: pre
direction: ltr
.c-code-block--has-icon
padding: 0
display: flex
border-top-left-radius: 0
border-bottom-left-radius: 0
.c-code-block__icon
padding: 0 0 0 1rem
display: flex
justify-content: center
align-items: center
&.c-code-block__icon--border
border-left: 6px solid
//- Code block content
.c-code-block__content,
.juniper-input,
.jp-OutputArea
display: block
font: normal normal 1.1rem/#{1.9} $font-code
padding: 1em 2em
.c-code-block__content[data-prompt]:before
content: attr(data-prompt)
margin-right: 0.65em
display: inline-block
vertical-align: middle
opacity: 0.5
//- Juniper
[data-executable]
margin-bottom: 0
.juniper-cell
border: 0
.juniper-input
padding: 0
.juniper-output
color: inherit
background: inherit
padding: 0
.jp-OutputArea
&:not(:empty)
padding: 2rem 2rem 1rem
border-top: 1px solid $color-dark
margin-top: 2rem
.entities, svg
white-space: initial
font-family: inherit
.entities
font-size: 1.35rem
.jp-OutputArea pre
font: inherit
.jp-OutputPrompt.jp-OutputArea-prompt
padding-top: 0.5em
margin-right: 1rem
font-family: inherit
font-weight: bold
.juniper-button
@extend .u-text-label, .u-text-label--dark
position: static
.juniper-wrapper
position: relative
.juniper-wrapper__text
@include position(absolute, top, right, 1.25rem, 1.25rem)
color: $color-subtle-dark
z-index: 10
//- Code
code, .CodeMirror, .jp-RenderedText, .jp-OutputArea
-webkit-font-smoothing: subpixel-antialiased
-moz-osx-font-smoothing: auto
//- Inline code
*:not(a):not(.c-code-block) > code
color: $color-dark
*:not(.c-code-block) > code
font-size: 90%
background-color: $color-subtle-light
padding: 0.2rem 0.4rem
border-radius: 0.25rem
font-family: $font-code
margin: 0
box-decoration-break: clone
white-space: nowrap
.c-aside__content &
background: lighten($color-front, 10)
color: $color-back
text-shadow: none
//- Syntax Highlighting (Prism)
[class*="language-"] .token
&.comment, &.prolog, &.doctype, &.cdata, &.punctuation
color: map-get($syntax-highlighting, comment)
&.property, &.tag, &.constant, &.symbol, &.deleted
color: map-get($syntax-highlighting, tag)
&.boolean, &.number
color: map-get($syntax-highlighting, number)
&.selector, &.attr-name, &.string, &.char, &.builtin, &.inserted
color: map-get($syntax-highlighting, selector)
@at-root .language-css .token.string,
&.operator, &.entity, &.url, &.variable
color: map-get($syntax-highlighting, operator)
&.atrule, &.attr-value, &.function
color: map-get($syntax-highlighting, function)
&.regex, &.important
color: map-get($syntax-highlighting, regex)
&.keyword
color: map-get($syntax-highlighting, keyword)
&.italic
font-style: italic
//- Syntax Highlighting (CodeMirror)
.CodeMirror.cm-s-default
background: $color-front
color: darken($color-back, 20)
.CodeMirror-selected
background: $color-theme
color: $color-back
.CodeMirror-cursor
border-left-color: currentColor
.cm-variable-2
color: inherit
font-style: italic
.cm-comment
color: map-get($syntax-highlighting, comment)
.cm-keyword, .cm-builtin
color: map-get($syntax-highlighting, keyword)
.cm-operator
color: map-get($syntax-highlighting, operator)
.cm-string
color: map-get($syntax-highlighting, selector)
.cm-number
color: map-get($syntax-highlighting, number)
.cm-def
color: map-get($syntax-highlighting, function)
//- Syntax highlighting (Jupyter)
.jp-RenderedText pre
.ansi-cyan-fg
color: map-get($syntax-highlighting, function)
.ansi-green-fg
color: $color-green
.ansi-red-fg
color: map-get($syntax-highlighting, operator)
@ -1,63 +0,0 @@
//- 💫 CSS > COMPONENTS > LANDING
.c-landing
background: $color-theme
padding-top: $nav-height * 1.5
width: 100%
.c-landing__wrapper
background: $pattern
width: 100%
.c-landing__content
background: $pattern-overlay
width: 100%
min-height: 573px
.c-landing__headlines
position: relative
top: -1.5rem
left: 1rem
.c-landing__title
color: $color-back
text-align: center
margin-bottom: 0.75rem
.c-landing__blocks
@include breakpoint(min, sm)
position: relative
top: -25rem
margin-bottom: -25rem
.c-landing__card
padding: 3rem 2.5rem
.c-landing__banner
background: $color-theme
.c-landing__banner__content
@include breakpoint(min, md)
border: 4px solid
padding: 1rem 6.5rem 2rem 4rem
.c-landing__banner__text
font-weight: 500
strong
font-weight: 800
p
font-size: 1.5rem
@include breakpoint(min, md)
padding-top: 7rem
.c-landing__badge
transform: rotate(7deg)
display: block
text-align: center
@include breakpoint(min, md)
@include position(absolute, top, right, 16rem, 6rem)
@ -1,39 +0,0 @@
//- 💫 CSS > COMPONENTS > LISTS
//- List Container
.c-list
@each $type, $counter in (numbers: decimal, letters: upper-latin, roman: lower-roman)
&.c-list--#{$type}
counter-reset: li
.c-list__item:before
content: counter(li, #{$counter}) '.'
font-size: 1em
padding-right: 1rem
//- List Item
.c-list__item
padding-left: 2rem
margin-bottom: 0.5em
margin-left: 1.25rem
&:before
content: '\25CF'
display: inline-block
font-size: 0.6em
font-weight: bold
padding-right: 1em
margin-left: -3.75rem
text-align: right
width: 2.5rem
counter-increment: li
box-sizing: content-box
//- List icon
.c-list__icon
margin-right: 1rem
@ -1,68 +0,0 @@
//- 💫 CSS > COMPONENTS > MISC
.x-terminal
background: $color-subtle-light
color: $color-front
padding: $border-radius
border-radius: 1em
width: 100%
position: relative
&.x-terminal--small
background: $color-dark
color: $color-subtle
border-radius: 4px
margin-bottom: 4rem
.x-terminal__icons
display: none
position: absolute
padding: 10px
@include breakpoint(min, sm)
display: block
&:before,
&:after,
span
@include size(15px)
display: inline-block
float: left
border-radius: 50%
margin-right: 10px
&:before
content: ""
background: $color-red
span
background: $color-green
&:after
content: ""
background: $color-yellow
&.x-terminal__icons--small
&:before,
&:after,
span
@include size(10px)
.x-terminal__code
margin: 0
border: none
border-bottom-left-radius: 5px
border-bottom-right-radius: 5px
width: 100%
max-width: 100%
white-space: pre-wrap
.x-terminal__button.x-terminal__button
@include position(absolute, bottom, right, 2.65rem, 2.6rem)
background: $color-dark
border-color: $color-dark
&:hover
background: darken($color-dark, 5)
border-color: darken($color-dark, 5)
@ -1,61 +0,0 @@
//- 💫 CSS > COMPONENTS > NAVIGATION
.c-nav
@include position(fixed, top, left, 0, 0)
@include size(100%, $nav-height)
background: $color-back
color: $color-theme
align-items: center
display: flex
justify-content: space-between
flex-flow: row nowrap
padding: 0 0 0 1rem
z-index: 30
width: 100%
box-shadow: $box-shadow
&.is-fixed
animation: slideInDown 0.5s ease-in-out
position: fixed
.c-nav__menu
@include size(100%)
display: flex
flex-flow: row nowrap
border-color: inherit
flex: 1
@include breakpoint(max, sm)
@include scroll-shadow-base($color-front)
overflow-x: auto
overflow-y: hidden
-webkit-overflow-scrolling: touch
@include breakpoint(min, md)
justify-content: flex-end
.c-nav__menu__item
display: flex
align-items: center
height: 100%
text-transform: uppercase
font-family: $font-secondary
font-size: 1.6rem
font-weight: bold
color: $color-theme
&:not(:first-child)
margin-left: 2em
&:last-child
@include scroll-shadow-cover(right, $color-back)
padding-right: 2rem
&:first-child
@include scroll-shadow-cover(left, $color-back)
padding-left: 2rem
&.is-active
color: $color-dark
pointer-events: none

View File

@@ -1,100 +0,0 @@
//- 💫 CSS > COMPONENTS > QUICKSTART
.c-quickstart
border-radius: $border-radius
display: none
background: $color-subtle-light
// Hide the adjacent info text while the widget has no inline style
// (an inline style is typically set by JS when the widget is activated)
&:not([style]) + .c-quickstart__info
display: none
.c-code-block
border-top-left-radius: 0
border-top-right-radius: 0
.c-quickstart__content
padding: 2rem 3rem
.c-quickstart__input
@include size(0)
opacity: 0
position: absolute
left: -9999px
.c-quickstart__label
cursor: pointer
background: $color-back
border: 1px solid $color-subtle
border-radius: 2px
display: inline-block
padding: 0.75rem 1.25rem
margin: 0 0.5rem 0.5rem 0
font-weight: bold
&:hover
background: lighten($color-theme-light, 5)
.c-quickstart__input:focus + &
border: 1px solid $color-theme
.c-quickstart__input--radio:checked + &
color: $color-back
border-color: $color-theme
background: $color-theme
.c-quickstart__input--check + &:before
content: ""
background: $color-back
display: inline-block
width: 20px
height: 20px
border: 1px solid $color-subtle
vertical-align: middle
margin-right: 1rem
cursor: pointer
border-radius: 2px
.c-quickstart__input--check:checked + &:before
// Checked state: inline base64-encoded SVG check mark on theme color
background: $color-theme url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0Ij4gICAgPHBhdGggZmlsbD0iI2ZmZiIgZD0iTTkgMTYuMTcybDEwLjU5NC0xMC41OTQgMS40MDYgMS40MDYtMTIgMTItNS41NzgtNS41NzggMS40MDYtMS40MDZ6Ii8+PC9zdmc+)
background-size: contain
border-color: $color-theme
.c-quickstart__label__meta
font-weight: normal
color: $color-subtle-dark
.c-quickstart__group
@include breakpoint(min, md)
display: flex
flex-flow: row nowrap
&:not(:last-child)
margin-bottom: 1rem
.c-quickstart__fields
flex: 100%
.c-quickstart__legend
margin-right: 2rem
padding-top: 0.75rem
flex: 1 1 35%
font-weight: bold
.c-quickstart__line
display: block
&:before
color: $color-theme
margin-right: 1em
&.c-quickstart__line--bash:before
content: "$"
&.c-quickstart__line--python:before
content: ">>>"
&.c-quickstart__line--divider
padding: 1.5rem 0
.c-quickstart__code
font-size: 1.4rem
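
The quickstart widget above is pure CSS state handling: the real form inputs are moved off-screen, and the adjacent labels are restyled via :checked and :focus sibling selectors. A minimal sketch of the pattern, with hypothetical class names:

// Hypothetical reduction of the CSS-only toggle pattern used above
.x-toggle__input
    // visually hide the input, but keep it focusable
    @include size(0)
    opacity: 0
    position: absolute
    left: -9999px

.x-toggle__label
    cursor: pointer

// restyle the label when its (hidden) input is checked
.x-toggle__input:checked + .x-toggle__label
    background: $color-theme
    color: $color-back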

View File

@@ -1,95 +0,0 @@
//- 💫 CSS > COMPONENTS > SIDEBAR
//- Sidebar container
.c-sidebar
overflow-y: auto
@include breakpoint(min, md)
@include position(fixed, top, left, 0, 0)
@include size($sidebar-width, calc(100vh - 3px))
@include scroll-shadow($color-back, $color-front, $nav-height)
flex: 0 0 $sidebar-width
padding: calc(#{$nav-height} + 1.5rem) 0 0
z-index: 10
@include breakpoint(max, sm)
flex: 100%
width: 100%
margin-top: $nav-height
display: flex
flex-flow: row wrap
width: 100%
//- Sidebar section
.c-sidebar__section
& > *
padding: 0 2rem 0.35rem
@include breakpoint(max, sm)
flex: 1 1 0
padding: 1.25rem 0
border-bottom: 1px solid $color-subtle
margin: 0
&:not(:last-child)
border-right: 1px solid $color-subtle
.c-sidebar__item
color: $color-theme
&:hover
color: $color-theme-dark
& > .is-active
font-weight: bold
color: $color-dark
margin-top: 1rem
//- Sidebar subsections
$crumb-bullet: 14px
$crumb-bar: 2px
.c-sidebar__crumb
display: block
padding-top: 1rem
padding-left: 1rem
position: relative
.c-sidebar__crumb__item
margin-bottom: $crumb-bullet / 2
position: relative
padding-left: 2rem
color: $color-theme
font-size: 1.2rem
&:hover
color: $color-theme-dark
&:after
@include size($crumb-bullet)
@include position(absolute, top, left, $crumb-bullet / 4, 0)
content: ""
border-radius: 50%
background: $color-theme
z-index: 10
&:not(:last-child):before
@include size($crumb-bar, 100%)
@include position(absolute, top, left, $crumb-bullet, ($crumb-bullet - $crumb-bar) / 2)
content: ""
background: $color-subtle
&:first-child:before
height: calc(100% + #{$crumb-bullet * 2})
top: -$crumb-bullet / 2
&.is-active
color: $color-dark
&:after
background: $color-dark

View File

@@ -1,86 +0,0 @@
//- 💫 CSS > COMPONENTS > TABLES
//- Table container
.c-table
vertical-align: top
//- Table row
.c-table__row
&:nth-child(odd):not(.c-table__row--head)
background: rgba($color-subtle-light, 0.35)
&.c-table__row--foot
background: $color-theme-light
border-top: 2px solid $color-theme
.c-table__cell:first-child
@extend .u-text-label
color: $color-theme
&.c-table__row--divider
border-top: 2px solid $color-theme
//- Table cell
.c-table__cell
padding: 1rem
&:not(:last-child)
border-right: 1px solid $color-subtle
&.c-table__cell--num
text-align: right
font-feature-settings: "tnum"
font-variant-numeric: tabular-nums
& > strong
font-feature-settings: initial
font-variant-numeric: initial
//- Table head cell
.c-table__head-cell
font-weight: bold
color: $color-theme
padding: 1rem 0.5rem
border-bottom: 2px solid $color-theme
//- Responsive table
//- Shadows adapted from "CSS only Responsive Tables" by David Bushell
//- http://codepen.io/dbushell/pen/wGaamR
@include breakpoint(max, md)
.c-table
@include scroll-shadow-base($color-front)
display: inline-block
overflow-x: auto
overflow-y: hidden
width: auto
-webkit-overflow-scrolling: touch
.c-table__cell,
.c-table__head-cell
&:first-child
@include scroll-shadow-cover(left, $color-back)
&:last-child
@include scroll-shadow-cover(right, $color-back)
&:first-child:last-child
@include scroll-shadow-cover(both, $color-back)
.c-table__row--foot .c-table__cell
&:first-child
@include scroll-shadow-cover(left, lighten($color-subtle-light, 2))
&:last-child
@include scroll-shadow-cover(right, lighten($color-subtle-light, 2))
&:first-child:last-child
@include scroll-shadow-cover(both, lighten($color-subtle-light, 2))

View File

@@ -1,39 +0,0 @@
//- 💫 CSS > COMPONENTS > TOOLTIPS
[data-tooltip]
position: relative
@include breakpoint(min, sm)
&[data-tooltip-style="code"]:before
-webkit-font-smoothing: subpixel-antialiased
-moz-osx-font-smoothing: auto
padding: 0.35em 0.85em 0.45em
font: normal 1rem/#{1.25} $font-code
white-space: nowrap
min-width: auto
&:before
@include position(absolute, top, left, 125%, 50%)
display: inline-block
content: attr(data-tooltip)
background: $color-front
border-radius: $border-radius
border: 1px solid rgba($color-subtle-dark, 0.5)
color: $color-back
font: normal 1.2rem/#{1.25} $font-primary
text-transform: none
text-align: left
opacity: 0
transform: translateX(-50%) translateY(-2px)
transition: opacity 0.1s ease-out, transform 0.1s ease-out
visibility: hidden
max-width: 300px
min-width: 200px
padding: 0.75em 1em 1em
z-index: 200
white-space: pre-wrap
&:hover:before
opacity: 1
transform: translateX(-50%) translateY(0)
visibility: visible
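
Note that the tooltip text never appears in the stylesheet itself: it is pulled from the data-tooltip attribute at render time via attr(). Stripped of positioning, typography and transitions, the core mechanism reduces to this sketch (not the full component):

// Reduced sketch of the attribute-driven tooltip
[data-tooltip]
    position: relative

    &:before
        content: attr(data-tooltip)  // read the tooltip text from the HTML attribute
        visibility: hidden
        opacity: 0

    &:hover:before
        visibility: visible
        opacity: 1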

View File

@@ -1,80 +0,0 @@
//- 💫 CSS > MIXINS
// Helper for position
// $position - valid position value (static, absolute, fixed, relative)
// $pos-y - position direction Y (top, bottom)
// $pos-x - position direction X (left, right)
// $pos-y-value - value of position Y direction
// $pos-x-value - value of position X direction
@mixin position($position, $pos-y, $pos-x, $pos-y-value, $pos-x-value)
position: $position
#{$pos-y}: $pos-y-value
#{$pos-x}: $pos-x-value
// Helper for width and height
// $width - width of element
// $height - height of element (default: $width)
@mixin size($width, $height: $width)
width: $width
height: $height
//- Responsive Breakpoint utility
@mixin breakpoint($limit, $size)
$breakpoints-max: ( xs: map-get($breakpoints, sm) - 1, sm: map-get($breakpoints, md) - 1, md: map-get($breakpoints, lg) - 1 )
@if $limit == "min"
@media(min-width: #{map-get($breakpoints, $size)})
@content
@else if $limit == "max"
@media(max-width: #{map-get($breakpoints-max, $size)})
@content
// Scroll shadows for responsive tables
// adapted from David Bushell, http://codepen.io/dbushell/pen/wGaamR
// $scroll-shadow-color - color of shadow
// $scroll-shadow-side - side to cover shadow (left or right)
// $scroll-shadow-background - original background color to match
@function scroll-shadow-gradient($scroll-gradient-direction, $scroll-shadow-background)
@return linear-gradient(to #{$scroll-gradient-direction}, rgba($scroll-shadow-background, 1) 50%, rgba($scroll-shadow-background, 0) 100%)
@mixin scroll-shadow-base($scroll-shadow-color, $scroll-shadow-intensity: 0.2)
background: radial-gradient(ellipse at 0 50%, rgba($scroll-shadow-color, $scroll-shadow-intensity) 0%, rgba(0,0,0,0) 75%) 0 center, radial-gradient(ellipse at 100% 50%, rgba($scroll-shadow-color, $scroll-shadow-intensity) 0%, transparent 75%) 100% center
background-attachment: scroll, scroll
background-repeat: no-repeat
background-size: 10px 100%, 10px 100%
@mixin scroll-shadow-cover($scroll-shadow-side, $scroll-shadow-background)
$scroll-gradient-direction: right !default
background-repeat: no-repeat
@if $scroll-shadow-side == right
$scroll-gradient-direction: left
background-position: 100% 0
@if $scroll-shadow-side == both
background-image: scroll-shadow-gradient(left, $scroll-shadow-background), scroll-shadow-gradient(right, $scroll-shadow-background)
background-position: 100% 0, 0 0
background-size: 20px 100%, 20px 100%
@else
background-image: scroll-shadow-gradient($scroll-gradient-direction, $scroll-shadow-background)
background-size: 20px 100%
// Full vertical scroll shadows
// adapted from: https://codepen.io/laustdeleuran/pen/DBaAu
@mixin scroll-shadow($background-color, $shadow-color, $shadow-offset: 0, $shadow-intensity: 0.4, $cover-size: 40px, $shadow-size: 15px)
// Two declarations on purpose: the first uses the legacy radial-gradient
// syntax as a fallback, the second the modern "at" position syntax
background: linear-gradient($background-color 30%, rgba($background-color,0)) 0 $shadow-offset, linear-gradient(rgba($background-color,0), $background-color 70%) 0 100%, radial-gradient(50% 0, farthest-side, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) 0 $shadow-offset, radial-gradient(50% 100%,farthest-side, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) 0 100%
background: linear-gradient($background-color 30%, rgba($background-color,0)) 0 $shadow-offset, linear-gradient(rgba($background-color,0), $background-color 70%) 0 100%, radial-gradient(farthest-side at 50% 0, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) -20px $shadow-offset, radial-gradient(farthest-side at 50% 100%, rgba($shadow-color, $shadow-intensity), rgba($shadow-color,0)) 0 100%
background-repeat: no-repeat
background-color: $background-color
background-size: 100% $cover-size, 100% $cover-size, 100% $shadow-size, 100% $shadow-size
background-attachment: local, local, scroll, scroll
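
The comments above only document the parameters, so here is a short usage sketch (selector and values hypothetical) showing how the position, size and breakpoint helpers compose:

// Hypothetical element pinned to the top right, resized on small screens
.x-example
    @include position(absolute, top, right, 1rem, 2rem)
    @include size(40px)  // height defaults to the width

    @include breakpoint(max, sm)  // applies up to the "sm" breakpoint
        @include size(24px, 32px)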

View File

@@ -1,51 +0,0 @@
//- 💫 CSS > VARIABLES
// Settings and Sizes
$type-base: 11px
$nav-height: 55px
$content-width: 1250px
$sidebar-width: 235px
$aside-width: 27.5vw
$aside-padding: 25px
$border-radius: 6px
$logo-width: 85px
$logo-height: 27px
$grid: ( quarter: 4, third: 3, half: 2, two-thirds: 1.5, three-quarters: 1.33 )
$breakpoints: ( sm: 768px, md: 992px, lg: 1200px )
$headings: (1: 4.4, 2: 3.4, 3: 2.6, 4: 2.2, 5: 1.8)
// Fonts
$font-primary: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default
$font-secondary: "HK Grotesk", Roboto, Helvetica, Arial, sans-serif !default
$font-code: Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace !default
// Colors
$colors: ( blue: #09a3d5, green: #05b083, purple: #6542d1 )
$color-back: #fff !default
$color-front: #1a1e23 !default
$color-dark: lighten($color-front, 20) !default
$color-theme: map-get($colors, $theme)
$color-theme-dark: darken(map-get($colors, $theme), 10)
$color-theme-light: rgba($color-theme, 0.05)
$color-subtle: #ddd !default
$color-subtle-light: #f6f6f6 !default
$color-subtle-dark: #949e9b !default
$color-red: #ef476f
$color-green: #7ddf64
$color-yellow: #f4c025
$syntax-highlighting: ( comment: #949e9b, tag: #b084eb, number: #b084eb, selector: #ffb86c, operator: #ff2c6d, function: #35b3dc, keyword: #ff2c6d, regex: #f4c025 )
$pattern: $color-theme url("/assets/img/pattern_#{$theme}.jpg") center top repeat
$pattern-overlay: transparent url("/assets/img/pattern_landing.jpg") center -138px no-repeat
$box-shadow: 0 1px 5px rgba(0, 0, 0, 0.2)
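
Maps like $headings are meant to be consumed with map-get or @each elsewhere in the stylesheet; a hypothetical sketch of how the heading sizes could be generated from the map:

// Hypothetical: one utility class per heading level defined in $headings
@each $level, $size in $headings
    .u-heading-#{$level}
        font-size: #{$size}rem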

View File

@@ -1,37 +0,0 @@
//- 💫 STYLESHEET
$theme: blue !default
// Variables
@import variables
@import mixins
// Base
@import _base/reset
@import _base/fonts
@import _base/animations
@import _base/grid
@import _base/layout
@import _base/objects
@import _base/utilities
// Components
@import _components/asides
@import _components/buttons
@import _components/chat
@import _components/code
@import _components/landing
@import _components/lists
@import _components/misc
@import _components/navigation
@import _components/progress
@import _components/sidebar
@import _components/tables
@import _components/quickstart
@import _components/tooltips

View File

@@ -1,4 +0,0 @@
//- 💫 STYLESHEET (GREEN)
$theme: green
@import style

View File

@@ -1,4 +0,0 @@
//- 💫 STYLESHEET (PURPLE)
$theme: purple
@import style

Binary file not shown. (Before: 1.1 KiB)

View File

@@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 308.5 595.3 213"><path fill="#09a3d5" d="M73.7 395.2c-13.5-1.6-14.5-19.7-31.8-18.1-8.4 0-16.2 3.5-16.2 11.2 0 11.6 17.9 12.7 28.7 15.6 18.4 5.6 36.2 9.4 36.2 29.4 0 25.4-19.9 34.2-46.2 34.2-22 0-44.3-7.8-44.3-28 0-5.6 5.4-10 10.6-10 6.6 0 8.9 2.8 11.2 7.4 5.1 9 10.8 13.8 25 13.8 9 0 18.2-3.4 18.2-11.2 0-11.1-11.3-13.5-23-16.2-20.7-5.8-38.5-8.8-40.6-31.8-2.2-39.2 79.5-40.7 84.2-6.3-.1 6.2-5.9 10-12 10zm97.2-34.4c28.7 0 45 24 45 53.6 0 29.7-15.8 53.6-45 53.6-16.2 0-26.3-6.9-33.6-17.5v39.2c0 11.8-3.8 17.5-12.4 17.5-10.5 0-12.4-6.7-12.4-17.5v-114c0-9.3 3.9-15 12.4-15 8 0 12.4 6.3 12.4 15v3.2c8.1-10.2 17.4-18.1 33.6-18.1zm-6.8 86.8c16.8 0 24.3-15.5 24.3-33.6 0-17.7-7.6-33.6-24.3-33.6-17.5 0-25.6 14.4-25.6 33.6 0 18.7 8.2 33.6 25.6 33.6zm71.3-58.8c0-20.6 23.7-28 46.7-28 32.3 0 45.6 9.4 45.6 40.6v30c0 7.1 4.4 21.3 4.4 25.6 0 6.5-6 10.6-12.4 10.6-7.1 0-12.4-8.4-16.2-14.4-10.5 8.4-21.6 14.4-38.6 14.4-18.8 0-33.6-11.1-33.6-29.4 0-16.2 11.6-25.5 25.6-28.7 0 .1 45-10.6 45-10.7 0-13.8-4.9-19.9-19.4-19.9-12.8 0-19.3 3.5-24.3 11.2-4 5.8-3.5 9.3-11.2 9.3-6.2-.1-11.6-4.3-11.6-10.6zm38.4 61.9c19.7 0 28-10.4 28-31.1v-4.4c-5.3 1.8-26.7 7.1-32.5 8-6.2 1.2-12.4 5.8-12.4 13.1.2 8 8.4 14.4 16.9 14.4zm144.7-129c27.8 0 57.9 16.6 57.9 43 0 6.8-5.1 12.4-11.8 12.4-9.1 0-10.4-4.9-14.4-11.8-6.7-12.3-14.6-20.5-31.8-20.5-26.6-.2-38.5 22.6-38.5 51 0 28.6 9.9 49.2 37.4 49.2 18.3 0 28.4-10.6 33.6-24.3 2.1-6.3 5.9-12.4 13.8-12.4 6.2 0 12.4 6.3 12.4 13.1 0 28-28.6 47.4-58 47.4-32.2 0-50.4-13.6-60.4-36.2-4.9-10.8-8-22-8-37.4-.2-43.4 25.1-73.5 67.8-73.5zm159 39.1c7.1 0 11.2 4.6 11.2 11.8 0 2.9-2.3 8.7-3.2 11.8l-34.2 89.9c-7.6 19.5-13.3 33-39.2 33-12.3 0-23-1.1-23-11.8 0-6.2 4.7-9.3 11.2-9.3 1.2 0 3.2.6 4.4.6 1.9 0 3.2.6 4.4.6 13 0 14.8-13.3 19.4-22.5l-33-81.7c-1.9-4.4-3.2-7.4-3.2-10 0-7.2 5.6-12.4 13.1-12.4 8.4 0 11.7 6.6 13.8 13.8l21.8 64.8 21.8-59.9c3.3-9.3 3.6-18.7 14.7-18.7z"/></svg>

Before: 1.9 KiB

Binary file not shown. (Before: 225 KiB)
Binary file not shown. (Before: 227 KiB)
Binary file not shown. (Before: 182 KiB)
Binary file not shown. (Before: 204 KiB)
Binary file not shown. (Before: 180 KiB)
Binary file not shown. (Before: 108 KiB)
Binary file not shown. (Before: 32 KiB)
Binary file not shown. (Before: 16 KiB)
Binary file not shown. (Before: 31 KiB)
Binary file not shown. (Before: 32 KiB)
Binary file not shown. (Before: 8.5 KiB)
Binary file not shown. (Before: 374 KiB)

Some files were not shown because too many files have changed in this diff.