Update to new website

Ines Montani 2016-10-31 19:04:15 +01:00
parent 06ff6dc0d3
commit 7615b41bff
119 changed files with 6161 additions and 3777 deletions

.gitignore (vendored): 2 changed lines

@ -107,3 +107,5 @@ website/demos/sense2vec/
# Website
website/_deploy.sh
website/package.json
website/blog/announcement.jade


@ -1,7 +1,11 @@
//- ----------------------------------
//- 💫 404 ERROR
//- ----------------------------------
include _includes/_mixins
p.u-text-large.u-text-center Ooops, this page does not exist. Click #[a(href="javascript:history.go(-1)") here] to go back.
+landing-header
h1.c-landing__title.u-heading-0
| Ooops, this page#[br]
| does not exist!
h2.c-landing__title.u-heading-3.u-padding-small
a(href="javascript:history.go(-1)") Click here to go back.


@ -1,13 +1,13 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# Source files for the spacy.io website and docs
# spacy.io website and docs
The [spacy.io](https://spacy.io) website is implemented in [Jade (aka Pug)](https://www.jade-lang.org), and is built or served by [Harp](https://harpjs.com). Jade is an extensible templating language with a readable syntax that compiles to HTML.
The website source makes extensive use of Jade mixins, so that the design system is abstracted away from the content you're
writing. You can read more about our approach in our blog post, ["Rebuilding a Website with Modular Markup"](https://explosion.ai/blog/modular-markup).
## Building the site
## Viewing the site locally
```bash
sudo npm install --global harp
@ -17,3 +17,102 @@ harp server
```
This will serve the site on [http://localhost:9000](http://localhost:9000).
## Making changes to the site
The docs can always use another example or more detail, and they should always be up to date and not misleading. If you see something, say something: we always appreciate a [pull request](https://github.com/explosion/spaCy/pulls). To quickly find the correct file to edit, simply click on the "Suggest edits" button at the bottom of a page.
### File structure
While all page content lives in the `.jade` files, article meta (page titles, sidebars etc.) is stored as JSON. Each folder contains a `_data.json` with all required meta for its files.
For simplicity, all sites linked in the [tutorials](https://spacy.io/docs/usage/tutorials) and [showcase](https://spacy.io/docs/usage/showcase) are also stored as JSON. So in order to edit those pages, there's no need to dig into the Jade files: simply edit the [`_data.json`](website/docs/usage/_data.json).
### Markup language and conventions
Jade/Pug is a whitespace-sensitive markup language that compiles to HTML. Indentation is used to nest elements, and for template logic like `if`/`else` or `for`, which is mainly used to iterate over objects and arrays in the meta data. It also allows inline JavaScript expressions.
For an overview of Harp and Jade, see [this blog post](https://ines.io/blog/the-ultimate-guide-static-websites-harp-jade). For more info on the Jade/Pug syntax, check out their [documentation](https://pugjs.org).
In the [spacy.io](https://spacy.io) source, we use 4 spaces to indent and hard-wrap at 80 characters.
```pug
p This is a very short paragraph. It stays inline.
p
| This is a much longer paragraph. It's hard-wrapped at 80 characters to
| make it easier to read on GitHub and in editors that do not have soft
| wrapping enabled. To prevent Jade from interpreting each line as a new
| element, it's prefixed with a pipe and two spaces. This ensures that no
|  spaces are dropped (for example, if your editor strips out trailing
|  whitespace by default). Inline links are added using the inline syntax,
| like this: #[+a("https://google.com") Google].
```
Note that for external links, `+a("...")` is used instead of `a(href="...")`: it's a mixin that takes care of adding all required attributes.
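For example, here's a quick sketch of the two forms (the sentence is just illustrative; per the mixins file, `+a()` adds `target="_blank"` and the `rel` attributes for untrusted links automatically):
```pug
//- Plain element: target and rel attributes have to be written by hand
p Visit #[a(href="https://github.com/explosion/spaCy" target="_blank") spaCy on GitHub].
//- +a() mixin: the required attributes are added for you
p Visit #[+a("https://github.com/explosion/spaCy") spaCy on GitHub].
```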
### Mixins
Each file includes a collection of [custom mixins](website/_includes/_mixins.jade) that make it easier to add content components, with no HTML or class names required.
For example:
```pug
//- Bulleted list
+list
+item This is a list item.
+item This is another list item.
//- Table with header
+table([ "Header one", "Header two" ])
+row
+cell Table cell
+cell Another one
+row
+cell And one more.
+cell And the last one.
//- Headlines with optional permalinks
+h(2, "link-id") Headline 2 with link to #link-id
```
Code blocks are implemented using the `+code` or `+aside-code` mixin (the latter displays them in the sidebar). A `.` is added after the mixin call to preserve whitespace:
```pug
+code("This is a label").
import spacy
en_nlp = spacy.load('en')
en_doc = en_nlp(u'Hello, world. Here are two sentences.')
```
You can find the documentation for the available mixins in [`_includes/_mixins.jade`](website/_includes/_mixins.jade).
### Linking to the GitHub repo
Since GitHub links can be long and tricky, you can use the `gh()` function to generate them automatically for spaCy and all repositories owned by [explosion](https://github.com/explosion):
```pug
//- Syntax: gh(repo, [file], [branch])
+src(gh("spaCy", "spacy/matcher.pyx"))
//- https://github.com/explosion/spaCy/blob/master/spacy/matcher.pyx
```
`+src()` creates a link with a little source icon to indicate it's linking to a code source.
### Most common causes of compile errors
| Problem | Fix |
| --- | --- |
| JSON formatting errors | make sure the last elements of objects don't end with commas, and/or use a JSON linter |
| unescaped characters like `<` or `>` and sometimes `'` in inline elements | replace with the encoded version: `&lt;`, `&gt;` etc. |
| "Cannot read property 'call' of undefined" / "foo is not a function" | make sure mixin names are spelled correctly and the mixins file is included with the correct path |
| "no closing bracket found" | make sure inline elements end with a `]`, like `#[code spacy.load('en')]`, and for nested inline elements, make sure they're all on the same line and contain spaces between them (**bad:** `#[+api("doc")#[code Doc]]`) |
If Harp fails and throws a Jade error, don't take the reported line number at face value: it's often wrong, as the page is compiled from templates and several files.
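For the "no closing bracket found" case, here's a minimal before/after sketch (the surrounding sentence is only illustrative):
```pug
//- Bad: no space between the nested inline elements
p The #[+api("doc")#[code Doc]] object owns the sequence of tokens.
//- Good: nested elements on one line, separated by a space
p The #[+api("doc") #[code Doc]] object owns the sequence of tokens.
```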


@ -2,27 +2,27 @@
"index": {
"landing": true,
"logos": [
[
["chartbeat", "https://chartbeat.com"],
["socrata", "https://www.socrata.com"],
["chattermill", "https://chattermill.io"],
["cytora", "http://www.cytora.com"],
["signaln", "http://signaln.com"],
["duedil", "https://www.duedil.com/"],
["spyjack", "https://spyjack.io"]
],
[
["keyreply", "https://keyreply.com/"],
["dato", "https://dato.com"],
["kip", "http://kipthis.com"],
["wonderflow", "http://www.wonderflow.co"],
["foxtype", "https://foxtype.com"]
],
[
["synapsify", "http://www.gosynapsify.com"],
["stitchfix", "https://www.stitchfix.com/"],
["wayblazer", "http://wayblazer.com"]
]
{
"chartbeat": "https://chartbeat.com",
"cytora": "http://www.cytora.com",
"duedil": "https://www.duedil.com",
"socrata": "https://www.socrata.com",
"indico": "https://indico.io",
"signaln": "http://signaln.com"
},
{
"keyreply": "https://keyreply.com",
"dato": "https://dato.com",
"kip": "http://kipthis.com",
"wonderflow": "http://www.wonderflow.co",
"foxtype": "https://foxtype.com"
},
{
"synapsify": "http://www.gosynapsify.com",
"stitchfix": "https://www.stitchfix.com",
"wayblazer": "http://wayblazer.com",
"chattermill": "https://chattermill.io"
}
]
},
@ -32,28 +32,10 @@
"404": {
"title": "404 Error",
"asides": false
"landing": true
},
"styleguide": {
"title" : "Styleguide",
"asides": true,
"sidebar": {
"About": [
["Introduction", "#section-introduction", "introduction"]
],
"Design": [
["Colors", "#section-colors", "colors"],
["Logo", "#section-logo", "logo"],
["Typography", "#section-typography", "typography"],
["Grid", "#section-grid", "grid"],
["Elements", "#section-elements", "elements"],
["Components", "#section-components", "components"]
],
"Code": [
["Source", "#section-source", "source"]
]
}
"announcement" : {
"title": "Important Announcement"
}
}


@ -1,10 +1,10 @@
{
"globals": {
"title": "spaCy.io",
"title": "spaCy",
"description": "spaCy is a free open-source library featuring state-of-the-art speed and accuracy and a powerful Python API.",
"SITENAME": "spaCy",
"SLOGAN": "Industrial-strength Natural Language Processing",
"SLOGAN": "Industrial-strength Natural Language Processing in Python",
"SITE_URL": "https://spacy.io",
"EMAIL": "contact@explosion.ai",
@ -12,6 +12,8 @@
"COMPANY_URL": "https://explosion.ai",
"DEMOS_URL": "https://demos.explosion.ai",
"SPACY_VERSION": "1.1",
"SOCIAL": {
"twitter": "spacy_io",
"github": "explosion",
@ -21,9 +23,39 @@
"SCRIPTS" : [ "main", "prism" ],
"DEFAULT_SYNTAX" : "python",
"ANALYTICS": "UA-58931649-1",
"MAILCHIMP": {
"user": "spacy.us12",
"id": "83b0498b1e7fa3c91ce68c3f1",
"list": "89ad33e698"
},
"NAVIGATION": {
"Home": "/",
"Docs": "/docs",
"Demos": "/docs/usage/showcase",
"Blog": "https://explosion.ai/blog"
},
"FOOTER": {
"spaCy": {
"Usage": "/docs/usage",
"API Reference": "/docs/api",
"Tutorials": "/docs/usage/tutorials",
"Showcase": "/docs/usage/showcase"
},
"Support": {
"Issue Tracker": "https://github.com/explosion/spaCy/issues",
"StackOverflow": "http://stackoverflow.com/questions/tagged/spacy",
"Reddit usergroup": "https://www.reddit.com/r/spacynlp/",
"Gitter chat": "https://gitter.im/explosion/spaCy"
},
"Connect": {
"Twitter": "https://twitter.com/spacy_io",
"GitHub": "https://github.com/explosion/spaCy",
"Blog": "https://explosion.ai/blog",
"Contact": "mailto:contact@explosion.ai"
}
}
"SPACY_VERSION": "1.0",
"SPACY_STARS": "2500",
"GITHUB": { "user": "explosion", "repo": "spacy" }
}
}


@ -1,17 +1,30 @@
//- ----------------------------------
//- 💫 INCLUDES > FOOTER
//- ----------------------------------
include _mixins
footer.o-footer.o-inline-list.u-pattern.u-text-center.u-text-label.u-text-strong
span &copy; #{new Date().getFullYear()} #[+a(COMPANY_URL, true)=COMPANY]
footer.o-footer.u-text.u-border-dotted
+grid.o-content
each group, label in FOOTER
+grid-col("quarter")
ul
li.u-text-label.u-color-subtle=label
+a(COMPANY_URL + "/legal", true) Legal / Imprint
a(href="mailto:#{EMAIL}") #[+icon("mail", 16)]
each url, item in group
li
+a(url)(target=url.includes("http") ? "_blank" : "")=item
+a("https://twitter.com/" + SOCIAL.twitter)(aria-label="Twitter")
+icon("twitter", 20)
if SECTION != "docs"
+grid-col("quarter")
include _newsletter
+a("https://github.com/" + SOCIAL.github + "/spaCy")(aria-label="GitHub")
+icon("github", 20)
if SECTION == "docs"
.o-content.o-block.u-border-dotted
include _newsletter
.o-inline-list.u-text-center.u-text-tiny.u-color-subtle
span &copy; #{new Date().getFullYear()} #[+a(COMPANY_URL, true)=COMPANY]
+a(COMPANY_URL, true)
+svg("graphics", "explosion", 45).o-icon.u-color-theme.u-grayscale
+a(COMPANY_URL + "/legal", true) Legal / Imprint


@ -1,6 +1,11 @@
//- ----------------------------------
//- 💫 INCLUDES > FUNCTIONS
//- ----------------------------------
//- More descriptive variables for current.path and current.source
- CURRENT = current.source
- SECTION = current.path[0]
- SUBSECTION = current.path[1]
//- Add prefixes to items of an array (for modifier CSS classes)
@ -9,3 +14,10 @@
- return prefix + '--' + arg;
- }).join(' ');
- }
//- Generate GitHub links
- function gh(repo, filepath, branch) {
- return 'https://github.com/' + SOCIAL.github + '/' + repo + (filepath ? '/blob/' + (branch || 'master') + '/' + filepath : '' );
- }


@ -1,6 +0,0 @@
//- ----------------------------------
//- 💫 INCLUDES > LOGO
//- ----------------------------------
svg.o-logo(class=(logo_size) ? "o-logo--" + logo_size : "" viewBox="0 0 675 215" width="500")
path(d="M83.6 83.3C68.3 81.5 67.2 61 47.5 62.8c-9.5 0-18.4 4-18.4 12.7 0 13.2 20.3 14.4 32.5 17.7 20.9 6.3 41 10.7 41 33.3 0 28.8-22.6 38.8-52.4 38.8-24.9 0-50.2-8.9-50.2-31.8 0-6.4 6.1-11.3 12-11.3 7.5 0 10.1 3.2 12.7 8.4 5.8 10.2 12.3 15.6 28.3 15.6 10.2 0 20.6-3.9 20.6-12.7 0-12.6-12.8-15.3-26.1-18.4-23.5-6.6-43.6-10-46-36.1C-1 34.5 91.7 32.9 97 71.9c.1 7.1-6.5 11.4-13.4 11.4zm110.2-39c32.5 0 51 27.2 51 60.8 0 33.7-17.9 60.8-51 60.8-18.4 0-29.8-7.8-38.1-19.8v44.5c0 13.4-4.3 19.8-14.1 19.8-11.9 0-14.1-7.6-14.1-19.8V61.3c0-10.6 4.4-17 14.1-17 9.1 0 14.1 7.2 14.1 17v3.6c9.2-11.6 19.7-20.6 38.1-20.6zm-7.7 98.4c19.1 0 27.6-17.6 27.6-38.1 0-20.1-8.6-38.1-27.6-38.1-19.8 0-29 16.3-29 38.1 0 21.2 9.2 38.1 29 38.1zM266.9 76c0-23.4 26.9-31.7 52.9-31.7 36.6 0 51.7 10.7 51.7 46v34c0 8.1 5 24.1 5 29 0 7.4-6.8 12-14.1 12-8.1 0-14.1-9.5-18.4-16.3-11.9 9.5-24.5 16.3-43.8 16.3-21.3 0-38.1-12.6-38.1-33.3 0-18.4 13.2-28.9 29-32.5 0 .1 51-12 51-12.1 0-15.7-5.5-22.6-22-22.6-14.5 0-21.9 4-27.5 12.7-4.5 6.6-4 10.6-12.7 10.6-6.9-.1-13-4.9-13-12.1zm43.6 70.2c22.3 0 31.8-11.8 31.8-35.3v-5c-6 2-30.3 8-36.8 9.1-7 1.4-14.1 6.6-14.1 14.9.1 9.1 9.4 16.3 19.1 16.3zM474.5 0c31.5 0 65.7 18.8 65.7 48.8 0 7.7-5.8 14.1-13.4 14.1-10.3 0-11.8-5.5-16.3-13.4-7.6-13.9-16.5-23.3-36.1-23.3-30.2-.2-43.7 25.6-43.7 57.8 0 32.4 11.2 55.8 42.4 55.8 20.7 0 32.2-12 38.1-27.6 2.4-7.1 6.7-14.1 15.6-14.1 7 0 14.1 7.2 14.1 14.8 0 31.8-32.4 53.8-65.8 53.8-36.5 0-57.2-15.4-68.5-41-5.5-12.2-9.1-24.9-9.1-42.4-.1-49.2 28.6-83.3 77-83.3zm180.3 44.3c8 0 12.7 5.2 12.7 13.4 0 3.3-2.6 9.9-3.6 13.4L625.1 173c-8.6 22.1-15.1 37.4-44.5 37.4-14 0-26.1-1.2-26.1-13.4 0-7 5.3-10.6 12.7-10.6 1.4 0 3.6.7 5 .7 2.1 0 3.6.7 5 .7 14.7 0 16.8-15.1 22-25.5l-37.4-92.6c-2.1-5-3.6-8.4-3.6-11.3 0-8.2 6.4-14.1 14.8-14.1 9.5 0 13.3 7.5 15.6 15.6l24.7 73.5L638 65.5c3.9-10.5 4.2-21.2 16.8-21.2z")


@ -0,0 +1,102 @@
//- 💫 MIXINS > BASE
//- Aside wrapper
mixin aside-wrapper(label)
aside.c-aside
.c-aside__content(role="complementary")&attributes(attributes)
if label
h4.u-text-label.u-text-label--dark=label
block
//- Date
input - [string] date in the format YYYY-MM-DD
mixin date(input)
- var date = new Date(input)
- var months = [ 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December' ]
time(datetime=JSON.parse(JSON.stringify(date)))&attributes(attributes)=months[date.getMonth()] + ' ' + date.getDate() + ', ' + date.getFullYear()
//- SVG from map
mixin svg(file, name, width, height)
svg(aria-hidden="true" viewBox="0 0 #{width} #{height || width}" width=width height=(height || width))&attributes(attributes)
use(xlink:href="/assets/img/#{file}.svg##{name}")
//- Icon
mixin icon(name, size)
+svg("icons", "icon-" + name, size || 20).o-icon&attributes(attributes)
//- Pro/Con/Neutral icon
mixin procon(icon)
- colors = { pro: "green", con: "red" }
+icon(icon)(class="u-color-#{colors[icon] || 'subtle'}" aria-label=icon)&attributes(attributes)
//- Headlines Helper Mixin
mixin headline(level)
if level == 1
h1.u-heading-1&attributes(attributes)
block
else if level == 2
h2.u-heading-2&attributes(attributes)
block
else if level == 3
h3.u-heading-3&attributes(attributes)
block
else if level == 4
h4.u-heading-4&attributes(attributes)
block
else if level == 5
h5.u-heading-5&attributes(attributes)
block
//- Permalink rendering
mixin permalink(id)
if id
a.u-permalink(id=id href="##{id}")
+icon("anchor").u-permalink__icon
block
else
block
//- Terminal-style code window
mixin terminal(label)
.x-terminal
.x-terminal__icons: span
.u-padding-small.u-text-label.u-text-center=label
+code.x-terminal__code
block
//- Logo
mixin logo()
+svg("graphics", "spacy", 500).o-logo&attributes(attributes)
//- Landing
mixin landing-header()
header.c-landing
.c-landing__wrapper
.c-landing__content
block


@ -1,9 +1,255 @@
//- ----------------------------------
//- 💫 INCLUDES > MIXINS
//- ----------------------------------
include _functions
include _mixins-base
include _mixins/_base
include _mixins/_components
include _mixins/_headlines
//- Headlines
level - [integer] headline level, corresponds to h1, h2, h3 etc.
id - [string] unique identifier, creates permalink (optional)
mixin h(level, id)
+headline(level).u-heading&attributes(attributes)
+permalink(id)
block
//- External links
url - [string] link href
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
info: https://mathiasbynens.github.io/rel-noopener/
mixin a(url, trusted)
a(href=url target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
block
//- Source link (with added icon for "code")
url - [string] link href, can also be gh() function to generate GitHub link
see _functions.jade for more info
mixin src(url)
+a(url)
block
| #[+icon("code", 16).u-color-subtle]
//- API link (with added tag and automatically generated path)
path - [string] path to API docs page relative to /docs/api/
mixin api(path)
+a("/docs/api/" + path, true)(target="_self").u-no-border.u-inline-block
block
| #[+icon("book", 18).o-help-icon.u-color-subtle]
//- Aside for text
label - [string] aside title (optional)
mixin aside(label)
+aside-wrapper(label)
.c-aside__text.u-text-small
block
//- Aside for code
label - [string] aside title (optional or false for no label)
language - [string] language for syntax highlighting (default: "python")
supports basic relevant languages available for PrismJS
mixin aside-code(label, language)
+aside-wrapper(label)
+code(false, language).o-no-block
block
//- Link button
url - [string] link href
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
info: https://mathiasbynens.github.io/rel-noopener/
...style - all other arguments are added as class names c-button--argument
see assets/css/_components/_buttons.sass
mixin button(url, trusted, ...style)
a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
block
//- Code block
label - [string] aside title (optional or false for no label)
language - [string] language for syntax highlighting (default: "python")
supports basic relevant languages available for PrismJS
mixin code(label, language)
pre.c-code-block.o-block(class="lang-#{(language || DEFAULT_SYNTAX)}")&attributes(attributes)
if label
h4.u-text-label.u-text-label--dark=label
code.c-code-block__content
block
//- Images / figures
url - [string] url or path to image
width - [integer] image width in px, for better rendering (default: 500)
caption - [string] image caption
alt - [string] alternative image text, defaults to caption
mixin image(url, width, caption, alt)
figure.o-block&attributes(attributes)
img(src=url alt=(alt || caption) width="#{width || 500}")
if caption
+image-caption=caption
else
block
//- Image caption
mixin image-caption()
figcaption.u-text-small.u-color-subtle.u-padding-small&attributes(attributes)
block
//- Label
mixin label()
.u-text-label.u-color-subtle&attributes(attributes)
block
//- Tag
mixin tag()
span.u-text-tag.u-text-tag--spaced(aria-hidden="true")
block
//- List
type - [string] "numbers", "letters", "roman" (bulleted list if none set)
start - [integer] start number
mixin list(type, start)
if type
ol.c-list.o-block.u-text(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : "")&attributes(attributes)
block
else
ul.c-list.c-list--bullets.o-block.u-text&attributes(attributes)
block
//- List item (only used within +list)
mixin item(procon)
if procon
li&attributes(attributes)
+procon(procon).c-list__icon
block
else
li.c-list__item&attributes(attributes)
block
//- Table
head - [array] table headings (should match number of columns)
mixin table(head)
table.c-table.o-block&attributes(attributes)
if head
+row
each column in head
th.c-table__head-cell.u-text-label=column
block
//- Table row (only used within +table)
mixin row()
tr.c-table__row&attributes(attributes)
block
//- Footer table row (only ued within +table)
mixin footrow()
tr.c-table__row.c-table__row--foot&attributes(attributes)
block
//- Table cell (only used within +row in +table)
mixin cell()
td.c-table__cell.u-text&attributes(attributes)
block
//- Grid Container
mixin grid()
.o-grid.o-block&attributes(attributes)
block
//- Grid Column (only used within +grid)
width - [string] "quarter", "third", "half", "two-thirds", "three-quarters"
see $grid in assets/css/_variables.sass
mixin grid-col(width)
.o-grid__col(class="o-grid__col--#{width}")&attributes(attributes)
block
//- Card (only used within +grid)
title - [string] card title
details - [object] url, image, author, description, tags etc.
(see /docs/usage/_data.json)
mixin card(title, details)
+grid-col("half").u-border.u-padding-medium.u-text&attributes(attributes)
if details.image
+a(details.url).o-block-small
img(src=details.image alt=title width="300" role="presentation")
if title
+a(details.url)
+h(3)=title
if details.author
.u-text-small.u-color-subtle by #{details.author}
if details.description || details.tags
ul
if details.description
li=details.description
if details.tags
li
each tag in details.tags
span.u-text-tag #{tag}
| &nbsp;
block
//- Simpler card list item (only used within +list)
title - [string] card title
details - [object] url, image, author, description, tags etc.
(see /docs/usage/_data.json)
mixin card-item(title, details)
+item&attributes(attributes)
+a(details.url)=title
if details.description
br
span=details.description
if details.author
br
span.u-text-small.u-color-subtle by #{details.author}


@ -1,42 +0,0 @@
//- ----------------------------------
//- 💫 MIXINS > BASE
//- ----------------------------------
//- External Link
mixin a(url, trusted)
a(href=url target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
block
//- Sections for content pages
id - [string] id, can be headline id as it's being prefixed (optional)
block - section content (block and inline elements)
mixin section(id)
section.o-section(id=(id) ? 'section-' + id : '')&attributes(attributes)
block
//- Date
input - [string] date in the format YYYY-MM-DD
mixin date(input)
- var date = new Date(input)
- var months = [ 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December' ]
time(datetime=JSON.parse(JSON.stringify(date)))&attributes(attributes)=months[date.getMonth()] + ' ' + date.getDate() + ', ' + date.getFullYear()
//- Grid Container
mixin grid(...style)
.o-grid.o-block(class=prefixArgs(style, "o-grid"))&attributes(attributes)
block
//- Grid Column
mixin grid-col(...style)
.o-grid__col(class=prefixArgs(style, "o-grid__col"))&attributes(attributes)
block


@ -1,112 +0,0 @@
//- ----------------------------------
//- 💫 MIXINS > COMPONENTS
//- ----------------------------------
//- Aside
mixin aside(label)
span.c-aside.u-text-small(role="complementary")&attributes(attributes)
span.c-aside__label.u-text-label.u-text-strong.u-color-theme=label
block
//- Button
mixin button(url, trusted, ...style)
a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
block
//- Code
mixin code(language, label, small)
pre.c-code-block(class="lang-#{(language || DEFAULT_SYNTAX)} #{small ? '' : 'o-block'}")&attributes(attributes)
if label
span.c-code-block__label.u-text-label.u-text-strong=label
code.c-code-block__content(class=small ? "u-code-small" : "u-code-regular")
block
//- Icon
mixin icon(name, size)
- var size = size || 20
svg.o-icon(aria-hidden="true" viewBox="0 0 #{size} #{size}" width=size height=size)&attributes(attributes)
use(xlink:href="/assets/img/icons.svg#icon-#{name}")
//- Image for illustration purposes
file - [string] file name (in /assets/img)
alt - [string] descriptive alt text (optional)
caption - [string] image caption (optional)
mixin image(file, alt, caption)
figure.o-block&attributes(attributes)
img(src="/assets/img/#{file}" alt=(alt || caption) width="800")
if caption
figcaption.u-text-small=caption
block
//- Label
mixin label()
.u-text-label.u-text-strong.u-color-theme&attributes(attributes)
block
//- List
mixin list(type, start)
if type
ol.c-list.o-block(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : "")&attributes(attributes)
block
else
ul.c-list.c-list--bullets.o-block&attributes(attributes)
block
//- List item
mixin item()
li.c-list__item.u-text-regular&attributes(attributes)
block
//- Table
mixin table(head)
table.c-table.o-block.has-aside&attributes(attributes)
if head
+row
each column in head
th.c-table__head-cell.u-text-label.u-text-strong=column
block
//- Table row
mixin row(...style)
tr.c-table__row(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
block
//- Table cell
mixin cell(...style)
td.c-table__cell.u-text-regular.has-aside(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
block
//- Tag
mixin tag()
span.u-text-tag.u-text-label.u-color-theme.u-text-strong.u-padding-small
block


@ -1,49 +0,0 @@
//- ----------------------------------
//- 💫 MIXINS > HEADLINES
//- ----------------------------------
//- Headlines Helper Mixin
mixin headline(level)
if level == 1
h1.u-heading-1&attributes(attributes)
block
else if level == 2
h2.u-heading-2&attributes(attributes)
block
else if level == 3
h3.u-heading-3&attributes(attributes)
block
else if level == 4
h4.u-heading-4&attributes(attributes)
block
else if level == 5
h5.u-heading-5&attributes(attributes)
block
//- Permalink rendering
mixin permalink(id)
if id
a.u-permalink(id=id href="##{id}")
+icon("link").u-permalink__icon
block
else
block
//- Headlines
mixin h(level, id, source)
+headline(level)&attributes(attributes)
+permalink(id)
block
if source
+button(source, false, "secondary").u-text-small.u-float-right Source


@ -1,26 +1,17 @@
//- ----------------------------------
//- 💫 INCLUDES > TOP NAVIGATION
//- ----------------------------------
include _mixins
nav.c-nav.u-text-label.js-nav
nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : "")
a(href='/') #[+logo]
a(href='/')
!=partial("_includes/_logo", { logo_size: 'small' })
if SUBSECTION != "index"
.u-text-label.u-padding-small=SUBSECTION
ul.c-nav__menu
li.c-nav__menu__item(class=(current.path[0] == 'index') ? "is-active" : "")
a(href='/') Home
li.c-nav__menu__item(class=(current.path[0] == 'docs') ? "is-active" : "")
a(href="/docs") Docs
each url, item in NAVIGATION
li.c-nav__menu__item
a(href=url target=url.includes("http") ? "_blank" : "")=item
li.c-nav__menu__item
a(href="https://demos.explosion.ai" target="_blank") Demos
li.c-nav__menu__item
a(href="https://explosion.ai/blog" target="_blank") Blog
li.c-nav__menu__item
a(href="https://github.com/" + SOCIAL.github + "/spaCy" target="_blank") #[+icon("github", 18)] #[span.u-hidden-sm GitHub]
+a(gh("spaCy"))(aria-label="GitHub").u-hidden-xs #[+icon("github", 20)]


@ -1,20 +1,16 @@
//- ----------------------------------
//- 💫 INCLUDES > NEWSLETTER SIGNUP
//- ----------------------------------
//- 💫 INCLUDES > NEWSLETTER
include _mixins
ul.o-block
li.u-text-label.u-color-subtle Stay in the loop!
li Receive updates about new releases, tutorials and more.
.o-block.u-text-center.u-padding.u-border-top
form.o-grid#mc-embedded-subscribe-form(action="//#{MAILCHIMP.user}.list-manage.com/subscribe/post?u=#{MAILCHIMP.id}&amp;id=#{MAILCHIMP.list}" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)
+label Sign up for the spaCy newsletter
h3.u-heading-1 Stay in the loop!
p.u-text-large Receive updates about new releases, tutorials and more.
//- MailChimp spam protection
div(style="position: absolute; left: -5000px;" aria-hidden="true")
input(type="text" name="b_#{MAILCHIMP.id}_#{MAILCHIMP.list}" tabindex="-1" value="")
form#mc-embedded-subscribe-form.o-inline-list(action="https://spacy.us12.list-manage.com/subscribe/post?u=83b0498b1e7fa3c91ce68c3f1&amp;id=89ad33e698" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)
input#mce-EMAIL.u-border.u-padding-small.u-text-regular(type="email" name="EMAIL" placeholder="Your email address")
.o-grid-col.u-border.u-padding-small
input#mce-EMAIL.u-text(type="email" name="EMAIL" placeholder="Your email")
//- Spam bot protection
div(style="position: absolute; left: -5000px;" aria-hidden="true")
input(type="text" name="b_83b0498b1e7fa3c91ce68c3f1_89ad33e698" tabindex="-1" value="")
button#mc-embedded-subscribe.c-button.c-button--primary.u-text-label(type="submit" name="subscribe") Sign up
button#mc-embedded-subscribe.u-text-label.u-color-theme(type="submit" name="subscribe") Sign up


@ -0,0 +1,27 @@
//- 💫 INCLUDES > DOCS PAGE TEMPLATE
- sidebar_content = (SUBSECTION != "index") ? public.docs[SUBSECTION]._data.sidebar : public.docs._data.sidebar || FOOTER
include _sidebar
main.o-main.o-main--sidebar.o-main--aside
article.o-content
+h(1)=title
if tag
+tag=tag
!=yield
+grid.o-content.u-text
+grid-col("half")
if next && public.docs[SUBSECTION]._data[next]
- data = public.docs[SUBSECTION]._data[next]
.o-inline-list
span #[strong.u-text-label Read next:] #[a(href=next).u-link=data.title]
+grid-col("half").u-text-right
.o-inline-list
+button(gh("spacy", "website/" + current.path.join('/') + ".jade"), false, "secondary").u-text-tag Suggest edits #[+icon("code", 14)]
include _footer


@ -1,14 +0,0 @@
//- ----------------------------------
//- 💫 INCLUDES > SCRIPTS
//- ----------------------------------
each script in SCRIPTS
script(src="/assets/js/" + script + ".js", type="text/javascript")
if landing
script(async src="https://platform.twitter.com/widgets.js" charset="utf-8")
if environment == "deploy"
script window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');
script(async src="https://www.google-analytics.com/analytics.js")


@ -1,13 +1,13 @@
//- ----------------------------------
//- 💫 INCLUDES > SIDEBAR
//- ----------------------------------
include _mixins
nav.c-sidebar.js-sidebar
.c-sidebar__body.u-text-regular
each items, menu in sidebar
ul.o-block-small
menu.c-sidebar.js-sidebar.u-text
if sidebar_content
each items, menu in sidebar_content
ul.c-sidebar__section.o-block
li.u-text-label.u-color-subtle=menu
each item in items
li: a(href=item[1] data-section=(item[2]) ? "section-" + item[2] : "")=item[0]
each url, item in items
li(class=(CURRENT == url || (CURRENT == "index" && url == "./")) ? "is-active" : "")
+a(url)(target=url.includes("http") ? "_blank" : "")=item


@ -1,13 +1,19 @@
//- ----------------------------------
//- 💫 GLOBAL LAYOUT
//- ----------------------------------
include _includes/_mixins
doctype html
html(lang="en")
title=(current.path[0] == "index") ? SITENAME + " | " + SLOGAN : title + " | " + SITENAME
title
if SECTION == "docs" && SUBSECTION && SUBSECTION != "index"
| #{title} | #{SITENAME} #{SUBSECTION == "api" ? "API" : "Usage"} Documentation
else if SECTION != "index"
| #{title} | #{SITENAME}
else
| #{SITENAME} - #{SLOGAN}
meta(charset="utf-8")
meta(name="viewport" content="width=device-width, initial-scale=1.0")
@ -19,41 +25,40 @@ html(lang="en")
meta(property="og:url" content="#{SITE_URL}/#{current.path.join('/')}")
meta(property="og:title" content=title)
meta(property="og:description" content=description)
meta(property="og:image" content="/assets/img/social.png")
meta(property="og:image" content="#{SITE_URL}/assets/img/social#{(SECTION == 'docs') ? '_docs' : ''}.jpg")
meta(name="twitter:card" content="summary_large_image")
meta(name="twitter:site" content="@" + SOCIAL.twitter)
meta(name="twitter:title" content=title)
meta(name="twitter:description" content=description)
meta(name="twitter:image" content="/assets/img/social.jpg")
meta(name="twitter:image" content="#{SITE_URL}/assets/img/social#{(SECTION == 'docs') ? '_docs' : ''}.jpg")
link(rel="shortcut icon" href="/assets/img/favicon.ico")
link(rel="icon" type="image/x-icon" href="/assets/img/favicon.ico")
link(href="/assets/css/style.css" rel="stylesheet")
if SUBSECTION == "usage"
link(href="/assets/css/style_red.css?v1" rel="stylesheet")
else
link(href="/assets/css/style.css?v1" rel="stylesheet")
body
include _includes/_navigation
if !landing
header.o-header.u-pattern.u-text-center
if current.path[1] == "tutorials"
h2.u-heading-1.u-text-shadow Tutorials
else
+h(1).u-text-shadow=title
if sidebar
include _includes/_sidebar
main.o-content(class="#{(sidebar) ? 'o-content--sidebar' : '' } #{((current.path[0] == 'docs' && asides != false) || asides) ? 'o-content--asides' : '' } #{(current.path[1] == 'tutorials') ? 'o-content--article' : '' }")
if current.path[1] == "tutorials"
+h(1)=title
!=yield
if SECTION == "docs"
include _includes/_page-docs
else
!=yield
main!=yield
include _includes/_footer
include _includes/_footer
each script in SCRIPTS
script(src="/assets/js/" + script + ".js?v1", type="text/javascript")
include _includes/_scripts
if environment == "deploy"
script
| window.ga=window.ga||function(){
| (ga.q=ga.q||[]).push(arguments)}; ga.l=+new Date;
| ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');
script(async src="https://www.google-analytics.com/analytics.js")

website/announcement.jade (new file, 14 lines)

@ -0,0 +1,14 @@
//- 💫 SPACY ANNOUNCEMENT FROM 2016-08-09 (needs to stay for reference)
include _includes/_mixins
.o-content.u-padding
+h(1)
+label #[+date("2016-08-09")]
| Dear spaCy users,
p Unfortunately, we (Henning Peters and Matthew Honnibal) are parting ways. Breaking up is never easy, and it's taken us a while to get our stuff together. Hopefully, you didn't notice anything was up — if you did, we hope you haven't been inconvenienced.
p Here's how this is going to work: Matt will continue to develop and maintain spaCy and all related projects under his name. Nothing will change for you. Henning will take over our legal structure and start a new business under a new name.
p Sincerely,#[br] Henning Peters and Matthew Honnibal


@ -1,6 +1,4 @@
//- ----------------------------------
//- 💫 BASE > ANIMATIONS
//- ----------------------------------
//- 💫 CSS > BASE > ANIMATIONS
//- Fade in


@ -1,6 +1,4 @@
//- ----------------------------------
//- 💫 BASE > FONTS
//- ----------------------------------
//- 💫 CSS > BASE > FONTS
// Source Sans Pro


@ -1,6 +1,4 @@
//- ----------------------------------
//- 💫 BASE > GRID
//- ----------------------------------
//- 💫 CSS > BASE > GRID
//- Grid container


@ -1,21 +1,14 @@
//- ----------------------------------
//- 💫 BASE > LAYOUT
//- ----------------------------------
//- 💫 CSS > BASE > LAYOUT
//- HTML
html
@include breakpoint(min, lg)
font-size: $type-base
font-size: $type-base
@include breakpoint(max, md)
font-size: $type-base * 0.8
//- Body
body
display: flex
flex-flow: row wrap
animation: fadeIn 0.25s ease
background: $color-back
color: $color-front
@ -24,15 +17,12 @@ body
//- Paragraphs
p
@extend .o-block, .u-text-regular, .has-aside
.o-content--article &:not([class])
@extend .u-text-medium
@extend .o-block, .u-text
//- Links
main p a, main table a, main li a, .c-aside a
main p a, main table a, main > *:not(footer) li a, .c-aside a
@extend .u-link
@ -41,4 +31,3 @@ main p a, main table a, main li a, .c-aside a
::selection
background: $color-theme
color: $color-back
text-shadow: none


@ -1,67 +1,80 @@
//- ----------------------------------
//- 💫 BASE > OBJECTS
//- ----------------------------------
//- 💫 CSS > BASE > OBJECTS
//- Containers
//- Main container
.o-content
flex: 1 1 auto
padding: $nav-height 4rem 8rem
width: $content-width - $aside-width
.o-main
padding: $nav-height 0 0 0
max-width: 100%
min-height: 100vh
@include breakpoint(min, md)
&.o-content--asides
padding-left: 5rem
padding-right: $aside-width + $aside-padding * 2
&.o-main--sidebar
margin-left: $sidebar-width
//- Header
&.o-main--aside
margin-right: $aside-width
position: relative
.o-header
display: flex
justify-content: center
flex-flow: column nowrap
padding: 3rem 5rem
margin-top: $nav-height
width: 100%
min-height: 250px
&:after
@include position(absolute, top, left, 0, 100%)
@include size($aside-width, 100%)
content: ""
display: block
background: $pattern
z-index: -1
min-height: 100vh
//- Content container
.o-content
padding: 3rem 7.5rem
margin: 0 auto
width: $content-width
max-width: 100%
@include breakpoint(max, sm)
padding: 3rem
//- Footer
.o-footer
position: relative
padding: 5rem 0
padding: 2.5rem 0
overflow: auto
width: 100%
z-index: 200
//- Blocks
.o-block
margin-bottom: 5rem
margin-bottom: 3rem
.o-block-small
margin-bottom: 2rem
.o-section
margin-bottom: 12.5rem
.o-no-block
margin-bottom: 0
.o-responsive
overflow: auto
width: 100%
max-width: 100%
.o-card
background: $color-back
border-radius: 2px
//- Icons
.o-icon
vertical-align: middle
.o-help-icon
cursor: help
margin: 0 0.5rem 0 0.25rem
//- Inline List
.o-inline-list > *
display: inline
margin-bottom: 3rem
&:not(:last-child)
margin-right: 3rem
@ -70,9 +83,7 @@
//- Logo
.o-logo
@include size(100%, auto)
@include size($logo-width, auto)
fill: currentColor
@each $name, $size in $logo-sizes
&.o-logo--#{$name}
width: $size
vertical-align: middle
margin: 0 0.5rem


@ -1,9 +1,4 @@
//- ----------------------------------
//- 💥 BASE > RESET
//- ----------------------------------
//- adapted from "normalize.css" by Nicolas Gallagher & Jonathan Neal
//- https://github.com/necolas/normalize.css
//- 💫 CSS > BASE > RESET
*
box-sizing: border-box
@ -11,12 +6,14 @@
margin: 0
border: 0
outline: 0
-webkit-font-smoothing: antialiased
html
font-family: sans-serif
text-rendering: optimizeSpeed
-ms-text-size-adjust: 100%
-webkit-text-size-adjust: 100%
-webkit-font-smoothing: antialiased
-moz-osx-font-smoothing: grayscale
body
margin: 0
@ -64,6 +61,7 @@ img
max-width: 100%
svg
max-width: 100%
color-interpolation-filters: sRGB
fill: currentColor
@ -88,17 +86,15 @@ table
max-width: 100%
border-collapse: collapse
td,
th
td, th
vertical-align: top
ul,
ol
ul, ol
list-style: none
input,
button
input, button
appearance: none
button
background: transparent
cursor: pointer


@ -1,80 +1,84 @@
//- ----------------------------------
//- 💫 BASE > UTILITIES
//- ----------------------------------
//- 💫 CSS > BASE > UTILITIES
//- Text
%text
font-family: $font-primary
line-height: 1.5
.u-text-regular
@extend %text
font-size: 1.6rem
.u-text-medium
@extend %text
font-size: 2rem
.u-text
font: 1.5rem/#{1.55} $font-primary
.u-text-small
@extend %text
font-size: 1.2rem
font: 1.4rem/#{1.375} $font-primary
.u-text-large
@extend %text
font-size: 2.8rem
.u-text-tiny
font: 1.1rem/#{1.375} $font-primary
//- Labels & Tags
.u-text-label
@extend %text
font-size: 1.4rem
font-weight: normal
font: normal 600 1.4rem/#{1.5} $font-code
text-transform: uppercase
.u-text-strong
font-weight: bold
&.u-text-label--dark
display: inline-block
background: $color-dark
box-shadow: inset 1px 1px 1px rgba($color-front, 0.25)
color: $color-back
padding: 0 0.75rem
margin: 1.5rem 0 0 2rem
border-radius: 2px
.u-code-regular
font: normal normal 1.3rem/#{2} $font-code
.u-text-tag
display: inline-block
font: 600 1.1rem/#{1} $font-code
background: $color-theme
color: $color-back
padding: 0.15em 0.25em
border-radius: 2px
text-transform: uppercase
vertical-align: middle
.u-code-small
font: normal normal 0.85em $font-code
line-height: inherit
.u-link
color: $color-theme
border-bottom: 1px solid
&.u-text-tag--spaced
margin-left: 0.75em
//- Headings
.u-heading
margin-bottom: 2rem
@include breakpoint(max, md)
word-wrap: break-word
&:not(:first-child)
padding-top: 3.5rem
.u-heading-0
font: normal bold 7rem/#{1} $font-primary
@each $level, $size in (1: 5.5, 2: 3, 3: 2.6, 4: 2, 5: 1.8)
@each $level, $size in $headings
.u-heading-#{$level}
font: normal bold #{$size}rem/#{1.25} $font-primary
margin-bottom: 2rem
.u-heading-label
@extend .u-text-label
margin-bottom: 1rem
//- Permalinks
//- Links
.u-link
color: $color-theme
border-bottom: 1px solid
.u-permalink
position: relative
&:target
display: inline-block
padding-top: $nav-height * 1.5
padding-top: $nav-height * 1.25
& + *
margin-top: $nav-height * 1.5
margin-top: $nav-height * 1.25
.u-permalink__icon
@include position(absolute, bottom, left, 0.25em, -3.25rem)
@include size(2rem)
@include position(absolute, bottom, left, 0.35em, -2.75rem)
@include size(1.5rem)
color: $color-subtle
.u-permalink:hover &
@ -89,46 +93,56 @@
.u-text-center
text-align: center
.u-float-right
float: right
.u-text-right
text-align: right
.u-padding
padding: 5rem
.u-padding-small
padding: 0.5em 0.75em
.u-padding-medium
padding: 2rem
padding: 2.5rem
.u-padding
padding: 5rem
.u-inline-block
display: inline-block
.u-no-border
border: none
.u-border
border: 1px solid $color-subtle
border-radius: 3px
.u-border-top
border-top: 1px solid $color-subtle
border-radius: 2px
.u-border-bottom
border-bottom: 1px solid $color-subtle
border: 1px solid $color-subtle
.u-color-theme
color: $color-theme
.u-border-dotted
border-top: 1px dotted $color-subtle
.u-color-subtle
color: $color-subtle-dark
@each $name, $color in (theme: $color-theme, subtle: $color-subtle-dark, light: $color-back, red: $color-red, green: $color-green, yellow: $color-yellow)
.u-color-#{$name}
color: $color
.u-text-shadow
text-shadow: 2px 2px $color-theme-dark
.u-grayscale
filter: grayscale(100%)
transition: filter 0.15s ease
user-select: none
&:hover
filter: none
.u-pattern
background: $color-theme url("/assets/img/pattern.jpg")
color: $color-back
background: $pattern
//- Hidden elements
.u-hidden
display: none
@each $breakpoint in (sm, md)
.u-hidden-#{$breakpoint}
@each $breakpoint in (xs, sm, md)
.u-hidden-#{$breakpoint}.u-hidden-#{$breakpoint}
@include breakpoint(max, $breakpoint)
display: none


@ -1,37 +1,41 @@
//- ----------------------------------
//- 💫 COMPONENTS > ASIDES
//- ----------------------------------
//- 💫 CSS > COMPONENTS > ASIDES
//- Aside
//- Aside container
.c-aside
@include breakpoint(min, md)
@include position(absolute, top, left, 0, calc(100% + #{$aside-padding}))
border-left: 1px solid $color-subtle
opacity: 0.5
transition: opacity 0.25s ease
padding: 0 $aside-padding
width: $aside-width
position: relative
&:hover
opacity: 1
//- Aside content
.c-aside__content
background: $color-front
z-index: 10
@include breakpoint(min, md)
@include position(absolute, top, left, -3rem, calc(100% + 5.5rem))
width: calc(#{$aside-width} + 2rem)
// Banner effect
&:after
$triangle-size: 2rem
@include position(absolute, bottom, left, -$triangle-size / 2, 0)
@include size(0)
border-color: transparent
border-style: solid
border-top-color: $color-dark
border-width: $triangle-size / 2 0 0 $triangle-size
content: ""
@include breakpoint(max, sm)
display: block
margin: type(5) 0
margin: 2rem 0
//- Aside label
//- Aside text
.c-aside__label
display: block
margin-bottom: 1rem
// Aside container
.has-aside
position: relative
&:hover > .c-aside
opacity: 1
.c-aside__text
color: $color-back
padding: 1.5rem 2.5rem 3rem 2rem


@ -1,23 +1,23 @@
//- ----------------------------------
//- 💫 COMPONENTS > BUTTONS
//- ----------------------------------
//- 💫 CSS > COMPONENTS > BUTTONS
.c-button
display: inline-block
font-weight: bold
padding: 0.5em 0.75em
padding: 0.75em 1em
border: 2px solid
border-radius: 3px
transition: opacity 0.25s ease
&:hover
opacity: 0.8
border-radius: 2px
text-align: center
transition: background 0.25s ease
&.c-button--primary
background: $color-theme
color: $color-back
border-color: $color-theme
&:hover
background: $color-theme-dark
border-color: $color-theme-dark
&.c-button--secondary
background: $color-back
color: $color-theme


@ -1,52 +1,40 @@
//- ----------------------------------
//- 💫 COMPONENTS > CODE
//- ----------------------------------
//- 💫 CSS > COMPONENTS > CODE
//- Code block
.c-code-block
background: $color-subtle-light
padding: 1em 0
border-left: 5px solid $color-theme
background: $color-front
color: $color-back
padding: 0.75em 0
border-radius: 2px
overflow: auto
width: 100%
max-width: 100%
white-space: pre
direction: ltr
:not(.o-block)
margin-bottom: 2rem
//- Code block content
.c-code-block__content
display: block
padding: 2em 2.5em
//- Code block label
.c-code-block__label
display: inline-block
background: $color-theme
color: $color-back
padding: 1rem
margin-bottom: 1.5rem
font: normal normal 1.1rem/#{2} $font-code
padding: 1em 2em
//- Inline code
:not(.c-code-block) > code
@extend .u-code-small
*:not(.c-code-block) > code
font: normal 600 0.8em/#{1} $font-code
background: $color-subtle-light
box-shadow: 1px 1px 0 $color-subtle
color: $color-front
padding: 0.15em 0.5em
margin: 0 0.25em
border-radius: 2px
text-shadow: 1px 1px 0 $color-back
padding: 0.1em 0.5em
margin: 0
border-radius: 1px
.c-aside__content &
background: $color-dark
color: $color-back
//- Syntax Highlighting


@ -0,0 +1,20 @@
//- 💫 CSS > COMPONENTS > LANDING
.c-landing
background: $color-theme
padding-top: 5rem
width: 100%
.c-landing__wrapper
background: $pattern
padding-bottom: 6rem
width: 100%
.c-landing__content
background: $pattern-overlay
width: 100%
min-height: 573px
.c-landing__title
color: $color-back
text-align: center


@ -1,6 +1,4 @@
//- ----------------------------------
//- 💫 COMPONENTS > LISTS
//- ----------------------------------
//- 💫 CSS > COMPONENTS > LISTS
//- List Container
@ -17,16 +15,22 @@
.c-list__item
padding-left: 2rem
margin-bottom: 1em
margin-bottom: 0.5em
margin-left: 1.25rem
&:before
content: '\25CF'
display: inline-block
font-size: 1.25em
font-size: 1em
font-weight: bold
padding-right: 1.25rem
margin-left: -3.75rem
text-align: right
width: 2.5rem
counter-increment: li
//- List icon
.c-list__icon
margin-right: 1rem


@ -1,11 +1,11 @@
//- ----------------------------------
//- 💫 COMPONENTS > MISC
//- ----------------------------------
//- 💫 CSS > COMPONENTS > MISC
.x-terminal
background: $color-subtle
background: $color-subtle-light
color: $color-front
border-radius: 10px
padding: 4px
border: 1px dotted $color-subtle
border-radius: 5px
width: 100%
.x-terminal__icons
@ -23,22 +23,20 @@
&:before
content: ""
background: #e4514f
background: $color-red
span
background: #3ec930
background: $color-green
&:after
content: ""
background: #f4c025
background: $color-yellow
.x-terminal__code
background: $color-front
color: $color-back
margin: 0
border: none
border-bottom-left-radius: 10px
border-bottom-right-radius: 10px
border-bottom-left-radius: 5px
border-bottom-right-radius: 5px
width: 100%
max-width: 100%
white-space: pre-wrap


@ -1,29 +1,26 @@
//- ----------------------------------
//- 💫 COMPONENTS > NAVIGATION
//- ----------------------------------
//- 💫 CSS > COMPONENTS > NAVIGATION
.c-nav
@include position(absolute, top, left, 0, 0)
@include size(100%, $nav-height)
align-items: center
background: $color-back
border-color: $color-back
color: $color-theme
align-items: center
display: flex
justify-content: space-between
padding: 0 2rem
z-index: 10
padding: 0 2rem 0 1rem
z-index: 20
width: 100%
border-bottom: 1px solid $color-subtle
&.c-nav--theme
background: $color-theme
color: $color-back
border-bottom: none
&.is-fixed
animation: slideInDown 0.5s ease-in-out
position: fixed
background: $color-theme
color: $color-back
border-color: $color-theme
@include breakpoint(min, sm)
height: $nav-height * 0.8
.c-nav__menu
@include size(100%)
@ -36,17 +33,7 @@
display: flex
align-items: center
height: 100%
text-transform: uppercase
&:not(:last-child)
margin-right: 1em
&.is-active
position: relative
font-weight: bold
border-color: inherit
&:after
$triangle: 8px
@include triangle-down($triangle)
@include position(absolute, top, left, 100%, calc(50% - #{$triangle}))


@ -1,40 +1,40 @@
//- ----------------------------------
//- 💫 COMPONENTS > SIDEBAR
//- ----------------------------------
//- 💫 CSS > COMPONENTS > SIDEBAR
//- Sidebar container
.c-sidebar
@include breakpoint(min, md)
flex: 0 0 $sidebar-width
margin-right: 6rem
margin-left: 4rem
padding-top: $nav-height
width: $sidebar-width
background: $color-subtle-light
overflow-y: auto
&.is-fixed .c-sidebar__body
@include position(fixed, top, left, $nav-height, 4rem)
@include size($sidebar-width, calc(100vh - #{$nav-height}))
overflow: auto
transition: none
@include breakpoint(min, md)
@include position(fixed, top, left, 0, 0)
@include size($sidebar-width, 100vh)
flex: 0 0 $sidebar-width
padding: calc(#{$nav-height} + 1.5rem) 2rem 2rem
z-index: 10
border-right: 1px solid $color-subtle
@include breakpoint(max, sm)
flex: 100%
width: 100%
.c-sidebar__body
display: flex
flex-flow: row wrap
width: 100%
& > *
flex: 1 1 0
padding: 1rem
border-bottom: 1px solid $color-subtle
margin-top: $nav-height
display: flex
flex-flow: row wrap
width: 100%
&:not(:last-child)
border-right: 1px solid $color-subtle
//- Sidebar section
.c-sidebar__section
@include breakpoint(max, sm)
flex: 1 1 0
padding: 1.25rem
border-bottom: 1px solid $color-subtle
margin: 0
&:not(:last-child)
border-right: 1px solid $color-subtle
.c-sidebar__body
.is-active
font-weight: bold
color: $color-theme


@ -1,44 +1,68 @@
//- ----------------------------------
//- 💫 COMPONENTS > TABLES
//- ----------------------------------
//- 💫 CSS > COMPONENTS > TABLES
// Shadows adapted from "CSS only Responsive Tables" by David Bushell
// http://codepen.io/dbushell/pen/wGaamR
//- Table Container
//- Table container
.c-table
vertical-align: top
@include breakpoint(max, md)
//- Table row
.c-table__row
&:nth-child(odd)
background: lighten($color-subtle-light, 2)
&.c-table__row--foot
background: $color-subtle-light
border-top: 2px solid $color-theme
.c-table__cell:first-child
@extend .u-text-label
color: $color-theme
//- Table cell
.c-table__cell
padding: 1rem
&:not(:last-child)
border-right: 1px solid $color-subtle
//- Table head cell
.c-table__head-cell
font-weight: bold
color: $color-theme
background: $color-back
padding: 1rem 0.5rem
border-bottom: 2px solid $color-theme
//- Responsive table
//- Shadows adapted from "CSS only Responsive Tables" by David Bushell
//- http://codepen.io/dbushell/pen/wGaamR
@include breakpoint(max, md)
.c-table
@include scroll-shadow-base($color-front)
display: inline-block
overflow-x: auto
width: auto
-webkit-overflow-scrolling: touch
//- Table Cell
.c-table__cell
padding: 1rem
border: 1px solid $color-subtle
&.c-table__cell--highlight
border: 2px solid $color-theme
@include breakpoint(max, md)
.c-table__cell,
.c-table__head-cell
&:first-child
@include scroll-shadow-cover(left, $color-back)
&:last-child
@include scroll-shadow-cover(right, $color-back)
.c-table__row--foot .c-table__cell
&:first-child
@include scroll-shadow-cover(left, lighten($color-subtle-light, 2))
//- Table Head Cell
.c-table__head-cell
background: $color-theme
color: $color-back
padding: 1rem
border: 1px solid $color-theme
&:last-child
@include scroll-shadow-cover(right, lighten($color-subtle-light, 2))


@ -1,6 +1,4 @@
//- ----------------------------------
//- 💫 MIXINS
//- ----------------------------------
//- 💫 CSS > MIXINS
// Helper for position
// $position - valid position value (static, absolute, fixed, relative)
@ -38,18 +36,6 @@
@content
// Triangle pointing down
// $triangle-size - width of the triangle
@mixin triangle-down($triangle-size)
@include size(0)
border-color: transparent
border-style: solid
border-top-color: inherit
border-width: $triangle-size $triangle-size 0 $triangle-size
content: ""
// Scroll shadows for reponsive tables
// adapted from David Bushell, http://codepen.io/dbushell/pen/wGaamR
// $scroll-shadow-color - color of shadow


@ -1,20 +1,20 @@
//- ----------------------------------
//- 💫 VARIABLES
//- ----------------------------------
//- 💫 CSS > VARIABLES
// Settings and Sizes
$type-base: 11px
$nav-height: 55px
$content-width: 800px
$sidebar-width: 230px
$aside-width: 300px
$nav-height: 45px
$content-width: 1250px
$sidebar-width: 200px
$aside-width: 500px
$aside-padding: 25px
$logo-sizes: ( large: 500px, medium: 250px, small: 100px, tiny: 65px )
$grid: ( third: 3, half: 2, two-thirds: 1.5 )
$logo-width: 85px
$grid: ( quarter: 4, third: 3, half: 2, two-thirds: 1.5, three-quarters: 1.33 )
$breakpoints: ( sm: 768px, md: 992px, lg: 1200px )
$headings: (1: 3, 2: 2.6, 3: 2, 4: 1.8, 5: 1.5)
// Fonts
@ -25,13 +25,24 @@ $font-code: 'Source Code Pro', Consolas, 'Andale Mono', Menlo, Monaco, Courier,
// Colors
$color-theme: #09a3d5
$color-theme-dark: #008ebc
$color-back: #fff
$color-front: #222
$colors: ( blue: #09a3d5, red: #d9515d )
$color-subtle: #ddd
$color-subtle-light: #f6f6f6
$color-subtle-dark: #999
$color-back: #fff !default
$color-front: #1a1e23 !default
$color-dark: lighten($color-front, 20) !default
$syntax-highlighting: ( comment: #999, tag: #3ec930, number: #8130c9, selector: #09a3d5, operator: #e4514f, function: #09a3d5, keyword: #e4514f, regex: #f4c025 )
$color-theme: map-get($colors, $theme)
$color-theme-dark: darken(map-get($colors, $theme), 5)
$color-subtle: #ddd !default
$color-subtle-light: #f6f6f6 !default
$color-subtle-dark: #949e9b !default
$color-red: #d9515d
$color-green: #3ec930
$color-yellow: #f4c025
$syntax-highlighting: ( comment: #949e9b, tag: #3ec930, number: #B084EB, selector: #FFB86C, operator: #FF2C6D, function: #09a3d5, keyword: #45A9F9, regex: #f4c025 )
$pattern: $color-theme url("/assets/img/pattern_#{$theme}.jpg") center top repeat
$pattern-overlay: transparent url("/assets/img/pattern_landing.jpg") center -138px no-repeat


@ -1,6 +1,6 @@
//- ----------------------------------
//- 💫 STYLE
//- ----------------------------------
//- 💫 STYLESHEET
$theme: blue !default
// Variables
@ -25,6 +25,7 @@
@import _components/asides
@import _components/buttons
@import _components/code
@import _components/landing
@import _components/lists
@import _components/misc
@import _components/navigation


@ -0,0 +1,4 @@
//- 💫 STYLESHEET (RED)
$theme: red
@import style


@ -0,0 +1,68 @@
<svg style="position: absolute; width: 0; height: 0;" width="0" height="0" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<symbol id="brain" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>brain</title>
<path stroke-width="4" stroke-miterlimit="10" fill="none" stroke="currentColor" d="M187.2 76.1h-5c-1.6 0-2.9-1.3-2.9-2.9V62.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 3-2.9 3zM221.1 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM221.1 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM263.2 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM191.5 54.3L207.8 34M195.5 61.1l12.3-4M191.5 80.1l16.3 20.4M195.5 73.3l12.3 4.1M236 39.1l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6L243.4 98c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L232 58.8c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1 0-6.3 4.7-3.7 7.8z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M96.1 124.1H63v-11.7c0-12.6-3.7-25-10.7-35.5l-5.9-8.8c-3.2-4.8-4.9-10.4-4.9-16.1 0-22.3 18.1-40.4 40.4-40.4 17.6 0 33.1 11.4 38.5 28.1l10.8 33.8h-11v16.9c0 3.7-3 6.7-6.7 6.7h-12v12.3H77.3V90.2c0-.8-.2-1.6-.5-2.3l-4.5-11.3c-1.7-4.1 1.4-8.6 5.8-8.6 2 0 4-1 5.1-2.7L91.8 53h15.6c0-14-11.3-25.3-25.3-25.3h-.3c-14 0-25.3 11.3-25.3 25.3v1c0 4 3.2 7.2 7.2 7.2 2.4 0 4.6-1.2 6-3.2l11.2-16.8h10.8M139 68.7h29.4"
/>
</symbol>
<symbol id="computer" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>computer</title>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M56.2 87.7h-5c-1.6 0-2.9-1.3-2.9-2.9V74.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM90.1 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM90.1 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM132.2 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM132.2 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM60.5 66l16.3-20.3M64.5 72.8l12.3-4.1M60.5 91.8l16.3 20.4M64.5 85l12.3 4.1M105 50.8l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6l11.4 13.6c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L101 70.5c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1-.1-6.3 4.7-3.7 7.8z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M195.1 42.4h49v40.5h-49z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M251.9 116.7h-64.6c-2.2 0-4-1.8-4-4V34.6c0-2.2 1.8-4 4-4h64.6c2.2 0 4 1.8 4 4v78.1c0 2.2-1.8 4-4 4z" />
<path fill="currentColor" d="M191.8 103.2h6.8v6.8h-6.8zM235.6 91.3v3.4h-21.9v5.1h21.9v3.4h11.9V91.3" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M245.5 130.2h-51.8c-3.9 0-7-3.1-7-7v-6.5h65.8v6.5c0 3.8-3.1 7-7 7zM146 79.6h25.3" />
</symbol>
<symbol id="eye" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>eye</title>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M40.7 37.1h95.7v71.7H40.7z" />
<path fill="currentColor" d="M30.4 43.9h10.2v13.7H30.4z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M30.4 64.4h10.2v13.7H30.4zM30.4 88.3h10.2V102H30.4zM146 59.3h-9.7V45.6h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.2-2.5 5.7-5.7 5.7zM146 96.9h-9.7V83.2h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.1-2.5 5.7-5.7 5.7zM59.5 108.8v15.4M117.5 108.8v15.4M40.7 98.3h72V70.6H125M40.7 50.8h53.6M55.3 68.2h10.8v8.7H55.3zM74.7 68.2h10.8v8.7H74.7z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M101.3 77h-3.6c-2 0-3.6-1.6-3.6-3.6v-1.5c0-2 1.6-3.6 3.6-3.6h3.6c2 0 3.6 1.6 3.6 3.6v1.5c0 1.9-1.6 3.6-3.6 3.6zM40.7 88.3h58.8v-7M80.1 88.3v-7M60.7 88.3v-7M80.1 61.7V50.8M60.7 50.8v10.9M104.1 47.8c2.8 5.1-2.4 10.3-7.6 7.6-.7-.4-1.3-1-1.7-1.7-2.8-5.1 2.4-10.3 7.6-7.6.7.4 1.3 1 1.7 1.7z"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M104.9 50.8H125V37.1M136.3 90H123" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
<path fill="currentColor" d="M249.5 73c-4.4 0-8-3.6-8-8 0-2.3 1-4.3 2.5-5.8-.8-.1-1.6-.2-2.5-.2-7.8 0-14 6.3-14 14 0 7.8 6.3 14 14 14s14-6.3 14-14c0-.9-.1-1.7-.2-2.5-1.4 1.5-3.5 2.5-5.8 2.5z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M195 73h-36.7" />
</symbol>
<symbol id="bubble" viewBox="0 0 300 150">
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
<title>bubble</title>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M153.5 69h-32.2M88.2 68.6c1.2 9.2-6.6 17-15.8 15.8-6.3-.8-11.4-5.9-12.2-12.2C59 63 66.8 55.2 76 56.4c6.3.9 11.4 5.9 12.2 12.2z" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M88.2 68.6c1.2 9.2-6.6 17-15.8 15.8-6.3-.8-11.4-5.9-12.2-12.2C59 63 66.8 55.2 76 56.4c6.3.9 11.4 5.9 12.2 12.2z" />
<path fill="currentColor" d="M77.7 70.5c-1.9 0-3.5-1.6-3.5-3.5 0-1 .4-1.9 1.1-2.5-.4-.1-.7-.1-1.1-.1-3.4 0-6.2 2.8-6.2 6.2 0 3.4 2.8 6.2 6.2 6.2s6.2-2.8 6.2-6.2c0-.4 0-.7-.1-1.1-.7.5-1.6 1-2.6 1z" />
<path d="M43.9 38.3h60.5v62.6H43.9z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" />
<path d="M43.9 112.3c0 5.8 4.7 10.5 10.5 10.5h39.4c5.8 0 10.5-4.7 10.5-10.5v-11.4H43.9v11.4zM93.9 20.2H54.5c-5.8 0-10.5 4.7-10.5 10.5v7.6h60.5v-7.6c-.1-5.8-4.8-10.5-10.6-10.5z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10"
/>
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M79.3 110.3c.8 3.8-2.5 7.1-6.3 6.3-1.9-.4-3.5-2-3.9-3.9-.8-3.8 2.5-7.1 6.3-6.3 1.9.5 3.4 2 3.9 3.9zM69.3 30.1h9.8" />
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M264 41h-93c-2.3 0-4.2 1.9-4.2 4.2v42.3c0 2.3 1.9 4.2 4.2 4.2h7v22.5l22.5-22.5H264c2.3 0 4.2-1.9 4.2-4.2V45.2c0-2.3-1.9-4.2-4.2-4.2z" />
<path fill="currentColor" d="M183.4 53.8c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.5.2 1 .6 1.2 1.2zM189.4 52.2h16.9v5.6h-16.9zM211.9 52.2h33.8v5.6h-33.8zM178.1 74.8h5.6v5.6h-5.6zM189.4 74.8h33.8v5.6h-33.8zM240.1 74.8H257v5.6h-16.9zM251.3 52.2h5.6v5.6h-5.6zM178.1 63.5h22.5v5.6h-22.5zM217.5 63.5h12.7v5.6h-12.7zM234.4 63.5h22.5v5.6h-22.5zM209.2 69.1h-.3c-1.5 0-2.7-1.2-2.7-2.7v-.3c0-1.5 1.2-2.7 2.7-2.7h.3c1.5 0 2.7 1.2 2.7 2.7v.3c0 1.5-1.2 2.7-2.7 2.7zM234.1 76.3c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.6.2 1 .7 1.2 1.2z"
/>
</symbol>
<symbol id="spacy" viewBox="0 0 675 215">
<title>spacy</title>
<path fill="currentColor" d="M83.6 83.3C68.3 81.5 67.2 61 47.5 62.8c-9.5 0-18.4 4-18.4 12.7 0 13.2 20.3 14.4 32.5 17.7 20.9 6.3 41 10.7 41 33.3 0 28.8-22.6 38.8-52.4 38.8-24.9 0-50.2-8.9-50.2-31.8 0-6.4 6.1-11.3 12-11.3 7.5 0 10.1 3.2 12.7 8.4 5.8 10.2 12.3 15.6 28.3 15.6 10.2 0 20.6-3.9 20.6-12.7 0-12.6-12.8-15.3-26.1-18.4-23.5-6.6-43.6-10-46-36.1C-1 34.5 91.7 32.9 97 71.9c.1 7.1-6.5 11.4-13.4 11.4zm110.2-39c32.5 0 51 27.2 51 60.8 0 33.7-17.9 60.8-51 60.8-18.4 0-29.8-7.8-38.1-19.8v44.5c0 13.4-4.3 19.8-14.1 19.8-11.9 0-14.1-7.6-14.1-19.8V61.3c0-10.6 4.4-17 14.1-17 9.1 0 14.1 7.2 14.1 17v3.6c9.2-11.6 19.7-20.6 38.1-20.6zm-7.7 98.4c19.1 0 27.6-17.6 27.6-38.1 0-20.1-8.6-38.1-27.6-38.1-19.8 0-29 16.3-29 38.1 0 21.2 9.2 38.1 29 38.1zM266.9 76c0-23.4 26.9-31.7 52.9-31.7 36.6 0 51.7 10.7 51.7 46v34c0 8.1 5 24.1 5 29 0 7.4-6.8 12-14.1 12-8.1 0-14.1-9.5-18.4-16.3-11.9 9.5-24.5 16.3-43.8 16.3-21.3 0-38.1-12.6-38.1-33.3 0-18.4 13.2-28.9 29-32.5 0 .1 51-12 51-12.1 0-15.7-5.5-22.6-22-22.6-14.5 0-21.9 4-27.5 12.7-4.5 6.6-4 10.6-12.7 10.6-6.9-.1-13-4.9-13-12.1zm43.6 70.2c22.3 0 31.8-11.8 31.8-35.3v-5c-6 2-30.3 8-36.8 9.1-7 1.4-14.1 6.6-14.1 14.9.1 9.1 9.4 16.3 19.1 16.3zM474.5 0c31.5 0 65.7 18.8 65.7 48.8 0 7.7-5.8 14.1-13.4 14.1-10.3 0-11.8-5.5-16.3-13.4-7.6-13.9-16.5-23.3-36.1-23.3-30.2-.2-43.7 25.6-43.7 57.8 0 32.4 11.2 55.8 42.4 55.8 20.7 0 32.2-12 38.1-27.6 2.4-7.1 6.7-14.1 15.6-14.1 7 0 14.1 7.2 14.1 14.8 0 31.8-32.4 53.8-65.8 53.8-36.5 0-57.2-15.4-68.5-41-5.5-12.2-9.1-24.9-9.1-42.4-.1-49.2 28.6-83.3 77-83.3zm180.3 44.3c8 0 12.7 5.2 12.7 13.4 0 3.3-2.6 9.9-3.6 13.4L625.1 173c-8.6 22.1-15.1 37.4-44.5 37.4-14 0-26.1-1.2-26.1-13.4 0-7 5.3-10.6 12.7-10.6 1.4 0 3.6.7 5 .7 2.1 0 3.6.7 5 .7 14.7 0 16.8-15.1 22-25.5l-37.4-92.6c-2.1-5-3.6-8.4-3.6-11.3 0-8.2 6.4-14.1 14.8-14.1 9.5 0 13.3 7.5 15.6 15.6l24.7 73.5L638 65.5c3.9-10.5 4.2-21.2 16.8-21.2z" />
</symbol>
<symbol id="explosion" viewBox="0 0 500 500">
<title>explosion</title>
<path fill="currentColor" d="M111.7 74.9L91.2 93.1l9.1 10.2 17.8-15.8 7.4 8.4-17.8 15.8 10.1 11.4 20.6-18.2 7.7 8.7-30.4 26.9-41.9-47.3 30.3-26.9 7.6 8.6zM190.8 59.6L219 84.3l-14.4 4.5-20.4-18.2-6.4 26.6-14.4 4.5 8.9-36.4-26.9-24.1 14.3-4.5L179 54.2l5.7-25.2 14.3-4.5-8.2 35.1zM250.1 21.2l27.1 3.4c6.1.8 10.8 3.1 14 7.2 3.2 4.1 4.5 9.2 3.7 15.5-.8 6.3-3.2 11-7.4 14.1-4.1 3.1-9.2 4.3-15.3 3.5L258 63.2l-2.8 22.3-13-1.6 7.9-62.7zm11.5 13l-2.2 17.5 12.6 1.6c5.1.6 9.1-2 9.8-7.6.7-5.6-2.5-9.2-7.6-9.9l-12.6-1.6zM329.1 95.4l23.8 13.8-5.8 10L312 98.8l31.8-54.6 11.3 6.6-26 44.6zM440.5 145c-1.3 8.4-5.9 15.4-13.9 21.1s-16.2 7.7-24.6 6.1c-8.4-1.6-15.3-6.3-20.8-14.1-5.5-7.9-7.6-16-6.4-24.4 1.3-8.5 6-15.5 14-21.1 8-5.6 16.2-7.7 24.5-6 8.4 1.6 15.4 6.3 20.9 14.2 5.5 7.6 7.6 15.7 6.3 24.2zM412 119c-5.1-.8-10.3.6-15.6 4.4-5.2 3.7-8.4 8.1-9.4 13.2-1 5.2.2 10.1 3.5 14.8 3.4 4.8 7.5 7.5 12.7 8.2 5.2.8 10.4-.7 15.6-4.4 5.3-3.7 8.4-8.1 9.4-13.2 1.1-5.1-.1-9.9-3.4-14.7-3.4-4.8-7.6-7.6-12.8-8.3zM471.5 237.9c-2.8 4.8-7.1 7.6-13 8.7l-2.6-13.1c5.3-.9 8.1-5 7.2-11-.9-5.8-4.3-8.8-8.9-8.2-2.3.3-3.7 1.4-4.5 3.3-.7 1.9-1.4 5.2-1.7 10.1-.8 7.5-2.2 13.1-4.3 16.9-2.1 3.9-5.7 6.2-10.9 7-6.3.9-11.3-.5-15.2-4.4-3.9-3.8-6.3-9-7.3-15.7-1.1-7.4-.2-13.7 2.6-18.8 2.8-5.1 7.4-8.2 13.7-9.2l2.6 13c-5.6 1.1-8.7 6.6-7.7 13.4 1 6.6 3.9 9.5 8.6 8.8 4.4-.7 5.7-4.5 6.7-14.1.3-3.5.7-6.2 1.1-8.4.4-2.2 1.2-4.4 2.2-6.8 2.1-4.7 6-7.2 11.8-8.1 5.4-.8 10.3.4 14.5 3.7 4.2 3.3 6.9 8.5 8 15.6.9 6.9-.1 12.6-2.9 17.3zM408.6 293.5l2.4-12.9 62 11.7-2.4 12.9-62-11.7zM419.6 396.9c-8.3 2-16.5.3-24.8-5-8.2-5.3-13.2-12.1-14.9-20.5-1.6-8.4.1-16.6 5.3-24.6 5.2-8.1 11.9-13.1 20.2-15.1 8.4-1.9 16.6-.3 24.9 5 8.2 5.3 13.2 12.1 14.8 20.5 1.7 8.4 0 16.6-5.2 24.7-5.2 8-12 13-20.3 15zm13.4-36.3c-1.2-5.1-4.5-9.3-9.9-12.8s-10.6-4.7-15.8-3.7-9.3 4-12.4 8.9-4.1 9.8-2.8 14.8c1.2 5.1 4.5 9.3 9.9 12.8 5.5 3.5 10.7 4.8 15.8 3.7 5.1-.9 9.2-3.8 12.3-8.7s4.1-9.9 2.9-15zM303.6 416.5l9.6-5.4 43.3 20.4-19.2-34 11.4-6.4 31 55-9.6 5.4-43.4-20.5 19.2 34.1-11.3 6.4-31-55zM238.2 468.8c-49 0-96.9-17.4-134.8-49-38.3-32-64-76.7-72.5-125.9-2-11.9-3.1-24-3.1-35.9 0-36.5 9.6-72.6 27.9-104.4 2.1-3.6 6.7-4.9 10.3-2.8 3.6 2.1 4.9 6.7 2.8 10.3-16.9 29.5-25.9 63.1-25.9 96.9 0 11.1 1 22.3 2.9 33.4 7.9 45.7 31.8 87.2 67.3 116.9 35.2 29.3 79.6 45.5 125.1 45.5 11.1 0 22.3-1 33.4-2.9 4.1-.7 8 2 8.7 6.1.7 4.1-2 8-6.1 8.7-11.9 2-24 3.1-36 3.1z"/>
</symbol>
<symbol id="matt-signature" viewBox="0 0 500 250">
<title>matt-signature</title>
<path fill="currentColor" d="M18.6 207c-.3-18.8-.8-37.5-1.4-56.2-.6-18.7-1-37.5-1-56.2v-7.2c0-3.5 0-7 .2-11v-18c.8-2.7 1.8-5 3-6.5 1.6-2 3.6-3 6.4-3 3 0 5.4 1 7.6 2 2.2 2 4 4 5.3 6l36.6 71 1.8 3c1 1 2 3 3 3h1l1 1 1-3 22-76c2-3 3-5 4-8l2-9c1-3 2-6 4-8 1-3 4-5 7-7h2c5 0 8 1 10 4 3 2 4 5 5 9 1 3 2 7 1 12v11l1 7c0 3 0 7 1 12 0 4 1 9 1 14l1 14.2 1 12 .6 6v1l1 7.5 1 11.6 1.4 12 1.4 8 1 4 1.7 5.5 1.7 6c.7 1.7 1 3 1.5 3.6-.5 4-1.5 7-3 9-1 2-4 3-8 3h-6l-3-3c-1-1.4-2-2.3-2-3l-4-14-7.6-58V88c0-3.5-1-7-2-10l-2 1.7-18 74v6c0 2-.2 4-1 6 0 2-1 3.5-3 5-1 1.3-3 2-5 2.2-1 0-2 0-3-1l-3.4-2-3-3c-1-1-1.7-2-2-3l-35-52-5.3-10.6v22c0 10.2.2 20.3.6 30.2.4 10 .6 20 .6 30.2v22c0 2-1 4-3 5.4s-3 3-5 3c-3 0-5 0-7-1-1-1-3-3-4-5zm205-63.2c-1.6 2.7-3.4 6-5.3 9.8l-6.2 12.2c-2 4.3-4 8.6-7 13-2 4.2-5 8.2-8 11.7s-5 6.6-9 9c-3 2.5-6 4-9 4.4-1 0-3-1-4-1l-5-2c-1-1-3-2-4-3s-1-3-1-5c1-18 2-33 4-47s6-27 11-38 12-20 20-27 18-12 29-15l2-1h2c5 0 9 2 11 7s4 12 5 23c1 10 2 24 2 40 1 16 2 36 3 59l1 4v5c0 2.6-1 4.5-2 6s-3 2-5 2c-5 0-8-1.7-10-4s-3-6.6-4-11v-4l-1-9s-1-6.7-1-10l-1-8.5v-1l-.2-6-1-7-.5-8.6-1-1zM218 93.5c-4.7 3.4-9.2 8-13.6 13.7-4.4 5.8-7.5 11.3-9.4 16.8-.8 2.5-1.8 6-2.8 10.4-1 4.4-2 8.8-2.7 13l-2 12-.7 7c.2 0 .4-.2.6-.5l.6-1c10.5-10 18-21 22.2-33 4.6-12 7-25 7.7-39zm72 47c-2.3 0-4.4.6-6.2 1.8-2 1.2-4 1.8-6.6 1.8h-5.4c-.7-1-1.4-1-2.3-2l-2.5-2c-.8 0-1.6-1-2.2-2-.6-1-1-2-1-3 0-2 1-4 3-6 2-1 4.5-3 7.2-4l8.3-3s5-2 6.7-3v-11c0-12-.6-25-1.8-38-1.2-12-1.8-25-1.8-37 0-3 .8-6 2.5-7 1-1 4-1 6-1 3 0 6 1 7 3s2 4 3 7c0 3 1 6 1 9v20l1 18 1 18 1 12 4-1 6-2 6-2 4-1 14-6c4-2.3 9-3.4 14-3.4 3 0 6 1 7 3.5s3 5 3 8c0 2-1 4-3 5l-6 3-46 17-1.5 1s-1 0-1.5 1v8c0 6 0 12 .5 18s1 12.3 2 18.3l3 15c1 5 1.4 10 1.4 15 0 1.4-.6 3.5-1.6 6s-2 4-4.7 4c-5 0-8.7-1.6-11.6-4-3-3-4.3-6.6-4.6-11l-2.2-29-2.7-30h-1zm112 0c-2.4 0-4.5.6-6.3 1.8-2 1.2-4 1.8-6.6 1.8h-5c0-1-1-1-2-2l-2-2c-1 0-1-1-2-2 0-1-1-2-1-3 0-2 1-4 3-6 2-1 5-3 7-4l8-3s5-2 7-3v-11c0-12 0-25-2-38-1-12-1-25-1-37 0-3 1-6 3-7s4-1 7-1c4 0 6 1 8 3s3 4 3 7c1 3 1 6 1 9s0 6 1 8v11l1 18 1 18 1 12 4-1 6-2 6-2 4-1 14-6c4-2 9-4 14-4 4 0 6 1 8 4s3 5 3 8c0 2-1 4-2 5l-5.3 3-49 13.8-1.5 1s-1 .5-1.5 1V157l1 18.3c0 5 1 10 2 15s1 10 1 15c0 1.5-1 3.6-2 6s-3 4-5 4c-5 0-9-1.5-12-4.2s-5-6-5-11l-3-28.3-3-30.3h-1z"/>
</defs>
</svg>

After

Width:  |  Height:  |  Size: 15 KiB

View File

@ -1,29 +1,32 @@
<svg style="position: absolute; width: 0; height: 0;" width="0" height="0" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<symbol id="icon-mail" viewBox="0 0 32 32">
<title>mail</title>
<path class="path1" d="M29 4h-26c-1.657 0-3 1.343-3 3v18c0 1.656 1.343 3 3 3h26c1.657 0 3-1.344 3-3v-18c0-1.657-1.343-3-3-3zM2.741 25.99l-0.731-0.732 8.249-8.248 0.731 0.732-8.249 8.248zM29.259 25.99l-8.249-8.248 0.731-0.732 8.249 8.248-0.731 0.732zM17 19.325v0.675h-2v-0.675l-12.997-12.050 1.272-1.272 12.725 11.798 12.725-11.798 1.272 1.272-12.997 12.050z"></path>
</symbol>
<symbol id="icon-menu" viewBox="0 0 24 24">
<title>menu</title>
<path class="path1" d="M3 5h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293zM3 17h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293zM3 11h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293z"></path>
</symbol>
<symbol id="icon-link" viewBox="0 0 32 32">
<title>link</title>
<path class="path1" d="M13.757 19.868c-0.416 0-0.832-0.159-1.149-0.476-2.973-2.973-2.973-7.81 0-10.783l6-6c1.44-1.44 3.355-2.233 5.392-2.233s3.951 0.793 5.392 2.233c2.973 2.973 2.973 7.81 0 10.783l-2.743 2.743c-0.635 0.635-1.663 0.635-2.298 0s-0.635-1.663 0-2.298l2.743-2.743c1.706-1.706 1.706-4.481 0-6.187-0.826-0.826-1.925-1.281-3.094-1.281s-2.267 0.455-3.094 1.281l-6 6c-1.706 1.706-1.706 4.481 0 6.187 0.635 0.635 0.635 1.663 0 2.298-0.317 0.317-0.733 0.476-1.149 0.476z"></path>
<path class="path2" d="M8 31.625c-2.037 0-3.952-0.793-5.392-2.233-2.973-2.973-2.973-7.81 0-10.783l2.743-2.743c0.635-0.635 1.664-0.635 2.298 0s0.635 1.663 0 2.298l-2.743 2.743c-1.706 1.706-1.706 4.481 0 6.187 0.826 0.826 1.925 1.281 3.094 1.281s2.267-0.455 3.094-1.281l6-6c1.706-1.706 1.706-4.481 0-6.187-0.635-0.635-0.635-1.663 0-2.298s1.663-0.635 2.298 0c2.973 2.973 2.973 7.81 0 10.783l-6 6c-1.44 1.44-3.355 2.233-5.392 2.233z"></path>
</symbol>
<symbol id="icon-github" viewBox="0 0 27 32">
<title>github</title>
<path class="path1" d="M13.714 2.286q3.732 0 6.884 1.839t4.991 4.991 1.839 6.884q0 4.482-2.616 8.063t-6.759 4.955q-0.482 0.089-0.714-0.125t-0.232-0.536q0-0.054 0.009-1.366t0.009-2.402q0-1.732-0.929-2.536 1.018-0.107 1.83-0.321t1.679-0.696 1.446-1.188 0.946-1.875 0.366-2.688q0-2.125-1.411-3.679 0.661-1.625-0.143-3.643-0.5-0.161-1.446 0.196t-1.643 0.786l-0.679 0.429q-1.661-0.464-3.429-0.464t-3.429 0.464q-0.286-0.196-0.759-0.482t-1.491-0.688-1.518-0.241q-0.804 2.018-0.143 3.643-1.411 1.554-1.411 3.679 0 1.518 0.366 2.679t0.938 1.875 1.438 1.196 1.679 0.696 1.83 0.321q-0.696 0.643-0.875 1.839-0.375 0.179-0.804 0.268t-1.018 0.089-1.17-0.384-0.991-1.116q-0.339-0.571-0.866-0.929t-0.884-0.429l-0.357-0.054q-0.375 0-0.518 0.080t-0.089 0.205 0.161 0.25 0.232 0.214l0.125 0.089q0.393 0.179 0.777 0.679t0.563 0.911l0.179 0.411q0.232 0.679 0.786 1.098t1.196 0.536 1.241 0.125 0.991-0.063l0.411-0.071q0 0.679 0.009 1.58t0.009 0.973q0 0.321-0.232 0.536t-0.714 0.125q-4.143-1.375-6.759-4.955t-2.616-8.063q0-3.732 1.839-6.884t4.991-4.991 6.884-1.839zM5.196 21.982q0.054-0.125-0.125-0.214-0.179-0.054-0.232 0.036-0.054 0.125 0.125 0.214 0.161 0.107 0.232-0.036zM5.75 22.589q0.125-0.089-0.036-0.286-0.179-0.161-0.286-0.054-0.125 0.089 0.036 0.286 0.179 0.179 0.286 0.054zM6.286 23.393q0.161-0.125 0-0.339-0.143-0.232-0.304-0.107-0.161 0.089 0 0.321t0.304 0.125zM7.036 24.143q0.143-0.143-0.071-0.339-0.214-0.214-0.357-0.054-0.161 0.143 0.071 0.339 0.214 0.214 0.357 0.054zM8.054 24.589q0.054-0.196-0.232-0.286-0.268-0.071-0.339 0.125t0.232 0.268q0.268 0.107 0.339-0.107zM9.179 24.679q0-0.232-0.304-0.196-0.286 0-0.286 0.196 0 0.232 0.304 0.196 0.286 0 0.286-0.196zM10.214 24.5q-0.036-0.196-0.321-0.161-0.286 0.054-0.25 0.268t0.321 0.143 0.25-0.25z"></path>
</symbol>
<symbol id="icon-twitter" viewBox="0 0 30 32">
<title>twitter</title>
<path class="path1" d="M28.929 7.286q-1.196 1.75-2.893 2.982 0.018 0.25 0.018 0.75 0 2.321-0.679 4.634t-2.063 4.437-3.295 3.759-4.607 2.607-5.768 0.973q-4.839 0-8.857-2.589 0.625 0.071 1.393 0.071 4.018 0 7.161-2.464-1.875-0.036-3.357-1.152t-2.036-2.848q0.589 0.089 1.089 0.089 0.768 0 1.518-0.196-2-0.411-3.313-1.991t-1.313-3.67v-0.071q1.214 0.679 2.607 0.732-1.179-0.786-1.875-2.054t-0.696-2.75q0-1.571 0.786-2.911 2.161 2.661 5.259 4.259t6.634 1.777q-0.143-0.679-0.143-1.321 0-2.393 1.688-4.080t4.080-1.688q2.5 0 4.214 1.821 1.946-0.375 3.661-1.393-0.661 2.054-2.536 3.179 1.661-0.179 3.321-0.893z"></path>
<symbol id="icon-code" viewBox="0 0 20 20">
<title>code</title>
<path class="path1" d="M5.719 14.75c-0.236 0-0.474-0.083-0.664-0.252l-5.060-4.498 5.341-4.748c0.412-0.365 1.044-0.33 1.411 0.083s0.33 1.045-0.083 1.412l-3.659 3.253 3.378 3.002c0.413 0.367 0.45 0.999 0.083 1.412-0.197 0.223-0.472 0.336-0.747 0.336zM14.664 14.748l5.341-4.748-5.060-4.498c-0.413-0.367-1.045-0.33-1.411 0.083s-0.33 1.045 0.083 1.412l3.378 3.003-3.659 3.252c-0.413 0.367-0.45 0.999-0.083 1.412 0.197 0.223 0.472 0.336 0.747 0.336 0.236 0 0.474-0.083 0.664-0.252zM9.986 16.165l2-12c0.091-0.545-0.277-1.060-0.822-1.151-0.547-0.092-1.061 0.277-1.15 0.822l-2 12c-0.091 0.545 0.277 1.060 0.822 1.151 0.056 0.009 0.11 0.013 0.165 0.013 0.48 0 0.904-0.347 0.985-0.835z"></path>
</symbol>
<symbol id="icon-reddit" viewBox="0 0 32 32">
<title>reddit</title>
<path class="path1" d="M19.554 20.839q0.286 0.286 0 0.554-1.107 1.107-3.554 1.107t-3.554-1.107q-0.286-0.268 0-0.554 0.107-0.107 0.268-0.107t0.268 0.107q0.857 0.875 3.018 0.875 2.143 0 3.018-0.875 0.107-0.107 0.268-0.107t0.268 0.107zM14.071 17.607q0 0.661-0.464 1.125t-1.125 0.464-1.134-0.464-0.473-1.125q0-0.679 0.473-1.143t1.134-0.464 1.125 0.473 0.464 1.134zM21.125 17.607q0 0.661-0.473 1.125t-1.134 0.464-1.125-0.464-0.464-1.125 0.464-1.134 1.125-0.473 1.134 0.464 0.473 1.143zM25.607 15.464q0-0.875-0.625-1.5t-1.518-0.625-1.536 0.643q-2.321-1.607-5.554-1.714l1.125-5.054 3.571 0.804q0 0.661 0.464 1.125t1.125 0.464 1.134-0.473 0.473-1.134-0.473-1.134-1.134-0.473q-0.964 0-1.429 0.893l-3.946-0.875q-0.339-0.089-0.446 0.286l-1.232 5.571q-3.214 0.125-5.518 1.732-0.625-0.661-1.554-0.661-0.893 0-1.518 0.625t-0.625 1.5q0 0.625 0.33 1.143t0.884 0.786q-0.107 0.482-0.107 1 0 2.536 2.5 4.339t6.018 1.804q3.536 0 6.036-1.804t2.5-4.339q0-0.571-0.125-1.018 0.536-0.268 0.857-0.777t0.321-1.134zM32 16q0 3.25-1.268 6.214t-3.411 5.107-5.107 3.411-6.214 1.268-6.214-1.268-5.107-3.411-3.411-5.107-1.268-6.214 1.268-6.214 3.411-5.107 5.107-3.411 6.214-1.268 6.214 1.268 5.107 3.411 3.411 5.107 1.268 6.214z"></path>
<symbol id="icon-anchor" viewBox="0 0 16 16">
<title>anchor</title>
<path class="path1" d="M14.779 12.779c-1.471 1.993-4.031 3.245-6.779 3.221-2.748 0.023-5.309-1.229-6.779-3.221l-1.221 1.221v-4h4l-1.1 1.099c0.882 1.46 2.357 2.509 4.1 2.807v-6.047c-1.723-0.446-3-1.997-3-3.858 0-2.209 1.791-4 4-4s4 1.791 4 4c0 1.862-1.277 3.413-3 3.858v6.047c1.742-0.297 3.218-1.347 4.099-2.807l-1.1-1.099h4v4l-1.221-1.221zM10 4c0-1.104-0.895-2-2-2s-2 0.895-2 2c0 1.104 0.895 2 2 2s2-0.896 2-2z"></path>
</symbol>
<symbol id="icon-book" viewBox="0 0 24 24">
<title>book</title>
<path class="path1" d="M18.984 6.984v-1.969h-9.984v1.969h9.984zM15 15v-2.016h-6v2.016h6zM18.984 11.016v-2.016h-9.984v2.016h9.984zM20.016 2.016c1.078 0 1.969 0.891 1.969 1.969v12c0 1.078-0.891 2.016-1.969 2.016h-12c-1.078 0-2.016-0.938-2.016-2.016v-12c0-1.078 0.938-1.969 2.016-1.969h12zM3.984 6v14.016h14.016v1.969h-14.016c-1.078 0-1.969-0.891-1.969-1.969v-14.016h1.969z"></path>
</symbol>
<symbol id="icon-pro" viewBox="0 0 20 20">
<title>pro</title>
<path class="path1" d="M10 1.6c-4.639 0-8.4 3.761-8.4 8.4s3.761 8.4 8.4 8.4 8.4-3.761 8.4-8.4c0-4.639-3.761-8.4-8.4-8.4zM15 11h-4v4h-2v-4h-4v-2h4v-4h2v4h4v2z"></path>
</symbol>
<symbol id="icon-con" viewBox="0 0 20 20">
<title>con</title>
<path class="path1" d="M10 1.6c-4.639 0-8.4 3.761-8.4 8.4s3.761 8.4 8.4 8.4 8.4-3.761 8.4-8.4c0-4.639-3.761-8.4-8.4-8.4zM15 11h-10v-2h10v2z"></path>
</symbol>
<symbol id="icon-neutral" viewBox="0 0 20 20">
<title>neutral</title>
<path class="path1" d="M9.999 0.8c-5.081 0-9.199 4.119-9.199 9.201 0 5.080 4.118 9.199 9.199 9.199s9.2-4.119 9.2-9.199c0-5.082-4.119-9.201-9.2-9.201zM10 13.001c-1.657 0-3-1.344-3-3s1.343-3 3-3c1.656 0 3 1.344 3 3s-1.344 3-3 3z"></path>
</symbol>
</defs>
</svg>

Before

Width:  |  Height:  |  Size: 6.0 KiB

After

Width:  |  Height:  |  Size: 4.7 KiB

View File

@ -2,54 +2,24 @@
//- 💫 MAIN JAVASCRIPT
//- ----------------------------------
'use strict';
const $ = document.querySelector.bind(document);
const $$ = document.querySelectorAll.bind(document);
'use strict'
{
const updateVh = () => Math.max(document.documentElement.clientHeight, window.innerHeight || 0);
const nav = document.querySelector('.js-nav')
const fixedClass = 'is-fixed'
let vh, scrollY = 0, scrollUp = false
const nav = $('.js-nav');
const sidebar = $('.js-sidebar');
const vhPadding = 525;
let vh = updateVh();
let scrollY = 0;
let scrollUp = false;
const updateVh = () => Math.max(document.documentElement.clientHeight, window.innerHeight || 0)
const updateNav = () => {
const vh = updateVh();
const newScrollY = (window.pageYOffset || document.scrollTop) - (document.clientTop || 0);
scrollUp = newScrollY <= scrollY;
scrollY = newScrollY;
const vh = updateVh()
const newScrollY = (window.pageYOffset || document.scrollTop) - (document.clientTop || 0)
scrollUp = newScrollY <= scrollY
scrollY = newScrollY
if(scrollUp && !(isNaN(scrollY) || scrollY <= vh)) nav.classList.add('is-fixed');
else if(!scrollUp || (isNaN(scrollY) || scrollY <= vh/2)) nav.classList.remove('is-fixed');
if(scrollUp && !(isNaN(scrollY) || scrollY <= vh)) nav.classList.add(fixedClass)
else if(!scrollUp || (isNaN(scrollY) || scrollY <= vh/2)) nav.classList.remove(fixedClass)
}
const updateSidebar = () => {
const sidebar = $('.js-sidebar');
if(sidebar.offsetTop - scrollY <= 0) sidebar.classList.add('is-fixed');
else sidebar.classList.remove('is-fixed');
[...$$('[data-section]')].map(el => {
const trigger = el.getAttribute('data-section');
if(trigger) {
const target = $(`#${trigger}`);
const offset = parseInt(target.offsetTop);
const height = parseInt(target.scrollHeight);
if((offset - scrollY) <= vh/2 && (offset - scrollY) > -height + vhPadding) {
[...$$('[data-section]')].forEach(item => item.classList.remove('is-active'));
$(`[data-section="${trigger}"]`).classList.add('is-active');
}
}
});
}
window.addEventListener('resize', () => vh = updateVh());
window.addEventListener('scroll', updateNav);
if($('.js-sidebar')) window.addEventListener('scroll', updateSidebar);
window.addEventListener('scroll', () => requestAnimationFrame(updateNav))
}

View File

@ -1,10 +0,0 @@
{
"index": {
"title" : "Blog"
},
"announcement" : {
"title": "Important Announcement"
}
}

View File

@ -1,12 +0,0 @@
include ../_includes/_mixins
.u-padding
+label #[+date("2016-08-09")]
p.u-text-large Dear spaCy users,
p.u-text-medium Unfortunately, we (Henning Peters and Matthew Honnibal) are parting ways. Breaking up is never easy, and it's taken us a while to get our stuff together. Hopefully, you didn't notice anything was up — if you did, we hope you haven't been inconvenienced.
p.u-text-medium Here's how this is going to work: Matt will continue to develop and maintain spaCy and all related projects under his name. Nothing will change for you. Henning will take over our legal structure and start a new business under a new name.
p.u-text-medium Sincerely,#[br] Henning Peters and Matthew Honnibal

View File

@ -1,5 +0,0 @@
//- ----------------------------------
//- 💫 BLOG INDEX (REDIRECT)
//- ----------------------------------
script window.location = '!{SITE_URL}'

View File

@ -1,167 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > ANNOTATION SPECS
//- ----------------------------------
+section("annotation")
+h(2, "annotation").
Annotation Specifications
p.
This document describes the target annotations spaCy is trained to predict.
This is currently a work in progress. Please ask questions on the
#[+a("https://github.com/" + SOCIAL.github + "/spaCy/issues") issue tracker],
so that the answers can be integrated here to improve the documentation.
+section("annotation-tokenization")
+h(3, "annotation-tokenization").
Tokenization
p.
Tokenization standards are based on the OntoNotes 5 corpus. The
tokenizer differs from most by including tokens for significant
whitespace. Any sequence of whitespace characters beyond a single
space (' ') is included as a token. For instance:
+code.
from spacy.en import English
nlp = English(parser=False)
tokens = nlp('Some\nspaces and\ttab characters')
print([t.orth_ for t in tokens])
p Which produces:
+code.
['Some', '\n', 'spaces', ' ', 'and', '\t', 'tab', 'characters']
p.
The whitespace tokens are useful for much the same reason punctuation
is: it's often an important delimiter in the text. By preserving it
in the token output, we are able to maintain a simple alignment between
the tokens and the original string, and we ensure that no information
is lost during processing.
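p.
    For instance, joining the whitespace-preserving token texts should
    reproduce the original string exactly (a quick sketch, reusing the
    snippet above):
+code("python", "Example").
    from spacy.en import English
    nlp = English(parser=False)
    text = 'Some\nspaces and\ttab characters'
    tokens = nlp(text)
    # No information is lost: the token texts (with trailing whitespace)
    # concatenate back to the input.
    assert ''.join(t.text_with_ws for t in tokens) == text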
+section("annotation-sentence-boundary")
+h(3, "annotation-sentence-boundary").
Sentence boundary detection
p.
Sentence boundaries are calculated from the syntactic parse tree, so
features such as punctuation and capitalisation play an important but
non-decisive role in determining the sentence boundaries. Usually
this means that the sentence boundaries will at least coincide with
clause boundaries, even given poorly punctuated text.
+section("annotation-pos-tagging")
+h(3, "annotation-pos-tagging").
Part-of-speech Tagging
p.
The part-of-speech tagger uses the OntoNotes 5 version of the Penn
Treebank tag set. We also map the tags to the simpler Google Universal
POS Tag set. Details #[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tagger.pyx") here].
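p.
    As a rough sketch, both tag sets can be read from each token: the
    fine-grained Treebank tag as #[code tag_] and the coarse universal tag
    as #[code pos_].
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'Apples are delicious.')
    for token in doc:
        # e.g. (u'Apples', u'NNS', u'NOUN'); exact tags depend on the model
        print(token.orth_, token.tag_, token.pos_)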
+section("annotation-lemmatization")
+h(3, "annotation-lemmatization").
Lemmatization
p A "lemma" is the uninflected form of a word. In English, this means:
+list
+item #[strong Adjectives:] The form like "happy", not "happier" or "happiest"
+item #[strong Adverbs:] The form like "badly", not "worse" or "worst"
+item #[strong Nouns:] The form like "dog", not "dogs"; like "child", not "children"
+item #[strong Verbs:] The form like "write", not "writes", "writing", "wrote" or "written"
p.
The lemmatization data is taken from WordNet. However, we also add a
special case for pronouns: all pronouns are lemmatized to the special
token #[code -PRON-].
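p.
    For instance, assuming the default English pipeline is loaded, a sketch
    of reading lemmas (note the pronoun mapping to #[code -PRON-]):
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'She wrote the children stories.')
    # Expected roughly: [u'-PRON-', u'write', u'the', u'child', u'story', u'.']
    print([token.lemma_ for token in doc])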
+section("annotation-dependency")
+h(3, "annotation-dependency").
Syntactic Dependency Parsing
p.
The parser is trained on data produced by the ClearNLP converter.
Details of the annotation scheme can be found
#[+a("http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf") here].
+section("annotation-ner")
+h(3, "annotation-ner").
Named Entity Recognition
+table(["Entity Type", "Description"])
+row
+cell PERSON
+cell People, including fictional.
+row
+cell NORP
+cell Nationalities or religious or political groups.
+row
+cell FAC
+cell Facilities, such as buildings, airports, highways, bridges, etc.
+row
+cell ORG
+cell Companies, agencies, institutions, etc.
+row
+cell GPE
+cell Countries, cities, states.
+row
+cell LOC
+cell Non-GPE locations, mountain ranges, bodies of water.
+row
+cell PRODUCT
+cell Vehicles, weapons, foods, etc. (Not services)
+row
+cell EVENT
+cell Named hurricanes, battles, wars, sports events, etc.
+row
+cell WORK_OF_ART
+cell Titles of books, songs, etc.
+row
+cell LAW
+cell Named documents made into laws
+row
+cell LANGUAGE
+cell Any named language
p The following values are also annotated in a style similar to names:
+table(["Entity Type", "Description"])
+row
+cell DATE
+cell Absolute or relative dates or periods
+row
+cell TIME
+cell Times smaller than a day
+row
+cell PERCENT
+cell Percentage (including “%”)
+row
+cell MONEY
+cell Monetary values, including unit
+row
+cell QUANTITY
+cell Measurements, as of weight or distance
+row
+cell ORDINAL
+cell "first", "second"
+row
+cell CARDINAL
+cell Numerals that do not fall under another type
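p.
    The entity annotations can be read from #[code doc.ents], for example
    (a sketch; the exact predictions depend on the model):
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    for ent in doc.ents:
        # e.g. (u'Apple', u'ORG'), (u'U.K.', u'GPE'), (u'$1 billion', u'MONEY')
        print(ent.text, ent.label_)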

View File

@ -1,305 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > DOC
//- ----------------------------------
+section("doc")
+h(2, "doc", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/doc.pyx")
| #[+tag class] Doc
p
| A sequence of #[code Token] objects. Access sentences and named entities,
| export annotations to numpy arrays, losslessly serialize to compressed
| binary strings.
+aside.
Internally, the #[code Doc] object holds an array of #[code TokenC] structs.
The Python-level #[code Token] and #[code Span] objects are views of this
array, i.e. they don't own the data themselves.
+code("python", "Overview").
class Doc:
def __init__(self, vocab, orths_and_spaces=None):
return self
def __getitem__(self, int i):
return Token()
def __getitem__(self, slice i_j):
return Span()
def __iter__(self):
yield Token()
def __len__(self):
return int
def __unicode__(self):
return unicode
def __bytes__(self):
return utf8
def __repr__(self):
return unicode
@property
def text(self):
return unicode
@property
def text_with_ws(self):
return unicode
@property
def vector(self):
return numpy.ndarray(dtype='float32')
@property
def vector_norm(self):
return float
@property
def ents(self):
yield Span()
@property
def noun_chunks(self):
yield Span()
@property
def sents(self):
yield Span()
def similarity(self, other):
return float
def merge(self, start_char, end_char, tag, lemma, ent_type):
return None
def to_array(self, attr_ids):
return numpy.ndarray(shape=(len(self), len(attr_ids)), dtype='int64')
def count_by(self, attr_id, exclude=None, counts=None):
return dict
def to_bytes(self):
return bytes
def from_array(self, attrs, array):
return None
def from_bytes(self, data):
return self
@staticmethod
def read_bytes(file_):
yield bytes
+section("doc-init")
+h(3, "doc-init")
| #[+tag method] Doc.__init__
.has-aside
+code("python", "Definition").
def __init__(self, vocab, orths_and_spaces=None):
return Doc
+aside("Implementation").
This method of constructing a #[code Doc] object is usually only used
for deserialization. Standard usage is to construct the document via
a call to the language object.
+table(["Name", "Type", "Description"])
+row
+cell vocab
+cell.
A Vocabulary object, which must match any models you want to
use (e.g. tokenizer, parser, entity recognizer).
+row
+cell orths_and_spaces
+cell.
A list of tokens in the document as a sequence of
#[code (orth_id, has_space)] tuples, where #[code orth_id]
is an integer and #[code has_space] is a boolean, indicating
whether the token has a trailing space.
+section("doc-sequenceapi")
+h(3, "doc-sequenceapi")
| #[+tag Section] Sequence API
+table(["Example", "Description"])
+row
+cell #[code doc[i]]
+cell.
Get the Token object at position i, where i is an integer.
Negative indexing is supported, and follows the usual Python
semantics, i.e. doc[-2] is doc[len(doc) - 2].
+row
+cell #[code doc[start : end]]
+cell.
Get a #[code Span] object, starting at position #[code start]
and ending at position #[code end], where #[code start] and
#[code end] are token indices. For instance,
#[code doc[2:5]] produces a span consisting of
tokens 2, 3 and 4. Stepped slices (e.g. #[code doc[start : end : step]])
are not supported, as #[code Span] objects must be contiguous
(cannot have gaps). You can use negative indices and open-ended
ranges, which have their normal Python semantics.
+row
+cell #[code for token in doc]
+cell.
Iterate over Token objects, from which the annotations can
be easily accessed. This is the main way of accessing Token
objects, which are the main way annotations are accessed from
Python. If faster-than-Python speeds are required, you can
instead access the annotations as a numpy array, or access the
underlying C data directly from Cython.
+row
+cell #[code len(doc)]
+cell.
The number of tokens in the document.
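p.
    Putting the sequence API together, a short sketch (token counts assume
    the default tokenization):
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'Mr. Best flew to New York on Saturday morning.')
    assert len(doc) == 10                 # number of tokens
    assert doc[0].orth_ == u'Mr.'         # integer indexing returns a Token
    assert doc[-1].orth_ == u'.'          # negative indexing, Python semantics
    span = doc[2:5]                       # slicing returns a Span
    assert [t.orth_ for t in span] == [u'flew', u'to', u'New']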
+section("doc-spans")
+h(3, "doc-spans-sents")
| #[+tag property] Doc.sents
p.
Yields sentence #[code Span] objects. Sentence spans have no label.
To improve accuracy on informal texts, spaCy calculates sentence
boundaries from the syntactic dependency parse. If the parser is disabled,
the #[code sents] iterator will be unavailable.
+code("python", "Example").
from spacy.en import English
nlp = English()
doc = nlp("This is a sentence. Here's another...")
assert [s.root.orth_ for s in doc.sents] == ["is", "'s"]
+h(3, "doc-spans-ents")
| #[+tag property] Doc.ents
p.
Yields named-entity #[code Span] objects, if the entity recognizer
has been applied to the document. Iterate over the span to get
individual Token objects, or access the label:
+code("python", "Example").
from spacy.en import English
nlp = English()
tokens = nlp(u'Mr. Best flew to New York on Saturday morning.')
ents = list(tokens.ents)
assert ents[0].label == 346
assert ents[0].label_ == 'PERSON'
assert ents[0].orth_ == 'Best'
assert ents[0].text == 'Mr. Best'
+h(3, "doc-spans-nounchunks")
| #[+tag property] Doc.noun_chunks
p.
Yields base noun-phrase #[code Span] objects, if the document
has been syntactically parsed. A base noun phrase, or
'NP chunk', is a noun phrase that does not permit other NPs to
be nested within it, so no NP-level coordination, no prepositional
phrases, and no relative clauses. For example:
+code("python", "Example").
from spacy.en import English
nlp = English()
doc = nlp(u'The sentence in this example has three noun chunks.')
for chunk in doc.noun_chunks:
    print(chunk.label_, chunk.orth_, '&lt;--', chunk.root.head.orth_)
+section("doc-exportimport-toarray")
+h(3, "doc-exportimport-toarray")
| #[+tag method] Doc.to_array
p.
Given a list of M attribute IDs, export the tokens to a numpy
#[code ndarray] of shape #[code N*M], where #[code N] is the length
of the document. The values will be 32-bit integers.
+code("python", "Example").
from spacy import attrs
doc = nlp(text)
# All strings mapped to integers, for easy export to numpy
np_array = doc.to_array([attrs.LOWER, attrs.POS, attrs.ENT_TYPE, attrs.IS_ALPHA])
+code("python", "Definition").
def to_array(self, attr_ids):
return numpy.ndarray(shape=(len(self), len(attr_ids)), dtype='int64')
+table(["Name", "Type", "Description"])
+row
+cell attr_ids
+cell list of ints
+cell.
A list of attribute ID ints. Attribute IDs can be imported
from #[code spacy.attrs] or #[code spacy.symbols].
+section("doc-exportimport-countby")
+h(4, "doc-exportimport-countby")
| #[+tag method] Doc.count_by
p.
Produce a dict of #[code {attribute (int): count (ints)}] frequencies,
keyed by the values of the given attribute ID.
+code("python", "Example").
def count_by(self, attr_id):
return dict
+table(["Name", "Type", "Description"])
+row
+cell attr_id
+cell int
+cell.
The attribute ID to key the counts.
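p.
    For instance, a sketch of counting coarse part-of-speech tags, assuming
    #[code nlp] is a loaded pipeline:
+code("python", "Example").
    from spacy import attrs
    doc = nlp(u'Apples and oranges are similar.')
    counts = doc.count_by(attrs.POS)
    # Keys are integer attribute values; map them back to strings if needed.
    readable = {doc.vocab.strings[key]: count for key, count in counts.items()}
    print(readable)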
+section("doc-exportimport-fromarray")
+h(4, "doc-exportimport-fromarray")
| #[+tag method] Doc.from_array
p Write to a #[code Doc] object, from an M*N array of attributes.
+code("python", "Definition").
def from_array(self, attrs, array):
return None
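p.
    A sketch of a round trip, assuming #[code TAG] is one of the writable
    attributes: export the tags from a processed document, then write them
    onto a freshly tokenized copy of the same text.
+code("python", "Example").
    from spacy.attrs import TAG
    text = u'Give it back! He pleaded.'
    doc = nlp(text)
    tag_array = doc.to_array([TAG])
    # Tokenize only, then restore the part-of-speech tags from the array.
    fresh = nlp(text, tag=False, parse=False, entity=False)
    fresh.from_array([TAG], tag_array)
    assert [t.tag_ for t in fresh] == [t.tag_ for t in doc]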
+section("doc-exportimport-frombytes")
+h(4, "doc-exportimport-frombytes") Doc.from_bytes
p Deserialize, loading from bytes.
+code("python", "Definition").
def from_bytes(self, byte_string):
return Doc
+section("doc-exportimport-tobytes")
+h(4, "doc-exportimport-tobytes")
| #[+tag method] Doc.to_bytes
p Serialize, producing a byte string.
+code("python", "Definition").
def to_bytes(self):
return bytes
+section("doc-exportimport-readbytes")
+h(4, "doc-exportimport-readbytes")
| #[+tag method] Doc.read_bytes
p.
A static method, used to read serialized #[code Doc] objects from
a file. For example:
+code("python", "Example").
from spacy.tokens.doc import Doc
loc = 'test_serialize.bin'
with open(loc, 'wb') as file_:
    file_.write(nlp(u'This is a document.').to_bytes())
    file_.write(nlp(u'This is another.').to_bytes())
docs = []
with open(loc, 'rb') as file_:
    for byte_string in Doc.read_bytes(file_):
        docs.append(Doc(nlp.vocab).from_bytes(byte_string))
assert len(docs) == 2
+code("python", "Definition").
@staticmethod
def read_bytes(file_):
yield bytes

View File

@ -1,258 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > LANGUAGE
//- ----------------------------------
+section("language")
+h(2, "language", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/language.py")
| #[+tag class] Language
p.
A pipeline that transforms text strings into annotated spaCy Doc objects. Usually you'll load the Language pipeline once and pass the instance around your program.
+code("python", "Overview").
class Language:
Defaults = BaseDefaults
def __init__(self, path=True, **overrides):
self.vocab = Vocab()
self.tokenizer = Tokenizer()
self.tagger = Tagger()
self.parser = DependencyParser()
self.entity = EntityRecognizer()
self.make_doc = lambda text: Doc()
self.pipeline = [self.tagger, self.parser, self.entity]
def __call__(self, text, **toggle):
doc = self.make_doc(text)
for process in self.pipeline:
if toggle.get(process.name, True):
process(doc)
return doc
def pipe(self, texts_iterator, batch_size=1000, n_threads=2, **toggle):
docs = (self.make_doc(text) for text in texts_iterator)
for process in self.pipeline:
if toggle.get(process.name, True):
docs = process.pipe(docs, batch_size=batch_size, n_threads=n_threads)
for doc in docs:
yield doc
def end_training(self, path=None):
return None
class English(Language):
class Defaults(BaseDefaults):
pass
class German(Language):
class Defaults(BaseDefaults):
pass
+section("english-init")
+h(3, "english-init")
| #[+tag method] Language.__init__
p
| Load the pipeline. You can disable components by passing None as a value,
| e.g. pass parser=None, vectors=None to save memory if you're not using
| those components. You can also pass an object as the value.
| Pass a function create_pipeline to use a custom pipeline --- see
| the custom pipeline tutorial.
+aside("Efficiency").
Loading takes 10-20 seconds, and the instance consumes 2 to 3
gigabytes of memory. Intended use is for one instance to be
created for each language per process, but you can create more
if you're doing something unusual. You may wish to make the
instance a global variable or "singleton".
+table(["Example", "Description"])
+row
+cell #[code nlp = English()]
+cell Load everything, from default path.
+row
+cell #[code nlp = English(path='my_data')]
+cell Load everything, from specified path
+row
+cell #[code nlp = English(path=path_obj)]
+cell Load everything, from an object that follows the #[code pathlib.Path] protocol.
+row
+cell #[code nlp = English(parser=False, vectors=False)]
+cell Load everything except the parser and the word vectors.
+row
+cell #[code nlp = English(parser=my_parser)]
+cell Load everything, and use a custom parser.
+row
+cell #[code nlp = English(create_pipeline=my_pipeline)]
+cell Load everything, and use a custom pipeline.
+code("python", "Definition").
def __init__(self, path=True, **overrides):
D = self.Defaults
self.vocab = Vocab(path=path, parent=self, **D.vocab) \
if 'vocab' not in overrides \
else overrides['vocab']
self.tokenizer = Tokenizer(self.vocab, path=path, **D.tokenizer) \
if 'tokenizer' not in overrides \
else overrides['tokenizer']
self.tagger = Tagger(self.vocab, path=path, **D.tagger) \
if 'tagger' not in overrides \
else overrides['tagger']
self.parser = DependencyParser(self.vocab, path=path, **D.parser) \
if 'parser' not in overrides \
else overrides['parser']
self.entity = EntityRecognizer(self.vocab, path=path, **D.entity) \
if 'entity' not in overrides \
else overrides['entity']
self.matcher = Matcher(self.vocab, path=path, **D.matcher) \
if 'matcher' not in overrides \
else overrides['matcher']
if 'make_doc' in overrides:
self.make_doc = overrides['make_doc']
elif 'create_make_doc' in overrides:
self.make_doc = overrides['create_make_doc'](self)
else:
self.make_doc = lambda text: self.tokenizer(text)
if 'pipeline' in overrides:
self.pipeline = overrides['pipeline']
elif 'create_pipeline' in overrides:
self.pipeline = overrides['create_pipeline'](self)
else:
self.pipeline = [self.tagger, self.parser, self.matcher, self.entity]
+section("language-call")
+h(3, "language-call")
| #[+tag method] Language.__call__
p
| The main entry point to spaCy. Takes raw unicode text, and returns
| a #[code Doc] object, which can be iterated to access #[code Token]
| and #[code Span] objects.
+aside("Efficiency").
spaCy's algorithms are all linear-time, so you can supply
documents of arbitrary length, e.g. whole novels.
+table(["Example", "Description"], "code")
+row
+cell #[code doc = nlp(u'Some text.')]
+cell Apply the full pipeline.
+row
+cell #[code doc = nlp(u'Some text.', parse=False)]
+cell Applies tagger and entity, not parser
+row
+cell #[code doc = nlp(u'Some text.', entity=False)]
+cell Applies tagger and parser, not entity.
+row
+cell #[code doc = nlp(u'Some text.', tag=False)]
+cell Does not apply tagger, entity or parser
+row
+cell #[code doc = nlp(u'')]
+cell Zero-length tokens, not an error
+row
+cell #[code doc = nlp(b'Some text')]
+cell Error: need unicode
+row
+cell #[code doc = nlp(b'Some text'.decode('utf8'))]
+cell Decode bytes into unicode first.
+code("python", "Definition").
def __call__(self, text, tag=True, parse=True, entity=True, matcher=True):
return self
+table(["Name", "Type", "Description"])
+row
+cell text
+cell #[+a(link_unicode) unicode]
+cell.
The text to be processed. spaCy expects raw unicode text; you
don't necessarily need to, say, split it into paragraphs.
However, depending on your documents, you might be better
off applying custom pre-processing. Non-text formatting,
e.g. from HTML mark-up, should be removed before sending
the document to spaCy. If your documents have a consistent
format, you may be able to improve accuracy by pre-processing.
For instance, if the first words of your documents are always
in upper-case, it may be helpful to normalize them before
supplying them to spaCy.
+row
+cell tag
+cell #[+a(link_bool) bool]
+cell.
Whether to apply the part-of-speech tagger. Required for
parsing and entity recognition.
+row
+cell parse
+cell #[+a(link_bool) bool]
+cell.
Whether to apply the syntactic dependency parser.
+row
+cell entity
+cell #[+a(link_bool) bool]
+cell.
Whether to apply the named entity recognizer.
+section("english-pipe")
+h(3, "english-pipe")
| #[+tag method] English.pipe
p
| Parse a sequence of texts into a sequence of #[code Doc] objects.
| Accepts a generator as input, and produces a generator as output.
| Internally, it accumulates a buffer of #[code batch_size]
| texts, works on them with #[code n_threads] workers in parallel,
| and then yields the #[code Doc] objects one by one.
+aside("Efficiency").
spaCy releases the global interpreter lock around the parser and
named entity recognizer, allowing shared-memory parallelism via
OpenMP. However, OpenMP is not supported on OSX — so multiple
threads will only be used on Linux and Windows.
+table(["Example", "Description"], "usage")
+row
+cell #[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/examples/parallel_parse.py") parallel_parse.py]
+cell Parse comments from Reddit in parallel.
+code("python", "Definition").
def pipe(self, texts, n_threads=2, batch_size=1000):
yield Doc()
+table(["Arg", "Type", "Description"])
+row
+cell texts
+cell
+cell.
A sequence of unicode objects. Usually you will want this
to be a generator, so that you don't need to have all of
your texts in memory.
+row
+cell n_threads
+cell #[+a(link_int) int]
+cell.
The number of worker threads to use. If -1, OpenMP will
decide how many to use at run time. Default is 2.
+row
+cell batch_size
+cell #[+a(link_int) int]
+cell.
The number of texts to buffer. Let's say you have a
#[code batch_size] of 1,000. The input, #[code texts], is
a generator that yields the texts one-by-one. We want to
operate on them in parallel. So, we accumulate a work queue.
Instead of taking one document from #[code texts] and
operating on it, we buffer #[code batch_size] documents,
work on them in parallel, and then yield them one-by-one.
Higher #[code batch_size] therefore often results in better
parallelism, up to a point.
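p.
    A minimal usage sketch, assuming #[code nlp] is a loaded pipeline:
+code("python", "Example").
    texts = (u'One document.', u'Another document.', u'And some more text.')
    # Accepts any iterable (ideally a generator) and yields Doc objects lazily.
    for doc in nlp.pipe(texts, batch_size=50, n_threads=4):
        print(len(doc), doc[0].orth_)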

View File

@ -1,194 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > LEXEME
//- ----------------------------------
+section("lexeme")
+h(2, "lexeme", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/lexeme.pyx")
| #[+tag class] Lexeme
p.
The Lexeme object represents a lexical type, stored in the vocabulary,
as opposed to a token, which occurs in a document.
p.
Each Token object receives a reference to a lexeme object (specifically,
it receives a pointer to a #[code LexemeC] struct). This allows features
to be computed and saved once per type, rather than once per token. As
job sizes grow, this amounts to substantial efficiency improvements, as
the vocabulary size (number of types) will be much smaller than the total
number of words processed (number of tokens).
p.
All Lexeme attributes are therefore context independent, as a single lexeme
is reused for all usages of that word. Lexemes are keyed by the #[code orth]
attribute.
p.
Most Lexeme attributes can be set, with the exception of the primary key,
#[code orth]. Assigning to an attribute of the #[code Lexeme] object writes
to the underlying struct, so all tokens that are backed by that
#[code Lexeme] will inherit the new value.
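p.
    A sketch of this write-through behaviour, assuming #[code is_stop] is
    one of the writable flags:
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    # Write once on the lexeme...
    nlp.vocab[u'spacy'].is_stop = True
    doc = nlp(u'I really like spacy.')
    # ...and every token backed by that lexeme reads the new value.
    assert doc[3].is_stop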
+code("python", "Overview").
class Lexeme:
def __init__(self, vocab, key):
return self
int rank
int orth, lower, shape, prefix, suffix
unicode orth_, lower_, shape_, prefix_, suffix_
bool is_alpha, is_ascii, is_lower, is_title, is_punct, is_space, like_url, like_num, like_email, is_oov, is_stop
float prob
int cluster
numpy.ndarray[float64] vector
bool has_vector
def set_flag(self, flag_id, value):
return None
def check_flag(self, flag_id):
return bool
def similarity(self, other):
return float
+table(["Example", "Description"])
+row
+cell #[code.lang-python lexeme = nlp.vocab[string]]
+cell Lookup by string
+row
+cell #[code.lang-python lexeme = vocab[i]]
+cell Lookup by integer
+section("lexeme-stringfeatures")
+h(3, "lexeme-stringfeatures").
String Features
+table(["Name", "Description"])
+row
+cell orth / orth_
+cell.
The form of the word with no string normalization or processing,
as it appears in the string, without trailing whitespace.
+row
+cell lower / lower_
+cell.
The form of the word, but forced to lower-case, i.e.
#[code lower = word.orth_.lower()]
+row
+cell shape / shape_
+cell.
A transform of the word's string, to show orthographic features.
The characters a-z are mapped to x, A-Z is mapped to X, 0-9
is mapped to d. After these mappings, sequences of 4 or more
of the same character are truncated to length 4. Examples:
C3Po --&gt; XdXx, favorite --&gt; xxxx, :) --&gt; :)
+row
+cell prefix / prefix_
+cell.
A length-N substring from the start of the word. Length may
vary by language; currently for English n=1, i.e.
#[code prefix = word.orth_[:1]]
+row
+cell suffix / suffix_
+cell.
A length-N substring from the end of the word. Length may vary
by language; currently for English n=3, i.e.
#[code suffix = word.orth_[-3:]]
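p.
    For instance, a sketch of reading these string features from a lexeme,
    assuming #[code nlp] is a loaded English pipeline:
+code("python", "Example").
    lexeme = nlp.vocab[u'Apples']
    print(lexeme.orth_)    # u'Apples'
    print(lexeme.lower_)   # u'apples'
    print(lexeme.shape_)   # u'Xxxxx'
    print(lexeme.prefix_)  # u'A' (n=1 for English)
    print(lexeme.suffix_)  # u'les' (n=3 for English)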
+section("lexeme-booleanflags")
+h(3, "lexeme-booleanflags")
| Boolean Flags
+table(["Name", "Description"])
+row
+cell is_alpha
+cell Equivalent to #[code word.orth_.isalpha()]
+row
+cell is_ascii
+cell Equivalent to #[code all(ord(c) &lt; 128 for c in word.orth_)]
+row
+cell is_digit
+cell Equivalent to #[code word.orth_.isdigit()]
+row
+cell is_lower
+cell Equivalent to #[code word.orth_.islower()]
+row
+cell is_title
+cell Equivalent to #[code word.orth_.istitle()]
+row
+cell is_punct
+cell Equivalent to #[code word.orth_.ispunct()]
+row
+cell is_space
+cell Equivalent to #[code word.orth_.isspace()]
+row
+cell like_url
+cell Does the word resemble a URL?
+row
+cell like_num
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
+row
+cell like_email
+cell Does the word resemble an email?
+row
+cell is_oov
+cell Is the word out-of-vocabulary?
+row
+cell is_stop
+cell.
Is the word part of a "stop list"? Stop lists are used to
improve the quality of topic models, by filtering out common,
domain-general words.
+section("lexeme-distributional")
+h(3, "lexeme-distributional")
| Distributional Features
+table(["Name", "Description"])
+row
+cell prob
+cell.
The unigram log-probability of the word, estimated from
counts from a large corpus, smoothed using Simple Good Turing
estimation.
+row
+cell cluster
+cell.
The Brown cluster ID of the word. These are often useful features
for linear models. If you're using a non-linear model, particularly
a neural net or random forest, consider using the real-valued
word representation vector, in #[code Token.repvec], instead.
+row
+cell vector
+cell.
A "word embedding" representation: a dense real-valued vector
that supports similarity queries between words. By default,
spaCy currently loads vectors produced by the Levy and
Goldberg (2014) dependency-based word2vec model.
+row
+cell has_vector
+cell.
A boolean value indicating whether a word vector is associated with the lexeme.
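p.
    A short sketch of reading these features, assuming a pipeline with word
    vectors loaded:
+code("python", "Example").
    apple = nlp.vocab[u'apple']
    orange = nlp.vocab[u'orange']
    print(apple.prob, apple.cluster)   # log-probability and Brown cluster ID
    if apple.has_vector and orange.has_vector:
        # Cosine similarity between the two word vectors
        print(apple.similarity(orange))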

View File

@ -1,81 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > MATCHER
//- ----------------------------------
+section("matcher")
+h(2, "matcher", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/matcher.pyx")
| #[+tag class] Matcher
p A full example can be found #[a(href="https://github.com/" + SOCIAL.github + "/spaCy/blob/master/examples/matcher_example.py") here].
+table(["Usage", "Description"])
+row
+cell #[code.lang-python nlp(doc)]
+cell As part of annotation pipeline.
+row
+cell #[code.lang-python nlp.matcher(doc)]
+cell Explicit invocation.
+row
+cell #[code.lang-python nlp.matcher.add(u'FooCorp', u'ORG', {}, [[{u'ORTH': u'Foo'}]])]
+cell Add a pattern to match.
+section("matcher-init")
+h(3, "matcher-init") __init__(self, vocab, patterns)
+table(["Name", "Type", "Description"])
+row
+cell vocab
+cell #[code.lang-python spacy.vocab.Vocab]
+cell Reference to the shared vocabulary object.
+row
+cell patterns
+cell #[code {entity_key: (etype, attrs, specs)}]
+cell.
Initial patterns to match. See #[code Matcher.add]
+section("matcher-add")
+h(3, "matcher-add") add(self, entity_key, etype, attrs, specs)
+table(["Name", "Type", "Description"])
+row
+cell entity_key
+cell unicode or int
+cell Your arbitrary ID string (or its integer encoding)
+row
+cell etype
+cell unicode or int
+cell A pre-registered entity type, e.g. u'PERSON', u'ORG', etc.
+row
+cell attrs
+cell #[code dict]
+cell Placeholder for future support of entity attributes.
+row
+cell specs
+cell #[code [[{int: unicode}]]]
+cell A list of surface forms, where each surface form is defined as a list of token definitions, and each token definition is a dictionary mapping attribute IDs to attribute values.
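p.
    A brief usage sketch, following the #[code FooCorp] pattern from the
    table above (the exact matches surfaced depend on the pipeline
    configuration):
+code("python", "Example").
    from spacy.en import English
    nlp = English()
    # Match the single token u'Foo' and label matches with the type u'ORG'.
    nlp.matcher.add(u'FooCorp', u'ORG', {}, [[{u'ORTH': u'Foo'}]])
    doc = nlp(u'Foo is hiring.')
    print([(ent.text, ent.label_) for ent in doc.ents])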
+section("matcher-saveload")
+h(3, "matcher-saveload")
| Save and Load
+section("matcher-saveload-dump")
+h(4, "matcher-saveload-dump") dump(loc)
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell Path to save the gazetteer.json file.
+section("matcher-saveload-load")
+h(4, "matcher-saveload-load") load(loc)
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell.
Path to load the gazetteer.json file from.

View File

@ -1,305 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > SPAN
//- ----------------------------------
+section("span")
+h(2, "span", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/span.pyx")
| #[+tag class] Span
p.
A slice of a #[code Doc] object, consisting of zero or
more tokens. Spans are usually used to represent sentences, named entities,
and phrases.
+aside("Implementation").
#[code Span] objects are views — that is, they do not copy the
underlying C data. This makes them cheap to construct, as internally they are
simply a reference to the #[code Doc] object, a start position, an end
position, and a label ID.
+code("python", "Overview").
class Span:
doc = Doc
start = int
end = int
label = int
def __init__(self, doc, start, end, label=0, vector=None, vector_norm=None):
return self
def __len__(self):
return int
def __getitem__(self, i):
return Token()
def __iter__(self):
yield Token()
def similarity(self, other):
return float
def merge(self, tag, lemma, ent_type):
return None
@property
def label_(self):
return unicode
@property
def vector(self):
return numpy.ndarray(dtype="float64")
@property
def vector_norm(self):
return float
@property
def text(self):
return unicode
@property
def text_with_ws(self):
return unicode
@property
def orth_(self):
return unicode
@property
def lemma_(self):
return unicode
@property
def root(self):
return Token()
@property
def lefts(self):
yield Token()
@property
def rights(self):
yield Token()
@property
def subtree(self):
yield Token()
+section("span-create")
+h(3, "span-init")
| #[+tag Section] Create a Span
p.
Span instances are usually created via the #[code Doc] object.
+table(["Example", "Description"])
+row
+cell #[code.lang-python span = doc[4 : 7]]
+cell Produce a span with tokens 4, 5 and 6.
+row
+cell #[code.lang-python span = Span(doc, start, end, label=spacy.symbols.PERSON)]
+cell Calling #[code Span.__init__] directly allows you to set a label.
+row
+cell #[code.lang-python for entity in doc.ents]
+cell See #[a(href="/docs#doc-spans-ents") Doc.ents]
+row
+cell #[code.lang-python for sentence in doc.sents]
+cell See #[a(href="/docs#doc-spans-sents") Doc.sents]
+row
+cell #[code.lang-python for noun_phrase in doc.noun_chunks]
+cell See #[a(href="/docs#doc-spans-nounchunks") Doc.noun_chunks]
+code("python", "Definition").
def __init__(self, doc, start, end, label=0, vector=None, vector_norm=None):
return Span()
+table(["Name", "Type", "Description"])
+row
+cell doc
+cell Doc
+cell The parent doc object, to slice from.
+row
+cell start
+cell int
+cell The index of the first token in the slice.
+row
+cell end
+cell int
+cell The index of the first token #[em outside] the slice (since ranges are exclusive in Python).
+row
+cell label
+cell int or unicode
+cell A label for the span. Either a string, or an integer ID, that should refer to a string mapped by the #[code Doc] object's #[code StringStore].
+row
+cell vector
+cell
+cell
+row
+cell vector_norm
+cell
+cell
+section("span-merge")
+h(3, "span-merge")
| #[+tag method] Span.merge
p.
Merge the span into a single token, modifying the underlying
#[code.lang-python Doc] object in place.
+aside("Caveat").
Magic is done to allow you to call #[code.lang-python merge()]
without invalidating other #[code.lang-python Span] objects.
However, it's difficult to ensure all indices are recomputed
correctly. Please report any errors encountered on the issue
tracker.
+code("python", "Example").
for ent in doc.ents:
    ent.merge(ent.root.tag_, ent.text, ent.label_)
for np in doc.noun_chunks:
    while len(np) > 1 and np[0].dep_ not in ('advmod', 'amod', 'compound'):
        np = np[1:]
    np.merge(np.root.tag_, np.text, np.root.ent_type_)
+code("python", "Definition").
def merge(self, tag, lemma, ent_type):
return None
+table(["Name", "Type", "Description"])
+row
+cell tag
+cell unicode
+cell The fine-grained part-of-speech tag to assign to the new token.
+row
+cell lemma
+cell unicode
+cell The lemma string for the new token.
+row
+cell ent_type
+cell unicode
+cell The named entity type to assign to the new token.
+section("span-similarity")
+h(3, "span-similarity")
| #[+tag method] Span.similarity
p Estimate the semantic similarity between the span and another #[code Span], #[code Doc], #[code Token] or #[code Lexeme].
+aside("Algorithm").
Similarity is estimated
using the cosine metric, between #[code Span.vector] and #[code other.vector].
By default, #[code Span.vector] is computed by averaging the vectors
of its tokens.
+code("python", "Example").
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
apples_sent, boots_sent = doc.sents
fruit = doc.vocab[u'fruit']
assert apples_sent.similarity(fruit) > boots_sent.similarity(fruit)
+code("python", "Definition").
def similarity(self, other):
return float
+table(["Name", "Type", "Description"])
+row
+cell other
+cell Token, Span, Doc or Lexeme
+cell The other object to judge similarity with.
+section("span-sequence")
+h(3, "span-sequence")
| #[+tag section] Span as a Sequence
p.
#[code Span] objects act as a sequence of #[code Token] objects. In
this way they mirror the API of the #[code Doc] object.
+table(["Name", "Description"], "params")
+row
+cell #[code.lang-python token = span[i]]
+cell.
Get the #[code Token] object at position #[em i], where
#[code i] is an offset within the #[code Span], not the
document. That is, if you have #[code.lang-python span = doc[4:6]],
then #[code.lang-python span[0].i == 4]
+row
+cell #[code.lang-python for token in span]
+cell.
Iterate over the #[code Token] objects in the span.
+row
+cell __len__
+cell Number of tokens in the span.
+row
+cell text
+cell.
The text content of the span, obtained from
#[code ''.join(token.text_with_ws for token in span)].
+row
+cell start
+cell.
The start offset of the span, i.e. #[code span[0].i].
+row
+cell end
+cell.
The end offset of the span, i.e. #[code span[-1].i + 1].
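p.
    Here is a minimal sketch of the sequence API. It assumes an English
    pipeline has been loaded as #[code nlp]:
+code("python", "Example").
    doc = nlp(u'The quick brown fox jumped.')
    span = doc[1:4]
    assert len(span) == 3
    assert span[0].i == 1
    assert span.text == u'quick brown fox'
    assert [token.text for token in span] == [u'quick', u'brown', u'fox']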
+section("span-navigating-parse")
+h(3, "span-navigativing-parse")
| #[+tag Section] Span and the Syntactic Parse
p.
Span objects allow similar access to the syntactic parse as individual
tokens.
+table(["Name", "Type", "Description"])
+row
+cell root
+cell #[code.lang-python Token]
+cell.
The word with the shortest path to the root of the sentence is
the root of the span.
+row
+cell lefts
+cell #[code.lang-python yield Token]
+cell Tokens that are to the left of the span, whose head is within it.
+row
+cell rights
+cell #[code.lang-python yield Token]
+cell Tokens that are to the right of the span, whose head is within it.
+row
+cell subtree
+cell #[code.lang-python yield Token]
+cell.
Tokens in the range #[code (start, end+1)], where #[code start]
is the index of the leftmost word descended from a token in the
span, and #[code end] is the index of the rightmost token descended
from a token in the span.
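p.
    For illustration, a short sketch (assuming a loaded #[code nlp] pipeline;
    the exact output depends on the statistical parse):
+code("python", "Example").
    doc = nlp(u'I like New York in Autumn.')
    span = doc[2:4]                        # "New York"
    print(span.root.text)                  # head of the span, e.g. "York"
    print([t.text for t in span.lefts])    # tokens left of the span whose head is inside it
    print([t.text for t in span.subtree])  # tokens governed by the span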
+section("span-strings")
+h(3, "span-strings")
| #[+tag section] Span's Strings API
p.
You can access the textual content of the span, and different views of
it, with the following properties.
+table(["Name", "Type", "Description"])
+row
+cell text_with_ws
+cell unicode
+cell.
The form of the span as it appears in the string, including
trailing whitespace. This is useful when you need to use linguistic
features to add inline mark-up to the string.
+row
+cell lemma / lemma_
+cell int / unicode
+cell.
Whitespace-concatenated lemmas of each token in the span.
+row
+cell label / label_
+cell int / unicode
+cell.
The span label, used particularly for named entities.
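p.
    For example (a sketch, assuming a loaded #[code nlp] pipeline; the
    entities found depend on the model):
+code("python", "Example").
    doc = nlp(u'Dr. Alan Turing worked in London.')
    for ent in doc.ents:
        print(ent.text_with_ws, ent.label_, ent.lemma_)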

@ -1,105 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > STRINGSTORE
//- ----------------------------------
+section("stringstore")
+h(2, "stringstore", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/strings.pyx")
| #[+tag class] StringStore
p Intern strings, and map them to sequential integer IDs.
p.
Only the integer IDs are held by spaCy's data
classes (#[code Doc], #[code Token], #[code Span] and #[code Lexeme])
&ndash; when you use a string-valued attribute like #[code token.orth_],
you access a property that computes #[code token.vocab.strings[token.orth]].
+aside("Efficiency").
The mapping table is very efficient, and a small-string optimization
is used to maintain a small memory footprint.
+table(["Usage", "Description"])
+row
+cell #[code string = string_store[int_id]]
+cell.
Retrieve a string from a given integer ID. If the integer ID
is not found, raise #[code IndexError].
+row
+cell #[code int_id = string_store[unicode_string]]
+cell.
Map a unicode string to an integer ID. If the string is
previously unseen, it is interned, and a new ID is returned.
+row
+cell #[code int_id = string_store[utf8_byte_string]]
+cell.
Byte strings are assumed to be in UTF-8 encoding. Strings
encoded with other codecs may fail silently. Given a utf8
string, the behaviour is the same as for unicode strings.
Internally, strings are stored in UTF-8 format. So if you start
with a UTF-8 byte string, it's less efficient to first decode
it as unicode, as StringStore will then have to encode it as
UTF-8 once again.
+row
+cell #[code n_strings = len(string_store)]
+cell.
Number of strings in the string-store.
+row
+cell #[code for string in string_store]
+cell
p.
Iterate over strings in the string store, in order, such
that the ith string in the sequence has the ID #[code i]:
+code.code-block-small.no-block.
string_store = doc.vocab.strings
for i, string in enumerate(string_store):
assert i == string_store[string]
+section("stringstore-init")
+h(3, "stringstore-init")
| #[+tag method] StringStore.__init__
+code("python", "Definition").
def __init__(self):
return self
+section("stringstore-dump")
+h(3, "stringstore-dump")
| #[+tag method] StringStore.dump
p Save the string-to-int mapping to the given file.
+code("python", "Definition").
def dump(self, file):
return None
+table(["Name", "Type", "Description"])
+row
+cell file
+cell file
+cell.
The file to write the data to.
+section("stringstore-load")
+h(3, "stringstore-load")
| #[+tag method] StringStore.load
p Load the strings from the given file.
+code("python", "Definition").
def load(self, file):
return None
+table(["Name", "Type", "Description"])
+row
+cell file
+cell file
+cell.
File-like object to load the data from. The format is subject
to change; so if you need to read/write compatible files, please
find details in the strings.pyx source.

@ -1,321 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > TOKEN
//- ----------------------------------
+section("token")
+h(2, "token", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/token.pyx")
| #[+tag class] Token
p.
A Token represents a single word, punctuation or significant whitespace
symbol. Integer IDs are provided for all string features. The (unicode)
string is provided by an attribute of the same name followed by an underscore,
e.g. #[code token.orth] is an integer ID, #[code token.orth_] is the unicode
value. The only exception is the #[code token.text] attribute, which is (unicode)
string-typed.
+section("token-init")
+h(3, "token-init")
| Token.__init__
+code("python", "Definition").
def __init__(vocab, doc, offset):
return Token()
+table(["Name", "Type", "Description"])
+row
+cell vocab
+cell Vocab
+cell A Vocab object
+row
+cell doc
+cell Doc
+cell The parent sequence
+row
+cell offset
+cell #[+a(link_int) int]
+cell The index of the token within the document
+section("token-stringfeatures")
+h(3, "token-stringfeatures")
| String Features
+table(["Name", "Description"])
+row
+cell lemma / lemma_
+cell.
The "base" of the word, with no inflectional suffixes, e.g.
the lemma of "developing" is "develop", the lemma of "geese"
is "goose", etc. Note that #[em derivational] suffixes are
not stripped, e.g. the lemma of "institutions" is "institution",
not "institute". Lemmatization is performed using the WordNet
data, but extended to also cover closed-class words such as
pronouns. By default, the WN lemmatizer returns "hi" as the
lemma of "his". We assign pronouns the lemma #[code -PRON-].
+row
+cell orth / orth_
+cell.
The form of the word with no string normalization or processing,
as it appears in the string, without trailing whitespace.
+row
+cell lower / lower_
+cell.
The form of the word, but forced to lower-case, i.e.
#[code lower = word.orth_.lower()]
+row
+cell shape / shape_
+cell.
A transform of the word's string, to show orthographic features.
The characters a-z are mapped to x, A-Z is mapped to X, 0-9
is mapped to d. After these mappings, sequences of 4 or more
of the same character are truncated to length 4. Examples:
C3Po --&gt; XdXx, favorite --&gt; xxxx, :) --&gt; :)
+row
+cell prefix / prefix_
+cell.
A length-N substring from the start of the word. Length may
vary by language; currently for English n=1, i.e.
#[code prefix = word.orth_[:1]]
+row
+cell suffix / suffix_
+cell.
A length-N substring from the end of the word. Length may
vary by language; currently for English n=3, i.e.
#[code suffix = word.orth_[-3:]]
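p.
    A quick sketch of the string views, assuming an English pipeline loaded
    as #[code nlp]:
+code("python", "Example").
    doc = nlp(u'Apples are good')
    token = doc[0]
    assert token.orth_ == u'Apples'
    assert token.lower_ == u'apples'
    assert token.shape_ == u'Xxxxx'
    assert token.prefix_ == u'A'
    assert token.suffix_ == u'les'
    print(token.lemma_)  # usually u'apple', depending on the tagger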
+section("token-booleanflags")
+h(3, "token-booleanflags")
| Boolean Flags
+table(["Name", "Description"])
+row
+cell is_alpha
+cell.
Equivalent to #[code word.orth_.isalpha()]
+row
+cell is_ascii
+cell.
Equivalent to #[code all(ord(c) &lt; 128 for c in word.orth_)]
+row
+cell is_digit
+cell.
Equivalent to #[code word.orth_.isdigit()]
+row
+cell is_lower
+cell.
Equivalent to #[code word.orth_.islower()]
+row
+cell is_title
+cell.
Equivalent to #[code word.orth_.istitle()]
+row
+cell is_punct
+cell.
Is the word punctuation? (There is no equivalent #[code str] method.)
+row
+cell is_space
+cell.
Equivalent to #[code word.orth_.isspace()]
+row
+cell like_url
+cell.
Does the word resemble a URL?
+row
+cell like_num
+cell.
Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
+row
+cell like_email
+cell.
Does the word resemble an email?
+row
+cell is_oov
+cell.
Is the word out-of-vocabulary?
+row
+cell is_stop
+cell.
Is the word part of a "stop list"? Stop lists are used to
improve the quality of topic models, by filtering out common,
domain-general words.
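p.
    For example (a minimal sketch, assuming a loaded #[code nlp] pipeline):
+code("python", "Example").
    doc = nlp(u'Give me 10 apples')
    assert doc[0].is_alpha
    assert doc[2].like_num
    assert not doc[2].is_alpha
    print(doc[1].is_stop)  # depends on the stop list shipped with the data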
+section("token-distributional")
+h(3, "token-distributional")
| Distributional Features
+table(["Name", "Description"])
+row
+cell prob
+cell.
The unigram log-probability of the word, estimated from
counts from a large corpus, smoothed using Simple Good Turing
estimation.
+row
+cell cluster
+cell.
The Brown cluster ID of the word. These are often useful features
for linear models. If you're using a non-linear model, particularly
a neural net or random forest, consider using the real-valued
word representation vector, in #[code Token.repvec], instead.
+row
+cell vector
+cell.
A "word embedding" representation: a dense real-valued vector
that supports similarity queries between words. By default,
spaCy currently loads vectors produced by the Levy and
Goldberg (2014) dependency-based word2vec model.
+row
+cell has_vector
+cell.
A boolean value indicating whether a word vector is associated with the word.
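p.
    A sketch of the distributional features, assuming the model data and
    word vectors have been downloaded (exact values vary by model):
+code("python", "Example").
    token = nlp(u'apples')[0]
    print(token.prob)          # unigram log-probability, e.g. -11.2
    print(token.cluster)       # Brown cluster ID
    print(token.has_vector)    # True if a word vector is available
    print(token.vector.shape)  # e.g. (300,)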
+section("token-alignment")
+h(3, "token-alignment")
| Alignment and Output
+table(["Name", "Description"])
+row
+cell idx
+cell.
Start index of the token in the string
+row
+cell len(token)
+cell.
Length of the token's orth string, in unicode code-points.
+row
+cell unicode(token)
+cell.
Same as #[code token.orth_].
+row
+cell str(token)
+cell.
In Python 3, returns #[code token.orth_]. In Python 2, returns
#[code token.orth_.encode('utf8')].
+row
+cell text
+cell.
An alias for #[code token.orth_].
+row
+cell text_with_ws
+cell.
#[code token.orth_ + token.whitespace_], i.e. the form of the
word as it appears in the string, including trailing whitespace. This is
useful when you need to use linguistic features to add inline
mark-up to the string.
+row
+cell whitespace_
+cell.
The whitespace character(s) following the word in the original
string, if any (usually a single space or the empty string).
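p.
    For example (assuming a loaded #[code nlp] pipeline):
+code("python", "Example").
    doc = nlp(u'Hello, world!')
    token = doc[0]
    assert token.idx == 0
    assert token.text == u'Hello'
    assert token.whitespace_ == u''  # the comma follows directly
    assert token.text_with_ws == token.text + token.whitespace_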
+section("token-postags")
+h(3, "token-postags")
| Part-of-Speech Tags
+table(["Name", "Description"])
+row
+cell pos / pos_
+cell.
A coarse-grained, less detailed tag that represents the
word-class of the token. The set of #[code .pos] tags are
consistent across languages. The available tags are #[code ADJ],
#[code ADP], #[code ADV], #[code AUX], #[code CONJ], #[code DET],
#[code INTJ], #[code NOUN], #[code NUM], #[code PART],
#[code PRON], #[code PROPN], #[code PUNCT], #[code SCONJ],
#[code SYM], #[code VERB], #[code X], #[code EOL], #[code SPACE].
+row
+cell tag / tag_
+cell.
A fine-grained, more detailed tag that represents the
word-class and some basic morphological information for the
token. These tags are primarily designed to be good features
for subsequent models, particularly the syntactic parser.
They are language and treebank dependent. The tagger is
trained to predict these fine-grained tags, and then a
mapping table is used to reduce them to the coarse-grained
#[code .pos] tags.
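p.
    For example (a sketch; the predicted tags depend on the statistical model):
+code("python", "Example").
    doc = nlp(u'They sailed to the island')
    print([(t.text, t.pos_, t.tag_) for t in doc])
    # e.g. [('They', 'PRON', 'PRP'), ('sailed', 'VERB', 'VBD'), ...]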
+section("token-navigating")
+h(3, "token-navigating") Navigating the Parse Tree
+table(["Name", "Description"])
+row
+cell dep / dep_
+cell.
The syntactic relation type, aka the dependency label, connecting the word to its head.
+row
+cell head
+cell.
The immediate syntactic head of the token. If the token is the
root of its sentence, it is the token itself, i.e.
#[code root_token.head is root_token].
+row
+cell children
+cell.
An iterator that yields from lefts, and then yields from rights.
+row
+cell subtree
+cell.
An iterator for the part of the sentence syntactically governed
by the word, including the word itself.
+row
+cell left_edge
+cell.
The leftmost edge of the token's subtree.
+row
+cell right_edge
+cell.
The rightmost edge of the token's subtree.
+row
+cell nbor(i=1)
+cell.
Get the #[code i]#[sup th] next / previous neighboring token.
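p.
    A minimal sketch of walking the tree (assuming a loaded #[code nlp]
    pipeline; the structure depends on the parse):
+code("python", "Example").
    doc = nlp(u'Autonomous cars shift insurance liability toward manufacturers')
    root = [token for token in doc if token.head is token][0]
    print(root.text, root.dep_)
    print([child.text for child in root.children])
    print([t.text for t in root.subtree])
    print(doc[0].nbor().text)  # the token following doc[0]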
+section("token-namedentities")
+h(3, "token-namedentities")
| Named Entity Recognition
+table(["Name", "Description"])
+row
+cell ent_type
+cell.
If the token is part of an entity, its entity type.
+row
+cell ent_iob
+cell.
The IOB (inside, outside, begin) entity recognition tag for
the token.
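p.
    For example, using the string view #[code ent_type_] alongside
    #[code ent_iob] (a sketch; predictions depend on the model):
+code("python", "Example").
    doc = nlp(u'Mark Zuckerberg founded Facebook')
    for token in doc:
        print(token.text, token.ent_type_, token.ent_iob)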

@ -1,154 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > API > VOCAB
//- ----------------------------------
+section("vocab")
+h(2, "vocab", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/vocab.pyx")
| #[+tag class] Vocab
p
| A look-up table that allows you to access #[code.lang-python Lexeme]
| objects. The #[code.lang-python Vocab] instance also provides access to
| the #[code.lang-python StringStore], and owns underlying C-data that
| is shared between #[code.lang-python Doc] objects.
+aside('Caveat').
You should avoid working with #[code Doc], #[code Token] or #[code Span]
objects backed by multiple different #[code Vocab] instances, as
they may assume inconsistent string-to-integer encodings. All #[code Doc]
objects produced by the same #[code Language] instance will hold
a reference to the same #[code Vocab] instance.
+code("python", "Overview").
class Vocab:
StringStore strings
Morphology morphology
dict get_lex_attr
int vectors_length
def __init__(self, get_lex_attr=None, tag_map=None, lemmatizer=None, serializer_freqs=None):
return self
@classmethod
def load(cls, data_dir, get_lex_attr):
return Vocab()
@classmethod
def from_package(cls, package, get_lex_attr=None, vectors_package=None):
return Vocab()
property serializer:
return Packer()
def __len__(self):
return int
def __contains__(self, string):
return bool
def __getitem__(self, id_or_string):
return Lexeme()
def dump(self, loc):
return None
def load_lexemes(self, loc):
return None
def dump_vectors(self, out_loc):
return None
def load_vectors(self, file_):
return int
def load_vectors_from_bin_loc(self, loc):
return int
+table(["Example", "Description"])
+row
+cell #[code.lang-python lexeme = vocab[integer_id]]
+cell.
Get a lexeme by its orth ID.
+row
+cell #[code.lang-python lexeme = vocab[string]]
+cell.
Get a lexeme by the string corresponding to its orth ID.
+row
+cell #[code.lang-python for lexeme in vocab]
+cell.
Iterate over #[code Lexeme] objects.
+row
+cell #[code.lang-python int_id = vocab.strings[u'dog']]
+cell.
Access the #[code StringStore] via #[code vocab.strings]
+row
+cell #[code.lang-python nlp.vocab is nlp.tokenizer.vocab]
+cell.
The same #[code.lang-python Vocab] instance is shared across the pipeline components.
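p.
    A short sketch of the lookups described above, assuming a loaded
    #[code nlp] pipeline:
+code("python", "Example").
    apple_lexeme = nlp.vocab[u'apple']
    assert nlp.vocab[apple_lexeme.orth].orth_ == u'apple'
    assert nlp.vocab.strings[u'apple'] == apple_lexeme.orth
    assert nlp.vocab is nlp.tokenizer.vocab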
+section("vocab-dump")
+h(3, "vocab-dump")
| #[+tag method] Vocab.dump
+code("python", "Definition").
def dump(self, loc):
return None
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell Path where the vocabulary should be saved.
+section("vocab-load_lexemes")
+h(3, "vocab-load_lexemes")
| #[+tag method] Vocab.load_lexemes
+code("python", "Definition").
def load_lexemes(self, loc):
return None
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell Path to load the lexemes.bin file from.
+section("vocab-dump_vectors")
+h(3, "vocab-dump_vectors")
| #[+tag method] Vocab.dump_vectors
+code("python", "Definition").
def dump_vectors(self, loc):
return None
+section("vocab-loadvectors")
+h(3, "vocab-loadvectors")
| #[+tag method] Vocab.load_vectors
+code("python", "Definition").
def load_vectors(self, file_):
return None
+table(["Name", "Type", "Description"])
+row
+cell file_
+cell file
+cell A file-like object, to load word vectors from.
+section("vocab-loadvectorsfrombinloc")
+h(3, "vocab-saveload-loadvectorsfrom")
| #[+tag method] Vocab.load_vectors_from_bin_loc
+code("python", "Definition").
def load_vectors_from_bin_loc(self, loc):
return None
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell.
A path to a file, in spaCy's binary word-vectors file format.

@ -2,29 +2,27 @@
"index": {
"title" : "Documentation",
"sidebar": {
"Quickstart": [
["Getting started", "#getting-started", "getting-started"],
["Usage Examples", "#examples", "examples"]
],
"API": [
["Language", "#language", "language"],
["Doc", "#doc", "doc"],
["Token", "#token", "token"],
["Span", "#span", "span"],
["Lexeme", "#lexeme", "lexeme"],
["Vocab", "#vocab", "vocab"],
["StringStore", "#stringstore", "stringstore"],
["Matcher", "#matcher", "matcher"]
],
"More": [
["Annotation Specs", "#annotation", "annotation"],
["Tutorials", "#tutorials", "tutorials"]
],
"Feedback": [
["Suggest Edits", "https://github.com/spacy-io/spaCy/tree/master/website/docs"],
["Github Issue Tracker", "https://github.com/spacy-io/spaCy/issues"]
]
"sections": {
"Usage": {
"url": "/docs/usage",
"svg": "computer",
"description": "How to use spaCy and its features."
},
"API": {
"url": "/docs/api",
"svg": "brain",
"description": "The detailed reference for spaCy's API."
},
"Tutorials": {
"url": "/docs/usage/tutorials",
"svg": "eye",
"description": "End-to-end examples, with code you can modify and run."
},
"Showcase & Demos": {
"url": "/docs/usage/showcase",
"svg": "bubble",
"description": "Demos, libraries and products from the spaCy community."
}
}
}
}

@ -1,176 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > QUICKSTART > USAGE EXAMPLES
//- ----------------------------------
+section("examples")
+h(2, "examples").
Usage Examples
+h(3, "examples-resources") Load resources and process text
+code.
import spacy
en_nlp = spacy.load('en')
en_doc = en_nlp(u'Hello, world. Here are two sentences.')
de_nlp = spacy.load('de')
de_doc = de_nlp(u'ich bin ein Berliner.')
+h(3, "multi-threaded") Multi-threaded generator (using OpenMP. No GIL!)
+code.
texts = [u'One document.', u'...', u'Lots of documents']
# .pipe streams input, and produces streaming output
iter_texts = (texts[i % 3] for i in xrange(100000000))
for i, doc in enumerate(nlp.pipe(iter_texts, batch_size=50, n_threads=4)):
assert doc.is_parsed
if i == 100:
break
+h(3, "examples-tokens-sentences") Get tokens and sentences
+code.
token = doc[0]
sentence = next(doc.sents)
assert token is sentence[0]
assert sentence.text == 'Hello, world.'
+h(3, "examples-integer-ids") Use integer IDs for any string
+code.
hello_id = nlp.vocab.strings['Hello']
hello_str = nlp.vocab.strings[hello_id]
assert token.orth == hello_id == 3125
assert token.orth_ == hello_str == 'Hello'
+h(3, "examples-string-views-flags") Get and set string views and flags
+code.
assert token.shape_ == 'Xxxxx'
for lexeme in nlp.vocab:
if lexeme.is_alpha:
lexeme.shape_ = 'W'
elif lexeme.is_digit:
lexeme.shape_ = 'D'
elif lexeme.is_punct:
lexeme.shape_ = 'P'
else:
lexeme.shape_ = 'M'
assert token.shape_ == 'W'
+h(3, "examples-numpy-arrays") Export to numpy arrays
+code.
from spacy.attrs import ORTH, LIKE_URL, IS_OOV
attr_ids = [ORTH, LIKE_URL, IS_OOV]
doc_array = doc.to_array(attr_ids)
assert doc_array.shape == (len(doc), len(attr_ids))
assert doc[0].orth == doc_array[0, 0]
assert doc[1].orth == doc_array[1, 0]
assert doc[0].like_url == doc_array[0, 1]
assert list(doc_array[:, 1]) == [t.like_url for t in doc]
+h(3, "examples-word-vectors") Word vectors
+code.
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
apples = doc[0]
oranges = doc[2]
boots = doc[6]
hippos = doc[8]
assert apples.similarity(oranges) > boots.similarity(hippos)
+h(3, "examples-pos-tags") Part-of-speech tags
+code.
from spacy.parts_of_speech import ADV
def is_adverb(token):
return token.pos == spacy.parts_of_speech.ADV
# These are data-specific, so no constants are provided. You have to look
# up the IDs from the StringStore.
NNS = nlp.vocab.strings['NNS']
NNPS = nlp.vocab.strings['NNPS']
def is_plural_noun(token):
return token.tag == NNS or token.tag == NNPS
def print_coarse_pos(token):
print(token.pos_)
def print_fine_pos(token):
print(token.tag_)
+h(3, "examples-dependencies") Syntactic dependencies
+code.
def dependency_labels_to_root(token):
'''Walk up the syntactic tree, collecting the arc labels.'''
dep_labels = []
while token.head is not token:
dep_labels.append(token.dep)
token = token.head
return dep_labels
+h(3, "examples-entities") Named entities
+code.
from collections import defaultdict
from spacy.parts_of_speech import VERB
def iter_products(docs):
for doc in docs:
for ent in doc.ents:
if ent.label_ == 'PRODUCT':
yield ent
def word_is_in_entity(word):
return word.ent_type != 0
def count_parent_verb_by_person(docs):
counts = defaultdict(lambda: defaultdict(int))
for doc in docs:
for ent in doc.ents:
if ent.label_ == 'PERSON' and ent.root.head.pos == VERB:
counts[ent.orth_][ent.root.head.lemma_] += 1
return counts
+h(3, "examples-inline") Calculate inline mark-up on original string
+code.
def put_spans_around_tokens(doc, get_classes):
'''Given some function to compute class names, put each token in a
span element, with the appropriate classes computed.
All whitespace is preserved, outside of the spans. (Yes, I know HTML
won't display it. But the point is no information is lost, so you can
calculate what you need, e.g. <br /> tags, <p> tags, etc.)
'''
output = []
template = '<span classes="{classes}">{word}</span>{space}'
for token in doc:
if token.is_space:
output.append(token.orth_)
else:
output.append(
template.format(
classes=' '.join(get_classes(token)),
word=token.orth_,
space=token.whitespace_))
string = ''.join(output)
string = string.replace('\n', '')
string = string.replace('\t', ' ')
return string
+h(3, "examples-binary") Efficient binary serialization
+code.
import spacy
from spacy.tokens.doc import Doc
byte_string = doc.to_bytes()
open('moby_dick.bin', 'wb').write(byte_string)
nlp = spacy.load('en')
for byte_string in Doc.read_bytes(open('moby_dick.bin', 'rb')):
doc = Doc(nlp.vocab)
doc.from_bytes(byte_string)

@ -1,122 +0,0 @@
//- ----------------------------------
//- 💫 QUICKSTART > GETTING STARTED
//- ----------------------------------
+section("getting-started")
+h(2, "getting-started")
| Getting started
+section("install-spacy")
+h(3, "install-spacy")
| Install spaCy
p.
spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux,
OS X and Windows. The latest spaCy releases are currently only available as source packages over #[+a("https://pypi.python.org/pypi/spacy") pip]. Installation requires a working build environment. See notes on #[a(href="/docs#install-source-ubuntu") Ubuntu],
#[a(href="/docs#install-source-osx") OS X] and
#[a(href="/docs#install-source-windows") Windows] for details.
+code("bash", "pip").
pip install -U spacy
p.
After installation you need to download a language model. Models for English (#[code en]) and German (#[code de]) are available.
+code("bash").
# English:
# - Install tagger, parser, NER and GloVe vectors:
python -m spacy.en.download all
# - OR install English tagger, parser and NER
python -m spacy.en.download parser
# - OR install English GloVe vectors
python -m spacy.en.download glove
# German:
# - Install German tagger, parser, NER and word vectors
python -m spacy.de.download all
# Upgrade/overwrite existing data
python -m spacy.en.download --force
# Check whether the model was successfully installed
python -c "import spacy; spacy.load('en'); print('OK')"
p.
The download command fetches about 1 GB of data and installs it
within the #[code spacy] package directory.
+section("install-source")
+h(3, "install-source")
| Compile from source
p.
The other way to install spaCy is to clone its
#[a(href="https://github.com/spacy-io/spaCy") GitHub repository] and
build it from source. That is the common way if you want to make changes
to the code base.
p.
You'll need to make sure that you have a development environment consisting
of a Python distribution including header files, a compiler, pip,
virtualenv and git installed. The compiler
part is the trickiest. How to do that depends on your system. See
notes on #[a(href="/docs#install-source-ubuntu") Ubuntu],
#[a(href="/docs#install-source-osx") OS X] and
#[a(href="/docs#install-source-windows") Windows] for details.
+code("bash").
# make sure you are using recent pip/virtualenv versions
python -m pip install -U pip virtualenv
# find git install instructions at https://git-scm.com/downloads
git clone https://github.com/spacy-io/spaCy.git
cd spaCy
virtualenv .env && source .env/bin/activate
pip install -r requirements.txt
pip install -e .
p.
Compared to a regular install via #[code pip] or #[code conda], installing from
#[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/requirements.txt") requirements.txt]
additionally installs developer dependencies such as #[code cython].
+h(4, "install-source-ubuntu")
| Ubuntu
p Install system-level dependencies via #[code apt-get]:
+code("bash").
sudo apt-get install build-essential python-dev git
+h(4, "install-source-osx")
| OS X
p.
Install a recent version of Xcode, including the so-called "Command Line Tools". OS X
ships with Python and git preinstalled.
+h(4, "install-source-windows")
| Windows
p.
Install a version of Visual Studio Express or higher that matches the version that was
used to compile your Python interpreter. For official distributions
these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
+section("run-tests")
+h(3, "run-tests")
| Run tests
p.
spaCy comes with an extensive test suite. First, find out where spaCy is installed:
+code("bash").
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
p.
Then run #[code pytest] on that directory. The flags #[code --vectors],
#[code --slow] and #[code --model] are optional and enable additional tests:
+code("bash").
# make sure you are using recent pytest version
python -m pip install -U pytest
python -m pytest &lt;spacy-directory&gt; --vectors --model --slow

@ -1,12 +0,0 @@
//- ----------------------------------
//- 💫 DOCS > TUTORIALS
//- ----------------------------------
+section("tutorials")
+h(2, "tutorials") Tutorials
each post, slug in public.docs.tutorials._data
if slug != 'index'
a.o-block(href='/docs/tutorials/' + slug)
+h(3)=post.title
p=post.description

website/docs/api/_data.json Normal file
@ -0,0 +1,103 @@
{
"sidebar": {
"Introduction": {
"Facts & Figures": "./",
"Philosophy": "philosophy"
},
"Classes": {
"Doc": "doc",
"Token": "token",
"Span": "span",
"Language": "language",
"Tagger": "tagger",
"DependencyParser": "dependencyparser",
"EntityRecognizer": "entityrecognizer",
"Matcher": "matcher",
"Lexeme": "lexeme",
"Vocab": "vocab",
"StringStore": "stringstore",
"GoldParse": "goldparse"
},
"Other": {
"Annotation Specs": "annotation"
}
},
"index": {
"title": "Facts & Figures",
"next": "philosophy"
},
"philosophy": {
"title": "Philosophy"
},
"language": {
"title": "Language",
"tag": "class"
},
"doc": {
"title": "Doc",
"tag": "class"
},
"token": {
"title": "Token",
"tag": "class"
},
"span": {
"title": "Span",
"tag": "class"
},
"lexeme": {
"title": "Lexeme",
"tag": "class"
},
"vocab": {
"title": "Vocab",
"tag": "class"
},
"stringstore": {
"title": "StringStore",
"tag": "class"
},
"matcher": {
"title": "Matcher",
"tag": "class"
},
"dependenyparser": {
"title": "DependencyParser",
"tag": "class"
},
"entityrecognizer": {
"title": "EntityRecognizer",
"tag": "class"
},
"dependencyparser": {
"title": "DependencyParser",
"tag": "class"
},
"tagger": {
"title": "Tagger",
"tag": "class"
},
"goldparse": {
"title": "GoldParse",
"tag": "class"
},
"annotation": {
"title": "Annotation Specifications"
}
}

@ -0,0 +1,148 @@
//- 💫 DOCS > API > ANNOTATION SPECS
include ../../_includes/_mixins
p This document describes the target annotations spaCy is trained to predict.
+h(2, "tokenization") Tokenization
p
| Tokenization standards are based on the
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus.
| The tokenizer differs from most by including tokens for significant
| whitespace. Any sequence of whitespace characters beyond a single space
| (#[code ' ']) is included as a token.
+aside-code("Example").
from spacy.en import English
nlp = English(parser=False)
tokens = nlp('Some\nspaces and\ttab characters')
print([t.orth_ for t in tokens])
# ['Some', '\n', 'spaces', ' ', 'and', '\t', 'tab', 'characters']
p
| The whitespace tokens are useful for much the same reason punctuation is
| &ndash; it's often an important delimiter in the text. By preserving it in the
| token output, we are able to maintain a simple alignment between the
| tokens and the original string, and we ensure that no information is
| lost during processing.
+h(2, "sentence-boundary") Sentence boundary detection
p
| Sentence boundaries are calculated from the syntactic parse tree, so
| features such as punctuation and capitalisation play an important but
| non-decisive role in determining the sentence boundaries. Usually this
| means that the sentence boundaries will at least coincide with clause
| boundaries, even given poorly punctuated text.
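p
    | For instance, a sketch (assuming a pipeline with the parser enabled,
    | loaded as #[code nlp]):
+aside-code("Example").
    doc = nlp(u'Hello world this is one sentence. Here is another')
    print([sent.text for sent in doc.sents])
    # usually two sentences, even though the second full stop is missing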
+h(2, "pos-tagging") Part-of-speech Tagging
p
| The part-of-speech tagger uses the
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] version of
| the Penn Treebank tag set. We also map the tags to the simpler Google
| Universal POS Tag set. See
| #[+src(gh("spaCy", "spacy/tagger.pyx")) tagger.pyx] for details.
+h(2, "lemmatization") Lemmatization
p A "lemma" is the uninflected form of a word. In English, this means:
+list
+item #[strong Adjectives]: The form like "happy", not "happier" or "happiest"
+item #[strong Adverbs]: The form like "badly", not "worse" or "worst"
+item #[strong Nouns]: The form like "dog", not "dogs"; like "child", not "children"
+item #[strong Verbs]: The form like "write", not "writes", "writing", "wrote" or "written"
p
| The lemmatization data is taken from
| #[+a("https://wordnet.princeton.edu") WordNet]. However, we also add a
| special case for pronouns: all pronouns are lemmatized to the special
| token #[code -PRON-].
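p
    | For example (a sketch; lemmas depend on the tagger output):
+aside-code("Example").
    doc = nlp(u'I was reading the papers')
    print([token.lemma_ for token in doc])
    # e.g. [u'-PRON-', u'be', u'read', u'the', u'paper']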
+h(2, "dependency-parsing") Syntactic Dependency Parsing
p
| The parser is trained on data produced by the
| #[+a("http://www.clearnlp.com") ClearNLP] converter. Details of the
| annotation scheme can be found
| #[+a("http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf") here].
+h(2, "named-entities") Named Entity Recognition
+table(["Entity Type", "Description"])
+row
+cell #[code PERSON]
+cell People, including fictional.
+row
+cell #[code NORP]
+cell Nationalities or religious or political groups.
+row
+cell #[code FAC]
+cell Facilities, such as buildings, airports, highways, bridges, etc.
+row
+cell #[code ORG]
+cell Companies, agencies, institutions, etc.
+row
+cell #[code GPE]
+cell Countries, cities, states.
+row
+cell #[code LOC]
+cell Non-GPE locations, mountain ranges, bodies of water.
+row
+cell #[code PRODUCT]
+cell Vehicles, weapons, foods, etc. (Not services)
+row
+cell #[code EVENT]
+cell Named hurricanes, battles, wars, sports events, etc.
+row
+cell #[code WORK_OF_ART]
+cell Titles of books, songs, etc.
+row
+cell #[code LAW]
+cell Named documents made into laws
+row
+cell #[code LANGUAGE]
+cell Any named language
p The following values are also annotated in a style similar to names:
+table(["Entity Type", "Description"])
+row
+cell #[code DATE]
+cell Absolute or relative dates or periods
+row
+cell #[code TIME]
+cell Times smaller than a day
+row
+cell #[code PERCENT]
+cell Percentage (including “%”)
+row
+cell #[code MONEY]
+cell Monetary values, including unit
+row
+cell #[code QUANTITY]
+cell Measurements, as of weight or distance
+row
+cell #[code ORDINAL]
+cell "first", "second"
+row
+cell #[code CARDINAL]
+cell Numerals that do not fall under another type

@ -0,0 +1,135 @@
//- 💫 DOCS > API > DEPENDENCYPARSER
include ../../_includes/_mixins
p Annotate syntactic dependencies on #[code Doc] objects.
+h(2, "load") DependencyParser.load
+tag classmethod
p Load the statistical model from the supplied path.
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell #[code Path]
+cell The path to load from.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared by the documents to be processed.
+row
+cell #[code require]
+cell bool
+cell Whether to raise an error if the files are not found.
+footrow
+cell return
+cell #[code DependencyParser]
+cell The newly constructed object.
+h(2, "init") DependencyParser.__init__
+tag method
p Create a #[code DependencyParser].
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared with documents to be processed.
+row
+cell #[code model]
+cell #[code thinc.linear.AveragedPerceptron]
+cell The statistical model.
+footrow
+cell return
+cell #[code DependencyParser]
+cell The newly constructed object.
+h(2, "call") DependencyParser.__call__
+tag method
p
| Apply the dependency parser, setting the heads and dependency relations
| onto the #[code Doc] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to be processed.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "pipe") DependencyParser.pipe
+tag method
p Process a stream of documents.
+table(["Name", "Type", "Description"])
+row
+cell #[code stream]
+cell -
+cell The sequence of documents to process.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel.
+footrow
+cell yield
+cell #[code Doc]
+cell Documents, in order.
+h(2, "update") DependencyParser.update
+tag method
p Update the statistical model.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The example document for the update.
+row
+cell #[code gold]
+cell #[code GoldParse]
+cell The gold-standard annotations, to calculate the loss.
+footrow
+cell return
+cell int
+cell The loss on this example.
+h(2, "step_through") DependencyParser.step_through
+tag method
p Set up a stepwise state, to introspect and control the transition sequence.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to step through.
+footrow
+cell return
+cell #[code StepwiseState]
+cell A state object, to step through the annotation process.

website/docs/api/doc.jade Normal file
@ -0,0 +1,416 @@
//- 💫 DOCS > API > DOC
include ../../_includes/_mixins
p A container for accessing linguistic annotations.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code mem]
+cell #[code Pool]
+cell The document's local memory heap, for all C data it owns.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The store of lexical types.
+row
+cell #[code user_data]
+cell -
+cell A generic storage area, for user custom data.
+row
+cell #[code is_tagged]
+cell bool
+cell
| A flag indicating that the document has been part-of-speech
| tagged.
+row
+cell #[code is_parsed]
+cell bool
+cell A flag indicating that the document has been syntactically parsed.
+row
+cell #[code sentiment]
+cell float
+cell The document's positivity/negativity score, if available.
+row
+cell #[code user_hooks]
+cell dict
+cell
| A dictionary that allows customisation of the #[code Doc]'s
| properties.
+row
+cell #[code user_token_hooks]
+cell dict
+cell
| A dictionary that allows customisation of properties of
| #[code Token] children.
+row
+cell #[code user_span_hooks]
+cell dict
+cell
| A dictionary that allows customisation of properties of
| #[code Span] children.
+h(2, "init") Doc.__init__
+tag method
p Construct a #[code Doc] object.
+aside("Note")
| The most common way to get a #[code Doc] object is via the #[code nlp]
| object. This method is usually only used for deserialization or preset
| tokenization.
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A storage container for lexical types.
+row
+cell #[code words]
+cell -
+cell A list of strings to add to the container.
+row
+cell #[code spaces]
+cell -
+cell
| A list of boolean values indicating whether each word has a
| subsequent space. Must have the same length as #[code words], if
| specified. Defaults to a sequence of #[code True].
+footrow
+cell return
+cell #[code Doc]
+cell The newly constructed object.
+h(2, "getitem") Doc.__getitem__
+tag method
p Get a #[code Token] object.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
assert doc[0].text == 'Give'
assert doc[-1].text == '.'
span = doc[1:3]
assert span.text == 'it back'
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The index of the token.
+footrow
+cell return
+cell #[code Token]
+cell The token at #[code doc[i]].
p Get a #[code Span] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code start_end]
+cell tuple
+cell The slice of the document to get.
+footrow
+cell return
+cell #[code Span]
+cell The span at #[code doc[start : end]].
+h(2, "iter") Doc.__iter__
+tag method
p Iterate over #[code Token] objects.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A #[code Token] object.
+h(2, "len") Doc.__len__
+tag method
p Get the number of tokens in the document.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of tokens in the document.
+h(2, "similarity") Doc.similarity
+tag method
p
| Make a semantic similarity estimate. The default estimate is cosine
| similarity using an average of word vectors.
+table(["Name", "Type", "Description"])
+row
+cell #[code other]
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+footrow
+cell return
+cell float
+cell A scalar similarity score. Higher is more similar.
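p
    | For example (a sketch; it assumes a loaded #[code nlp] pipeline, and the
    | scores depend on the word vectors installed):
+aside-code("Example").
    apples = nlp(u'I like apples')
    oranges = nlp(u'I like oranges')
    print(apples.similarity(oranges))  # a float; higher is more similar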
+h(2, "to_array") Doc.to_array
+tag method
p
| Export the document annotations to a numpy array of shape #[code N*M]
| where #[code N] is the length of the document and #[code M] is the number
| of attribute IDs to export. The values will be 32-bit integers.
+aside-code("Example").
from spacy import attrs
doc = nlp(text)
# All strings mapped to integers, for easy export to numpy
np_array = doc.to_array([attrs.LOWER, attrs.POS,
attrs.ENT_TYPE, attrs.IS_ALPHA])
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_ids]
+cell ints
+cell A list of attribute ID ints.
+footrow
+cell return
+cell #[code numpy.ndarray[ndim=2, dtype='int32']]
+cell
| The exported attributes as a 2D numpy array, with one row per
| token and one column per attribute.
+h(2, "count_by") Doc.count_by
+tag method
p Count the frequencies of a given attribute.
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_id]
+cell int
+cell The attribute ID
+footrow
+cell return
+cell dict
+cell A dictionary mapping attributes to integer counts.
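p A sketch, counting surface forms via the #[code ORTH] attribute:
+aside-code("Example").
    from spacy.attrs import ORTH
    doc = nlp(u'apple apple orange banana')
    counts = doc.count_by(ORTH)
    assert counts[nlp.vocab.strings[u'apple']] == 2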
+h(2, "from_array") Doc.from_array
+tag method
p Load attributes from a numpy array.
+table(["Name", "Type", "Description"])
+row
+cell #[code attr_ids]
+cell ints
+cell A list of attribute ID ints.
+row
+cell #[code values]
+cell #[code numpy.ndarray[ndim=2, dtype='int32']]
+cell The attribute values to load.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "to_bytes") Doc.to_bytes
+tag method
p Export the document contents to a binary string.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell bytes
+cell
| A losslessly serialized copy of the #[code Doc] including all
| annotations.
+h(2, "from_bytes") Doc.from_bytes
+tag method
p Import the document contents from a binary string.
+table(["Name", "Type", "Description"])
+row
+cell #[code byte_string]
+cell bytes
+cell The string to load from.
+footrow
+cell return
+cell #[code Doc]
+cell The #[code self] variable.
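p A round-trip sketch, assuming a loaded #[code nlp] pipeline:
+aside-code("Example").
    from spacy.tokens.doc import Doc
    doc = nlp(u'Give it back! He pleaded.')
    byte_string = doc.to_bytes()
    doc2 = Doc(nlp.vocab).from_bytes(byte_string)
    assert doc2.text == doc.text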
+h(2, "merge") Doc.merge
+tag method
p
| Retokenize the document, such that the span at
| #[code doc.text[start_idx : end_idx]] is merged into a single token. If
| #[code start_idx] and #[code end_idx] do not mark start and end token
| boundaries, the document remains unchanged.
+table(["Name", "Type", "Description"])
+row
+cell #[code start_idx]
+cell int
+cell The character index of the start of the slice to merge.
+row
+cell #[code end_idx]
+cell int
+cell The character index after the end of the slice to merge.
+row
+cell #[code **attributes]
+cell -
+cell
| Attributes to assign to the merged token. By default,
| attributes are inherited from the syntactic root token of
| the span.
+footrow
+cell return
+cell #[code Token]
+cell
| The newly merged token, or None if the start and end
| indices did not fall at token boundaries.
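p
    | A sketch based on the signature above (no attributes are supplied, so
    | they are inherited from the span's root token):
+aside-code("Example").
    doc = nlp(u'I flew to San Francisco Bay')
    start = doc[3].idx
    end = doc[5].idx + len(doc[5])
    doc.merge(start, end)
    assert doc[3].text == u'San Francisco Bay'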
+h(2, "read_bytes") Doc.read_bytes
+tag staticmethod
p A static method, used to read serialized #[code Doc] objects from a file.
+aside-code("Example").
from spacy.tokens.doc import Doc
loc = 'test_serialize.bin'
with open(loc, 'wb') as file_:
file_.write(nlp(u'This is a document.').to_bytes())
file_.write(nlp(u'This is another.').to_bytes())
docs = []
with open(loc, 'rb') as file_:
for byte_string in Doc.read_bytes(file_):
docs.append(Doc(nlp.vocab).from_bytes(byte_string))
assert len(docs) == 2
+table(["Name", "Type", "Description"])
+row
+cell file
+cell buffer
+cell A binary buffer to read the serialized annotations from.
+footrow
+cell yield
+cell bytes
+cell Binary strings from which documents can be loaded.
+h(2, "text") Doc.text
+tag property
p A unicode representation of the document text.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell unicode
+cell The original verbatim text of the document.
+h(2, "text_with_ws") Doc.text_with_ws
+tag property
p
| An alias of #[code Doc.text], provided for duck-type compatibility with
| #[code Span] and #[code Token].
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell unicode
+cell The original verbatim text of the document.
+h(2, "sents") Doc.sents
+tag property
p Iterate over the sentences in the document.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Span]
+cell Sentences in the document.
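p For example (assuming the parser has run on a loaded #[code nlp] pipeline):
+aside-code("Example").
    doc = nlp(u'This is a sentence. Here is another one.')
    for sent in doc.sents:
        print(sent.text)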
+h(2, "ents") Doc.ents
+tag property
p Iterate over the entities in the document.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Span]
+cell Entities in the document.
+h(2, "noun_chunks") Doc.noun_chunks
+tag property
p
| Iterate over the base noun phrases in the document. A base noun phrase,
| or "NP chunk", is a noun phrase that does not permit other NPs to be
| nested within it.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Span]
+cell Noun chunks in the document
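p For example (a sketch; the chunks depend on the parse):
+aside-code("Example").
    doc = nlp(u'The quick brown fox jumped over the lazy dog.')
    print([chunk.text for chunk in doc.noun_chunks])
    # e.g. [u'The quick brown fox', u'the lazy dog']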
+h(2, "vector") Doc.vector
+tag property
p
| A real-valued meaning representation. Defaults to an average of the
| token vectors.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the document's semantics.
+h(2, "has_vector") Doc.has_vector
+tag property
p
| A boolean value indicating whether a word vector is associated with the
| object.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell bool
+cell Whether the document has vector data attached.

@ -0,0 +1,133 @@
//- 💫 DOCS > API > ENTITYRECOGNIZER
include ../../_includes/_mixins
p Annotate named entities on #[code Doc] objects.
+h(2, "load") EntityRecognizer.load
+tag classmethod
p Load the statistical model from the supplied path.
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell #[code Path]
+cell The path to load from.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared by the documents to be processed.
+row
+cell #[code require]
+cell bool
+cell Whether to raise an error if the files are not found.
+footrow
+cell return
+cell #[code EntityRecognizer]
+cell The newly constructed object.
+h(2, "init") EntityRecognizer.__init__
+tag method
p Create an #[code EntityRecognizer].
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared with documents to be processed.
+row
+cell #[code model]
+cell #[code thinc.linear.AveragedPerceptron]
+cell The statistical model.
+footrow
+cell return
+cell #[code EntityRecognizer]
+cell The newly constructed object.
+h(2, "call") EntityRecognizer.__call__
+tag method
p Apply the entity recognizer, setting the NER tags onto the #[code Doc] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to be processed.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "pipe") EntityRecognizer.pipe
+tag method
p Process a stream of documents.
+table(["Name", "Type", "Description"])
+row
+cell #[code stream]
+cell -
+cell The sequence of documents to process.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel.
+footrow
+cell yield
+cell #[code Doc]
+cell Documents, in order.
+h(2, "update") EntityRecognizer.update
+tag method
p Update the statistical model.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The example document for the update.
+row
+cell #[code gold]
+cell #[code GoldParse]
+cell The gold-standard annotations, to calculate the loss.
+footrow
+cell return
+cell int
+cell The loss on this example.
+h(2, "step_through") EntityRecognizer.step_through
+tag method
p Set up a stepwise state, to introspect and control the transition sequence.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to step through.
+footrow
+cell return
+cell #[code StepwiseState]
+cell A state object, to step through the annotation process.

@ -0,0 +1,103 @@
//- 💫 DOCS > API > GOLDPARSE
include ../../_includes/_mixins
p Collection for training annotations.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code tags]
+cell list
+cell The part-of-speech tag annotations.
+row
+cell #[code heads]
+cell list
+cell The syntactic head annotations.
+row
+cell #[code labels]
+cell list
+cell The syntactic relation-type annotations.
+row
+cell #[code ents]
+cell list
+cell The named entity annotations.
+row
+cell #[code cand_to_gold]
+cell list
+cell The alignment from candidate tokenization to gold tokenization.
+row
+cell #[code gold_to_cand]
+cell list
+cell The alignment from gold tokenization to candidate tokenization.
+h(2, "init") GoldParse.__init__
+tag method
p Create a GoldParse.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document the annotations refer to.
+row
+cell #[code words]
+cell -
+cell A sequence of unicode word strings.
+row
+cell #[code tags]
+cell -
+cell A sequence of strings, representing tag annotations.
+row
+cell #[code heads]
+cell -
+cell A sequence of integers, representing syntactic head offsets.
+row
+cell #[code deps]
+cell -
+cell A sequence of strings, representing the syntactic relation types.
+row
+cell #[code entities]
+cell -
+cell A sequence of named entity annotations, either as BILUO tag strings, or as #[code (start_char, end_char, label)] tuples, representing the entity positions.
+footrow
+cell return
+cell #[code GoldParse]
+cell The newly constructed object.
+h(2, "len") GoldParse.__len__
+tag method
p Get the number of gold-standard tokens.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of gold-standard tokens.
+h(2, "is_projective") GoldParse.is_projective
+tag property
p
| Whether the provided syntactic annotations form a projective dependency
| tree.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell bool
+cell Whether the annotations form a projective tree.

website/docs/api/index.jade Normal file
@ -0,0 +1,239 @@
//- 💫 DOCS > API > FACTS & FIGURES
include ../../_includes/_mixins
+h(2, "comparison") Feature comparison
p
| Here's a quick comparison of the functionalities offered by spaCy,
| #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") SyntaxNet],
| #[+a("http://www.nltk.org/py-modindex.html") NLTK] and
| #[+a("http://stanfordnlp.github.io/CoreNLP/") CoreNLP].
+table([ "", "spaCy", "SyntaxNet", "NLTK", "CoreNLP"])
+row
+cell Easy installation
each icon in [ "pro", "con", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Python API
each icon in [ "pro", "con", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Multi-language support
each icon in [ "con", "pro", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Tokenization
each icon in [ "pro", "pro", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Part-of-speech tagging
each icon in [ "pro", "pro", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Sentence segmentation
each icon in [ "pro", "pro", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Dependency parsing
each icon in [ "pro", "pro", "con", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Entity Recognition
each icon in [ "pro", "con", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Integrated word vectors
each icon in [ "pro", "con", "con", "con" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Sentiment analysis
each icon in [ "pro", "con", "pro", "pro" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell Coreference resolution
each icon in [ "con", "con", "con", "pro" ]
+cell.u-text-center #[+procon(icon)]
+h(2, "benchmarks") Benchmarks
p
| Two peer-reviewed papers in 2015 confirm that spaCy offers the
| #[strong fastest syntactic parser in the world] and that
| #[strong its accuracy is within 1% of the best] available. The few
| systems that are more accurate are 20&times; slower or more.
+aside("About the evaluation")
| The first of the evaluations was published by #[strong Yahoo! Labs] and
| #[strong Emory University], as part of a survey of current parsing
| technologies #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") (Choi et al., 2015)].
| Their results and subsequent discussions helped us develop a novel
| psychologically-motivated technique to improve spaCy's accuracy, which
| we published in joint work with Macquarie University
| #[+a("https://aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
+table([ "System", "Language", "Accuracy", "Speed (wps)"])
+row
each data in [ "spaCy", "Cython", "91.8", "13,963" ]
+cell #[strong=data]
+row
each data in [ "ClearNLP", "Java", "91.7", "10,271" ]
+cell=data
+row
each data in [ "CoreNLP", "Java", "89.6", "8,602"]
+cell=data
+row
each data in [ "MATE", "Java", "92.5", "550"]
+cell=data
+row
each data in [ "Turbo", "C++", "92.4", "349" ]
+cell=data
+h(3, "parse-accuracy") Parse accuracy
p
| In 2016, Google released their
| #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") SyntaxNet]
| library, setting a new state of the art for syntactic dependency parsing
| accuracy. SyntaxNet's algorithm is very similar to spaCy's. The main
| difference is that SyntaxNet uses a neural network while spaCy uses a
| sparse linear model.
+aside("Methodology")
| #[+a("http://arxiv.org/abs/1603.06042") Andor et al. (2016)] chose
| slightly different experimental conditions from
| #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al. (2015)],
| so the two accuracy tables here do not present directly comparable
| figures. We have only evaluated spaCy in the "News" condition following
| the SyntaxNet methodology. We don't yet have benchmark figures for the
| "Web" and "Questions" conditions.
+table([ "System", "News", "Web", "Questions" ])
+row
+cell spaCy
each data in [ 92.8, "n/a", "n/a" ]
+cell=data
+row
+cell #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") Parsey McParseface]
each data in [ 94.15, 89.08, 94.77 ]
+cell=data
+row
+cell #[+a("http://www.cs.cmu.edu/~ark/TurboParser/") Martins et al. (2013)]
each data in [ 93.10, 88.23, 94.21 ]
+cell=data
+row
+cell #[+a("http://research.google.com/pubs/archive/38148.pdf") Zhang and McDonald (2014)]
each data in [ 93.32, 88.65, 93.37 ]
+cell=data
+row
+cell #[+a("http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf") Weiss et al. (2015)]
each data in [ 93.91, 89.29, 94.17 ]
+cell=data
+row
+cell #[strong #[+a("http://arxiv.org/abs/1603.06042") Andor et al. (2016)]]
each data in [ 94.44, 90.17, 95.40 ]
+cell #[strong=data]
+h(3, "speed-comparison") Detailed speed comparison
p
| Here we compare the per-document processing time of various spaCy
| functionalities against other NLP libraries. We show both absolute
| timings (in ms) and relative performance (normalized to spaCy). Lower is
| better.
+aside("Methodology")
| #[strong Set up:] 100,000 plain-text documents were streamed from an
| SQLite3 database, and processed with an NLP library, to one of three
| levels of detail — tokenization, tagging, or parsing. The tasks are
| additive: to parse the text you have to tokenize and tag it. The
| pre-processing was not subtracted from the times — I report the time
| required for the pipeline to complete. I report mean times per document,
| in milliseconds.#[br]#[br]
| #[strong Hardware]: Intel i7-3770 (2012)#[br]
| #[strong Implementation]: #[+src(gh("spacy-benchmarks")) spacy-benchmarks]
+table
+row.u-text-label.u-text-center
th.c-table__head-cell
th.c-table__head-cell(colspan="3") Absolute (ms per doc)
th.c-table__head-cell(colspan="3") Relative (to spaCy)
+row
each column in ["System", "Tokenize", "Tag", "Parse", "Tokenize", "Tag", "Parse"]
th.c-table__head-cell.u-text-label=column
+row
+cell #[strong spaCy]
each data in [ "0.2ms", "1ms", "19ms"]
+cell #[strong=data]
each data in [ "1x", "1x", "1x" ]
+cell=data
+row
each data in [ "CoreNLP", "2ms", "10ms", "49ms", "10x", "10x", "2.6x"]
+cell=data
+row
each data in [ "ZPar", "1ms", "8ms", "850ms", "5x", "8x", "44.7x" ]
+cell=data
+row
each data in [ "NLTK", "4ms", "443ms", "n/a", "20x", "443x", "n/a" ]
+cell=data
+h(3, "ner") Named entity comparison
p
| #[+a("https://aclweb.org/anthology/W/W16/W16-2703.pdf") Jiang et al. (2016)]
| present several detailed comparisons of the named entity recognition
| models provided by spaCy, CoreNLP, NLTK and LingPipe. Here we show their
| evaluation of person, location and organization accuracy on Wikipedia.
+aside("Methodology")
| Making a meaningful comparison of different named entity recognition
| systems is tricky. Systems are often trained on different data, which
| usually have slight differences in annotation style. For instance, some
| corpora include titles as part of person names, while others don't.
| These trivial differences in convention can distort comparisons
| significantly. Jiang et al.'s #[em partial overlap] metric goes a long
| way to solving this problem.
+table([ "System", "Precision", "Recall", "F-measure" ])
+row
+cell spaCy
each data in [ 0.7240, 0.6514, 0.6858 ]
+cell=data
+row
+cell #[strong CoreNLP]
each data in [ 0.7914, 0.7327, 0.7609 ]
+cell #[strong=data]
+row
+cell NLTK
each data in [ 0.5136, 0.6532, 0.5750 ]
+cell=data
+row
+cell LingPipe
each data in [ 0.5412, 0.5357, 0.5384 ]
+cell=data

@ -0,0 +1,138 @@
//- 💫 DOCS > API > LANGUAGE
include ../../_includes/_mixins
p A text processing pipeline.
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A container for the lexical types.
+row
+cell #[code tokenizer]
+cell #[code Tokenizer]
+cell Find word boundaries and create #[code Doc] object.
+row
+cell #[code tagger]
+cell #[code Tagger]
+cell Annotate #[code Doc] objects with POS tags.
+row
+cell #[code parser]
+cell #[code DependencyParser]
+cell Annotate #[code Doc] objects with syntactic dependencies.
+row
+cell #[code entity]
+cell #[code EntityRecognizer]
+cell Annotate #[code Doc] objects with named entities.
+row
+cell #[code matcher]
+cell #[code Matcher]
+cell Rule-based sequence matcher.
+row
+cell #[code make_doc]
+cell #[code lambda text: Doc]
+cell Create a #[code Doc] object from unicode text.
+row
+cell #[code pipeline]
+cell -
+cell Sequence of annotation functions.
+h(2, "init") Language.__init__
+tag method
p Create or load the pipeline.
+table(["Name", "Type", "Description"])
+row
+cell #[code **kwargs]
+cell -
+cell Keyword arguments indicating which defaults to override.
+footrow
+cell return
+cell #[code Language]
+cell #[code self]
+h(2, "call") Language.__call__
+tag method
p Apply the pipeline to a single text.
+aside-code("Example").
from spacy.en import English
nlp = English()
doc = nlp('An example sentence. Another example sentence.')
doc[0].orth_, doc[0].head.tag_
# ('An', 'NN')
+table(["Name", "Type", "Description"])
+row
+cell #[code text]
+cell unicode
+cell The text to be processed.
+row
+cell #[code tag]
+cell bool
+cell Whether to apply the part-of-speech tagger.
+row
+cell #[code parse]
+cell bool
+cell Whether to apply the syntactic dependency parser.
+row
+cell #[code entity]
+cell bool
+cell Whether to apply the named entity recognizer.
+footrow
+cell return
+cell #[code Doc]
+cell A container for accessing the linguistic annotations.
+h(2, "pipe") Language.pipe
+tag method
p
| Process texts as a stream, and yield #[code Doc] objects in order.
| Supports GIL-free multi-threading.
+aside-code("Example").
texts = [u'One document.', u'...', u'Lots of documents']
for doc in nlp.pipe(texts, batch_size=50, n_threads=4):
assert doc.is_parsed
+table(["Name", "Type", "Description"])
+row
+cell #[code texts]
+cell -
+cell A sequence of unicode objects.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of worker threads to use. If #[code -1], OpenMP will
| decide how many to use at run time. Default is #[code 2].
+row
+cell #[code batch_size]
+cell int
+cell The number of texts to buffer.
+footrow
+cell yield
+cell #[code Doc]
+cell Containers for accessing the linguistic annotations.
@ -0,0 +1,239 @@
//- 💫 DOCS > API > LEXEME
include ../../_includes/_mixins
p An entry in the vocabulary.
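//- Hypothetical usage sketch (not part of the original page): assumes the
//- English model data is installed, as in the other examples in these docs.
+aside-code("Example").
    from spacy.en import English
    nlp = English()
    apple = nlp.vocab[u'apple']
    assert apple.text == u'apple'
    assert apple.is_alpha and not apple.is_digit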
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell
+row
+cell #[code lower]
+cell int
+cell Lower-case form of the word.
+row
+cell #[code lower_]
+cell unicode
+cell Lower-case form of the word.
+row
+cell #[code shape]
+cell int
+cell Transform of the word's string, to show orthographic features.
+row
+cell #[code shape_]
+cell unicode
+cell Transform of the word's string, to show orthographic features.
+row
+cell #[code prefix]
+cell int
+cell Length-N substring from the start of the word. Defaults to #[code N=1].
+row
+cell #[code prefix_]
+cell unicode
+cell Length-N substring from the start of the word. Defaults to #[code N=1].
+row
+cell #[code suffix]
+cell int
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
+row
+cell #[code suffix_]
+cell unicode
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
+row
+cell #[code is_alpha]
+cell bool
+cell Equivalent to #[code word.orth_.isalpha()].
+row
+cell #[code is_ascii]
+cell bool
+cell Equivalent to #[code all(ord(c) &lt; 128 for c in word.orth_)].
+row
+cell #[code is_digit]
+cell bool
+cell Equivalent to #[code word.orth_.isdigit()].
+row
+cell #[code is_lower]
+cell bool
+cell Equivalent to #[code word.orth_.islower()].
+row
+cell #[code is_title]
+cell bool
+cell Equivalent to #[code word.orth_.istitle()].
+row
+cell #[code is_punct]
+cell bool
+cell Is the word punctuation?
+row
+cell #[code is_space]
+cell bool
+cell Equivalent to #[code word.orth_.isspace()].
+row
+cell #[code like_url]
+cell bool
+cell Does the word resemble a URL?
+row
+cell #[code like_num]
+cell bool
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
+row
+cell #[code like_email]
+cell bool
+cell Does the word resemble an email address?
+row
+cell #[code is_oov]
+cell bool
+cell Is the word out-of-vocabulary?
+row
+cell #[code is_stop]
+cell bool
+cell Is the word part of a "stop list"?
+row
+cell #[code lang]
+cell int
+cell Language of the parent vocabulary.
+row
+cell #[code lang_]
+cell unicode
+cell Language of the parent vocabulary.
+row
+cell #[code prob]
+cell float
+cell Smoothed log probability estimate of token's type.
+row
+cell #[code sentiment]
+cell float
+cell A scalar value indicating the positivity or negativity of the token.
+row
+cell #[code lex_id]
+cell int
+cell ID of the token's lexical type.
+row
+cell #[code text]
+cell unicode
+cell Verbatim text content.
+h(2, "init") Lexeme.__init__
+tag method
p Create a #[code Lexeme] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The parent vocabulary.
+row
+cell #[code orth]
+cell int
+cell The orth id of the lexeme.
+footrow
+cell return
+cell #[code Lexeme]
+cell The newly constructed object.
+h(2, "set_flag") Lexeme.set_flag
+tag method
p Change the value of a boolean flag.
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to set.
+row
+cell #[code value]
+cell bool
+cell The new value of the flag.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "check_flag") Lexeme.check_flag
+tag method
p Check the value of a boolean flag.
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to query.
+footrow
+cell return
+cell bool
+cell The value of the flag.
+h(2, "similarity") Lexeme.similarity
+tag method
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
+table(["Name", "Type", "Description"])
+row
+cell #[code other]
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+footrow
+cell return
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "vector") Lexeme.vector
+tag property
p A real-valued meaning representation.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
+cell A real-valued meaning representation.
+h(2, "has_vector") Lexeme.has_vector
+tag property
p A boolean value indicating whether a word vector is associated with the object.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell bool
+cell Whether a word vector is associated with the object.
@ -0,0 +1,179 @@
//- 💫 DOCS > API > MATCHER
include ../../_includes/_mixins
p Match sequences of tokens, based on pattern rules.
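//- A usage sketch assembled from the methods documented below; the attribute
//- constant ORTH and the example strings are illustrative assumptions.
+aside-code("Example").
    from spacy.en import English
    from spacy.matcher import Matcher
    from spacy.attrs import ORTH
    nlp = English()
    matcher = Matcher(nlp.vocab)
    matcher.add_entity(u'GoogleNow')
    matcher.add_pattern(u'GoogleNow', [{ORTH: u'Google'}, {ORTH: u'Now'}])
    doc = nlp(u'I like Google Now best.')
    matches = matcher(doc)  # list of (entity_key, label_id, start, end) tuples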
+h(2, "load") Matcher.load
+tag classmethod
p Load the matcher and patterns from a file path.
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell #[code Path]
+cell Path to a JSON-formatted patterns file.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary that the documents to match over will refer to.
+footrow
+cell return
+cell #[code Matcher]
+cell The newly constructed object.
+h(2, "init") Matcher.__init__
+tag method
p Create the Matcher.
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell
| The vocabulary object, which must be shared with the documents
| the matcher will operate on.
+row
+cell #[code patterns]
+cell dict
+cell Patterns to add to the matcher.
+footrow
+cell return
+cell #[code Matcher]
+cell The newly constructed object.
+h(2, "call") Matcher.__call__
+tag method
p Find all token sequences matching the supplied patterns on the Doc.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The document to match over.
+footrow
+cell return
+cell list
+cell
| A list of #[code (entity_key, label_id, start, end)] tuples,
| describing the matches. A match tuple describes the span
| #[code doc[start : end]]. The #[code label_id] and
| #[code entity_key] are both integers.
+h(2, "pipe") Matcher.pipe
+tag method
p Match a stream of documents, yielding them in turn.
+table(["Name", "Type", "Description"])
+row
+cell #[code docs]
+cell -
+cell A stream of documents.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel, if the #[code Matcher] implementation supports
| multi-threading.
+footrow
+cell yield
+cell #[code Doc]
+cell Documents, in order.
+h(2, "add_entity") Matcher.add_entity
+tag method
p Add an entity to the matcher.
+table(["Name", "Type", "Description"])
+row
+cell #[code entity_key]
+cell unicode / int
+cell An ID for the entity.
+row
+cell #[code attrs]
+cell -
+cell Attributes to associate with the Matcher.
+row
+cell #[code if_exists]
+cell unicode
+cell
| #[code 'raise'], #[code 'ignore'] or #[code 'update']. Controls
| what happens if the entity ID already exists. Defaults to
| #[code 'raise'].
+row
+cell #[code acceptor]
+cell -
+cell Callback function to filter matches of the entity.
+row
+cell #[code on_match]
+cell -
+cell Callback function to act on matches of the entity.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "add_pattern") Matcher.add_pattern
+tag method
p Add a pattern to the matcher.
+table(["Name", "Type", "Description"])
+row
+cell #[code entity_key]
+cell unicode / int
+cell An ID for the entity.
+row
+cell #[code token_specs]
+cell -
+cell Description of the pattern to be matched.
+row
+cell #[code label]
+cell unicode / int
+cell Label to assign to the matched pattern. Defaults to #[code ""].
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "has_entity") Matcher.has_entity
+tag method
p Check whether the matcher has an entity.
+table(["Name", "Type", "Description"])
+row
+cell #[code entity_key]
+cell unicode / int
+cell The entity key to check.
+footrow
+cell return
+cell bool
+cell Whether the matcher has the entity.
@ -0,0 +1,14 @@
//- 💫 DOCS > API > PHILOSOPHY
include ../../_includes/_mixins
p Every product needs to know why it exists. Here's what we're trying to do with spaCy, and why it's different from other NLP libraries.
+h(2) 1. No job too big.
p Most programs get cheaper to run over time, but NLP programs often get more expensive. The data often grows faster than the hardware improves. For web-scale tasks, Moore's law can't save us — so if we want to read the web, we have to sweat performance.
+h(2) 2. Take a stand.
p Most NLP toolkits position themselves as platforms, rather than libraries. They offer a pluggable architecture, and leave it to the user to arrange the components they offer into a useful system. This is fine for researchers, but for production users, this does too little. Components go out of date quickly, and configuring a good system takes very detailed knowledge. Compatibility problems can be extremely subtle. spaCy is therefore extremely opinionated. The API does not expose any algorithmic details. You're free to configure another pipeline, but the core library eliminates redundancy, and only offers one choice of each component.
+h(2) 3. Stay current.
p There's often significant improvement in NLP models year-on-year. This has been especially true recently, given the success of deep learning models. With spaCy, you should be able to build things you couldn't build yesterday. To deliver on that promise, we need to be giving you the latest stuff.
website/docs/api/span.jade
@ -0,0 +1,264 @@
//- 💫 DOCS > API > SPAN
include ../../_includes/_mixins
p A slice from a #[code Doc] object.
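//- Hypothetical sketch: slicing a Doc is assumed to produce Span objects.
+aside-code("Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert span.text == u'it back!'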
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code start]
+cell int
+cell The token offset for the start of the span.
+row
+cell #[code end]
+cell int
+cell The token offset for the end of the span.
+row
+cell #[code start_char]
+cell int
+cell The character offset for the start of the span.
+row
+cell #[code end_char]
+cell int
+cell The character offset for the end of the span.
+row
+cell #[code label]
+cell int
+cell The span's label.
+row
+cell #[code label_]
+cell unicode
+cell The span's label.
+row
+cell #[code lemma_]
+cell unicode
+cell The span's lemma.
+row
+cell #[code ent_id]
+cell int
+cell The integer ID of the named entity the span is an instance of.
+row
+cell #[code ent_id_]
+cell unicode
+cell The string ID of the named entity the span is an instance of.
+h(2, "init") Span.__init__
+tag method
p Create a #[code Span] object from the slice #[code doc[start : end]].
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code start]
+cell int
+cell The index of the first token of the span.
+row
+cell #[code end]
+cell int
+cell The index of the first token after the span.
+row
+cell #[code label]
+cell int
+cell A label to attach to the span, e.g. for named entities.
+row
+cell #[code vector]
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
+cell A meaning representation of the span.
+footrow
+cell return
+cell #[code Span]
+cell The newly constructed object.
+h(2, "getitem") Span.__getitem__
+tag method
p Get a #[code Token] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The index of the token within the span.
+footrow
+cell return
+cell #[code Token]
+cell The token at #[code span[i]].
p Get a #[code Span] object, when the key is a slice.
+table(["Name", "Type", "Description"])
+row
+cell #[code start_end]
+cell tuple
+cell The slice of the span to get.
+footrow
+cell return
+cell #[code Span]
+cell The span at #[code span[start : end]].
+h(2, "iter") Span.__iter__
+tag method
p Iterate over #[code Token] objects.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A #[code Token] object.
+h(2, "len") Span.__len__
+tag method
p Get the number of tokens in the span.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of tokens in the span.
+h(2, "similarity") Span.similarity
+tag method
p
| Make a semantic similarity estimate. The default estimate is cosine
| similarity using an average of word vectors.
+table(["Name", "Type", "Description"])
+row
+cell #[code other]
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+footrow
+cell return
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "merge") Span.merge
+tag method
p Retokenize the document, such that the span is merged into a single token.
+table(["Name", "Type", "Description"])
+row
+cell #[code **attributes]
+cell -
+cell
| Attributes to assign to the merged token. By default, attributes
| are inherited from the syntactic root token of the span.
+footrow
+cell return
+cell #[code Token]
+cell The newly merged token.
+h(2, "text") Span.text
+tag property
p A unicode representation of the span text.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell unicode
+cell The original verbatim text of the span.
+h(2, "text_with_ws") Span.text_with_ws
+tag property
p
| The text content of the span with a trailing whitespace character if the
| last token has one.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell unicode
+cell The text content of the span (with trailing whitespace).
+h(2, "sent") Span.sent
+tag property
p The sentence span that this span is a part of.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code Span]
+cell The sentence this is part of.
+h(2, "root") Span.root
+tag property
p
| The token within the span that's highest in the parse tree. If there's a
| tie, the earliest is preferred.
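//- A sketch; the asserted head ("York") depends on the parse the model predicts.
+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    new_york = doc[2:4]
    assert new_york.root.text == u'York'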
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code Token]
+cell The root token.
+h(2, "lefts") Span.lefts
+tag property
p Tokens that are to the left of the span, whose head is within the span.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A left-child of a token of the span.
+h(2, "rights") Span.rights
+tag property
p Tokens that are to the right of the span, whose head is within the span.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A right-child of a token of the span.
+h(2, "subtree") Span.subtree
+tag property
p Tokens that descend from tokens in the span, but fall outside it.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A descendant of a token within the span.
@ -0,0 +1,107 @@
//- 💫 DOCS > API > STRINGSTORE
include ../../_includes/_mixins
p Map strings to and from integer IDs.
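//- Minimal sketch of encoding and decoding strings; the example strings are
//- arbitrary.
+aside-code("Example").
    from spacy.strings import StringStore
    stringstore = StringStore([u'apple', u'orange'])
    apple_id = stringstore[u'apple']
    assert stringstore[apple_id] == u'apple'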
+h(2, "init") StringStore.__init__
+tag method
p Create the #[code StringStore].
+table(["Name", "Type", "Description"])
+row
+cell #[code strings]
+cell -
+cell A sequence of unicode strings to add to the store.
+footrow
+cell return
+cell #[code StringStore]
+cell The newly constructed object.
+h(2, "len") StringStore.__len__
+tag method
p Get the number of strings in the store.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of strings in the store.
+h(2, "getitem") StringStore.__getitem__
+tag method
p Retrieve a string from a given integer ID, or vice versa.
+table(["Name", "Type", "Description"])
+row
+cell #[code string_or_id]
+cell bytes / unicode / int
+cell The value to encode.
+footrow
+cell return
+cell unicode / int
+cell The retrieved value.
+h(2, "contains") StringStore.__contains__
+tag method
p Check whether a string is in the store.
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to check.
+footrow
+cell return
+cell bool
+cell Whether the store contains the string.
+h(2, "iter") StringStore.__iter__
+tag method
p Iterate over the strings in the store, in order.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell unicode
+cell A string in the store.
+h(2, "dump") StringStore.dump
+tag method
p Save the strings to a JSON file.
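//- Sketch, assuming an open writable buffer can be passed as the file argument.
+aside-code("Example").
    with open('/tmp/strings.json', 'w') as file_:
        stringstore.dump(file_)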
+table(["Name", "Type", "Description"])
+row
+cell #[code file]
+cell buffer
+cell The file to save the strings to.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "load") StringStore.load
+tag method
p Load the strings from a JSON file.
+table(["Name", "Type", "Description"])
+row
+cell #[code file]
+cell buffer
+cell The file from which to load the strings.
+footrow
+cell return
+cell #[code None]
+cell -
@ -0,0 +1,117 @@
//- 💫 DOCS > API > TAGGER
include ../../_includes/_mixins
p Annotate part-of-speech tags on #[code Doc] objects.
+h(2, "load") Tagger.load
+tag classmethod
p Load the statistical model from the supplied path.
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell #[code Path]
+cell The path to load from.
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared by the documents to be processed.
+row
+cell #[code require]
+cell bool
+cell Whether to raise an error if the files are not found.
+footrow
+cell return
+cell #[code Tagger]
+cell The newly constructed object.
+h(2, "init") Tagger.__init__
+tag method
p Create a #[code Tagger].
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocabulary. Must be shared with documents to be processed.
+row
+cell #[code model]
+cell #[code thinc.linear.AveragedPerceptron]
+cell The statistical model.
+footrow
+cell return
+cell #[code Tagger]
+cell The newly constructed object.
+h(2, "call") Tagger.__call__
+tag method
p Apply the tagger, setting the POS tags onto the #[code Doc] object.
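//- Sketch: tokenize without running the full pipeline, then apply the tagger.
+aside-code("Example").
    from spacy.en import English
    nlp = English()
    doc = nlp.tokenizer(u'A sentence.')
    nlp.tagger(doc)
    assert doc[0].tag_ != u''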
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The tokens to be tagged.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "pipe") Tagger.pipe
+tag method
p Tag a stream of documents.
+table(["Name", "Type", "Description"])
+row
+cell #[code stream]
+cell -
+cell The sequence of documents to tag.
+row
+cell #[code batch_size]
+cell int
+cell The number of documents to accumulate into a working set.
+row
+cell #[code n_threads]
+cell int
+cell
| The number of threads with which to work on the buffer in
| parallel.
+footrow
+cell yield
+cell #[code Doc]
+cell Documents, in order.
+h(2, "update") Tagger.update
+tag method
p Update the statistical model, with tags supplied for the given document.
+table(["Name", "Type", "Description"])
+row
+cell #[code doc]
+cell #[code Doc]
+cell The example document for the update.
+row
+cell #[code gold]
+cell #[code GoldParse]
+cell Manager for the gold-standard tags.
+footrow
+cell return
+cell int
+cell Number of tags predicted correctly.
website/docs/api/token.jade
@ -0,0 +1,460 @@
//- 💫 DOCS > API > TOKEN
include ../../_includes/_mixins
p An individual token — i.e. a word, punctuation symbol, whitespace, etc.
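//- Hypothetical sketch: tokens are accessed by indexing into a Doc.
+aside-code("Example").
    from spacy.en import English
    nlp = English()
    doc = nlp(u'Give it back! He pleaded.')
    token = doc[0]
    assert token.text == u'Give'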
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell The vocab object of the parent #[code Doc].
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code i]
+cell int
+cell The index of the token within the parent document.
+row
+cell #[code ent_type]
+cell int
+cell Named entity type.
+row
+cell #[code ent_type_]
+cell unicode
+cell Named entity type.
+row
+cell #[code ent_iob]
+cell int
+cell
| IOB code of named entity tag.
| #[code 1="I", 2="O", B="B"]. #[code 0] means no tag is assigned.
+row
+cell #[code ent_iob_]
+cell unicode
+cell
| IOB code of named entity tag. #[code "B"]
| means the token begins an entity, #[code "I"] means it is inside an
| entity, #[code "O"] means it is outside an entity, and
| #[code ""] means no entity tag is set.
+row
+cell #[code ent_id]
+cell int
+cell ID of the entity the token is an instance of, if any.
+row
+cell #[code ent_id_]
+cell unicode
+cell ID of the entity the token is an instance of, if any.
+row
+cell #[code lemma]
+cell int
+cell
| Base form of the word, with no inflectional suffixes.
+row
+cell #[code lemma_]
+cell unicode
+cell Base form of the word, with no inflectional suffixes.
+row
+cell #[code lower]
+cell int
+cell Lower-case form of the word.
+row
+cell #[code lower_]
+cell unicode
+cell Lower-case form of the word.
+row
+cell #[code shape]
+cell int
+cell Transform of the word's string, to show orthographic features.
+row
+cell #[code shape_]
+cell unicode
+cell A transform of the word's string, to show orthographic features.
+row
+cell #[code prefix]
+cell int
+cell Integer ID of a length-N substring from the start of the
| word. Defaults to #[code N=1].
+row
+cell #[code prefix_]
+cell unicode
+cell
| A length-N substring from the start of the word. Defaults to
| #[code N=1].
+row
+cell #[code suffix]
+cell int
+cell
| Length-N substring from the end of the word. Defaults to #[code N=3].
+row
+cell #[code suffix_]
+cell unicode
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
+row
+cell #[code is_alpha]
+cell bool
+cell Equivalent to #[code word.orth_.isalpha()].
+row
+cell #[code is_ascii]
+cell bool
+cell Equivalent to #[code all(ord(c) &lt; 128 for c in word.orth_)].
+row
+cell #[code is_digit]
+cell bool
+cell Equivalent to #[code word.orth_.isdigit()].
+row
+cell #[code is_lower]
+cell bool
+cell Equivalent to #[code word.orth_.islower()].
+row
+cell #[code is_title]
+cell bool
+cell Equivalent to #[code word.orth_.istitle()].
+row
+cell #[code is_punct]
+cell bool
+cell Is the word punctuation?
+row
+cell #[code is_space]
+cell bool
+cell Equivalent to #[code word.orth_.isspace()].
+row
+cell #[code like_url]
+cell bool
+cell Does the word resemble a URL?
+row
+cell #[code like_num]
+cell bool
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
+row
+cell #[code like_email]
+cell bool
+cell Does the word resemble an email address?
+row
+cell #[code is_oov]
+cell bool
+cell Is the word out-of-vocabulary?
+row
+cell #[code is_stop]
+cell bool
+cell Is the word part of a "stop list"?
+row
+cell #[code pos]
+cell int
+cell Coarse-grained part-of-speech.
+row
+cell #[code pos_]
+cell unicode
+cell Coarse-grained part-of-speech.
+row
+cell #[code tag]
+cell int
+cell Fine-grained part-of-speech.
+row
+cell #[code tag_]
+cell unicode
+cell Fine-grained part-of-speech.
+row
+cell #[code dep]
+cell int
+cell Syntactic dependency relation.
+row
+cell #[code dep_]
+cell unicode
+cell Syntactic dependency relation.
+row
+cell #[code lang]
+cell int
+cell Language of the parent document's vocabulary.
+row
+cell #[code lang_]
+cell unicode
+cell Language of the parent document's vocabulary.
+row
+cell #[code prob]
+cell float
+cell Smoothed log probability estimate of token's type.
+row
+cell #[code idx]
+cell int
+cell The character offset of the token within the parent document.
+row
+cell #[code sentiment]
+cell float
+cell A scalar value indicating the positivity or negativity of the token.
+row
+cell #[code lex_id]
+cell int
+cell ID of the token's lexical type.
+row
+cell #[code text]
+cell unicode
+cell Verbatim text content.
+row
+cell #[code text_with_ws]
+cell unicode
+cell Text content, with trailing space character if present.
+row
+cell #[code whitespace]
+cell int
+cell Trailing space character if present.
+row
+cell #[code whitespace_]
+cell unicode
+cell Trailing space character if present.
+h(2, "init") Token.__init__
+tag method
p Construct a #[code Token] object.
+table(["Name", "Type", "Description"])
+row
+cell #[code vocab]
+cell #[code Vocab]
+cell A storage container for lexical types.
+row
+cell #[code doc]
+cell #[code Doc]
+cell The parent document.
+row
+cell #[code offset]
+cell int
+cell The index of the token within the document.
+footrow
+cell return
+cell #[code Token]
+cell The newly constructed object.
+h(2, "len") Token.__len__
+tag method
p Get the number of unicode characters in the token.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of unicode characters in the token.
+h(2, "check_flag") Token.check_flag
+tag method
p Check the value of a boolean flag.
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_id]
+cell int
+cell The attribute ID of the flag to check.
+footrow
+cell return
+cell bool
+cell Whether the flag is set.
+h(2, "nbor") Token.nbor
+tag method
p Get a neighboring token.
+table(["Name", "Type", "Description"])
+row
+cell #[code i]
+cell int
+cell The relative position of the token to get. Defaults to #[code 1].
+footrow
+cell return
+cell #[code Token]
+cell The token at position #[code self.doc[self.i+i]]
+h(2, "similarity") Token.similarity
+tag method
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
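//- Sketch; the scores returned depend on the word vectors that are loaded.
+aside-code("Example").
    doc = nlp(u'apples and oranges')
    apples = doc[0]
    oranges = doc[2]
    apples_oranges = apples.similarity(oranges)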
+table(["Name", "Type", "Description"])
+row
+cell other
+cell -
+cell
| The object to compare with. By default, accepts #[code Doc],
| #[code Span], #[code Token] and #[code Lexeme] objects.
+footrow
+cell return
+cell float
+cell A scalar similarity score. Higher is more similar.
+h(2, "is_ancestor") Token.is_ancestor
+tag method
p
| Check whether this token is a parent, grandparent, etc. of another
| in the dependency tree.
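//- Sketch; whether the assertion holds depends on the predicted parse, which
//- is assumed to attach "it" to "Give".
+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    give = doc[0]
    it = doc[1]
    assert give.is_ancestor(it)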
+table(["Name", "Type", "Description"])
+row
+cell descendant
+cell #[code Token]
+cell Another token.
+footrow
+cell return
+cell bool
+cell Whether this token is the ancestor of the descendant.
+h(2, "vector") Token.vector
+tag property
p A real-valued meaning representation.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
+cell A 1D numpy array representing the token's semantics.
+h(2, "has_vector") Token.has_vector
+tag property
p
| A boolean value indicating whether a word vector is associated with the
| object.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell bool
+cell Whether the token has a vector data attached.
+h(2, "head") Token.head
+tag property
p The syntactic parent, or "governor", of this token.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code Token]
+cell The head.
+h(2, "conjuncts") Token.conjuncts
+tag property
p A sequence of coordinated tokens, including the token itself.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A coordinated token.
+h(2, "children") Token.children
+tag property
p A sequence of the token's immediate syntactic children.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A child token such that #[code child.head==self].
+h(2, "subtree") Token.subtree
+tag property
p A sequence of all the token's syntactic descendants.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell A descendant token such that #[code self.is_ancestor(descendant)].
+h(2, "left_edge") Token.left_edge
+tag property
p The leftmost token of this token's syntactic descendants.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code Token]
+cell The first token such that #[code self.is_ancestor(token)].
+h(2, "right_edge") Token.right_edge
+tag property
p The rightmost token of this token's syntactic descendants.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell #[code Token]
+cell The last token such that #[code self.is_ancestor(token)].
+h(2, "ancestors") Token.ancestors
+tag property
p The token's syntactic ancestors (parents, grandparents, etc.).
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Token]
+cell
| A sequence of ancestor tokens such that
| #[code ancestor.is_ancestor(self)].
website/docs/api/vocab.jade
@ -0,0 +1,278 @@
//- 💫 DOCS > API > VOCAB
include ../../_includes/_mixins
p
| A look-up table that allows you to access #[code Lexeme] objects. The
| #[code Vocab] instance also provides access to the #[code StringStore],
| and owns underlying C-data that is shared between #[code Doc] objects.
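//- Hypothetical sketch, using the methods documented below.
+aside-code("Example").
    from spacy.en import English
    nlp = English()
    apple = nlp.vocab[u'apple']
    assert u'apple' in nlp.vocab
    num_lexemes = len(nlp.vocab)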
+h(2, "attributes") Attributes
+table(["Name", "Type", "Description"])
+row
+cell #[code strings]
+cell #[code StringStore]
+cell A table managing the string-to-int mapping.
+row
+cell #[code vectors_length]
+cell int
+cell The dimensionality of the word vectors, if present.
+h(2, "load") Vocab.load
+tag classmethod
p Load the vocabulary from a path.
+table(["Name", "Type", "Description"])
+row
+cell #[code path]
+cell #[code Path]
+cell The path to load from.
+row
+cell #[code lex_attr_getters]
+cell dict
+cell
| A dictionary mapping attribute IDs to functions to compute them.
| Defaults to #[code None].
+row
+cell #[code lemmatizer]
+cell -
+cell A lemmatizer. Defaults to #[code None].
+row
+cell #[code tag_map]
+cell dict
+cell
| A dictionary mapping fine-grained tags to coarse-grained
| parts-of-speech, and optionally morphological attributes.
+row
+cell #[code oov_prob]
+cell float
+cell The default probability for out-of-vocabulary words.
+footrow
+cell return
+cell #[code Vocab]
+cell The newly constructed object.
+h(2, "init") Vocab.__init__
+tag method
p Create the vocabulary.
+table(["Name", "Type", "Description"])
+row
+cell #[code lex_attr_getters]
+cell dict
+cell
| A dictionary mapping attribute IDs to functions to compute them.
| Defaults to #[code None].
+row
+cell #[code lemmatizer]
+cell -
+cell A lemmatizer. Defaults to #[code None].
+row
+cell #[code tag_map]
+cell dict
+cell
| A dictionary mapping fine-grained tags to coarse-grained
| parts-of-speech, and optionally morphological attributes.
+row
+cell #[code oov_prob]
+cell float
+cell The default probability for out-of-vocabulary words.
+footrow
+cell return
+cell #[code Vocab]
+cell The newly constructed object.
+h(2, "len") Vocab.__len__
+tag method
p Get the number of lexemes in the vocabulary.
+table(["Name", "Type", "Description"])
+footrow
+cell return
+cell int
+cell The number of lexemes in the vocabulary.
+h(2, "getitem") Vocab.__getitem__
+tag method
p
| Retrieve a lexeme, given an int ID or a unicode string. If a previously
| unseen unicode string is given, a new lexeme is created and stored.
+table(["Name", "Type", "Description"])
+row
+cell #[code id_or_string]
+cell int / unicode
+cell The integer ID of a word, or its unicode string.
+footrow
+cell return
+cell #[code Lexeme]
+cell The lexeme indicated by the given ID.
+h(2, "iter") Span.__iter__
+tag method
p Iterate over the lexemes in the vocabulary.
+table(["Name", "Type", "Description"])
+footrow
+cell yield
+cell #[code Lexeme]
+cell An entry in the vocabulary.
+h(2, "contains") Vocab.__contains__
+tag method
p Check whether the string has an entry in the vocabulary.
+table(["Name", "Type", "Description"])
+row
+cell #[code string]
+cell unicode
+cell The string to check.
+footrow
+cell return
+cell bool
+cell Whether the string has an entry in the vocabulary.
+h(2, "resize_vectors") Vocab.resize_vectors
+tag method
p
| Set #[code vectors_length] to a new size, and allocate more memory for
| the #[code Lexeme] vectors if necessary. The memory will be zeroed.
+table(["Name", "Type", "Description"])
+row
+cell #[code new_size]
+cell int
+cell The new size of the vectors.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "add_flag") Vocab.add_flag
+tag method
p Set a new boolean flag to words in the vocabulary.
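//- Sketch: register a custom flag and query it per token. The flag name and
//- the getter function are illustrative assumptions.
+aside-code("Example").
    is_my_library = lambda text: text in [u'spaCy', u'Thinc']
    MY_LIBRARY = nlp.vocab.add_flag(is_my_library)
    doc = nlp(u'I like spaCy')
    assert doc[2].check_flag(MY_LIBRARY) == True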
+table(["Name", "Type", "Description"])
+row
+cell #[code flag_getter]
+cell callable
+cell A function #[code f(unicode) -> bool], to get the flag value.
+row
+cell #[code flag_id]
+cell int
+cell
| An integer between 1 and 63 (inclusive), specifying the bit at
| which the flag will be stored. If #[code -1], the lowest
| available bit will be chosen.
+footrow
+cell return
+cell int
+cell The integer ID by which the flag value can be checked.
+h(2, "dump") Vocab.dump
+tag method
p Save the lexemes' binary data to the given location.
+table(["Name", "Type", "Description"])
+row
+cell #[code loc]
+cell #[code Path]
+cell The path to save to.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "load_lexemes") Vocab.load_lexemes
+tag method
p Load the binary lexeme data from the given location.
+table(["Name", "Type", "Description"])
+row
+cell #[code loc]
+cell unicode
+cell Path to load the lexemes.bin file from.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "dump_vectors") Vocab.dump_vectors
+tag method
p Save the word vectors to a binary file.
+table(["Name", "Type", "Description"])
+row
+cell #[code loc]
+cell #[code Path]
+cell The path to save to.
+footrow
+cell return
+cell #[code None]
+cell -
+h(2, "load_vectors") Vocab.load_vectors
+tag method
p Load vectors from a text-based file.
+table(["Name", "Type", "Description"])
+row
+cell #[code file_]
+cell buffer
+cell
| The file to read from. Entries should be separated by newlines,
| and each entry should be whitespace delimited. The first value
| of the entry should be the word string, and subsequent entries
| should be the values of the vector.
+footrow
+cell return
+cell int
+cell The length of the vectors loaded.
+h(2, "load_vectors_from_bin_loc") Vocab.load_vectors_from_bin_loc
+tag method
p Load vectors from the location of a binary file.
+table(["Name", "Type", "Description"])
+row
+cell #[code loc]
+cell unicode
+cell The path of the binary file to load from.
+footrow
+cell return
+cell int
+cell The length of the vectors loaded.
@ -1,26 +1,27 @@
//- ----------------------------------
//- 💫 DOCS
//- ----------------------------------
include ../_includes/_mixins
- var link_bool = 'http://docs.python.org/library/functions.html#bool'
- var link_int = 'http://docs.python.org/library/functions.html#int'
- var link_unicode = 'http://docs.python.org/library/functions.html#unicode'
p=lorem_short
include _quickstart-install
include _quickstart-examples
+aside("Help us improve the docs")
| Did you spot a mistake or come across explanations that
| are unclear? You can find a "Suggest edits" button at the
| bottom of each page that points you to the source.
| We always appreciate
| #[+a(gh("spaCy") + "/pulls") pull requests].#[br]#[br]
| Have you built something cool with spaCy, or did you
| write a tutorial to help others use spaCy?
| #[a(href="mailto:#{EMAIL}") Let us know!]
+h(2, "api") API
+grid
each details, title in sections
+card(false, false)
a(href=details.url)
+svg("graphics", details.svg, 300, 150).u-color-theme
include _api-language
include _api-doc
include _api-token
include _api-span
include _api-lexeme
include _api-vocab
include _api-stringstore
include _api-matcher
a(href=details.url)
+h(3)=title
include _annotation-specs
include _tutorials
p=details.description
+button(details.url, true, "primary")(target="_self") View
@ -1,49 +0,0 @@
{
"training": {
"title": "Training the tagger, entity recogniser and parser",
"date": "2016-10-17",
"description": "This tutorial describes how to train new statistical models for spaCy's part-of-speech tagger, named entity recognizer and dependency parser."
},
"custom-pipelines": {
"title": "Custom Pipelines",
"date": "2016-10-17",
"description": "spaCy 1.0 introduces dynamic pipelines, so that you can easily create custom workflows. This tutorial describes the feature, and introduces experimental support for dynamic Token attributes. The tutorial also discusses how we can make it easier to use bidirectional LSTMs with spaCy."
},
"rule-based-matcher": {
"title": "Rule-based Matcher",
"date": "2016-10-17",
"description": "spaCy features a rule-matching engine that operates over tokens. The rules can refer to token annotations and flags, and matches support callbacks to accept, modify and/or act on the match. The rule matcher also allows you to associate patterns with entity IDs, to allow some basic entity linking or disambiguation."
},
"load-new-word-vectors": {
"title": "Load new word vectors",
"date": "2015-09-24",
"description": "Word vectors allow simple similarity queries, and drive many NLP applications. This tutorial explains how to load custom word vectors into spaCy, to make use of task or data-specific representations."
},
"byo-annotations": {
"title": "Using Pre-existing Tokenization, Tags, and Other Annotations",
"date": "2016-04-15",
"description": "spaCy assumes by default that your data is raw text. However, sometimes your data is partially annotated, e.g. with pre-existing tokenization, part-of-speech tags, etc. This tutorial explains how to use these annotations in spaCy."
},
"mark-adverbs": {
"title": "Mark all adverbs, particularly for verbs of speech",
"date": "2015-08-18",
"description": "Let's say you're developing a proofreading tool, or possibly an IDE for writers. You're convinced by Stephen King's advice that adverbs are not your friend so you want to highlight all adverbs."
},
"syntax-search": {
"title": "Search Reddit for comments about Google doing something",
"date": "2015-08-18",
"description": "Example use of the spaCy NLP tools for data exploration. Here we will look for Reddit comments that describe Google doing something, i.e. discuss the company's actions. This is difficult, because other senses of \"Google\" now dominate usage of the word in conversation, particularly references to using Google products."
},
"twitter-filter": {
"title": "Finding Relevant Tweets",
"date": "2015-08-18",
"description": "In this tutorial, we will use word vectors to search for tweets about Jeb Bush. We'll do this by building up two word lists: one that represents the type of meanings in the Jeb Bush tweets, and another to help screen out irrelevant tweets that mention the common, ambiguous word \"bush\"."
}
}
@ -1,117 +0,0 @@
include ../../_includes/_mixins
p.u-text-large spaCy assumes by default that your data is raw text. However, sometimes your data is partially annotated, e.g. with pre-existing tokenization, part-of-speech tags, etc. This tutorial explains how to use these annotations in spaCy.
+h(2, "quick-reference") Quick Reference
+table(['Description', 'Usage'], 'code')
+row
+cell Use pre-existing tokenization
+cell #[code.lang-python doc = Doc(nlp.vocab, [('A', True), ('token', False), ('!', False)])]
+row
+cell Use pre-existing tokenization (deprecated)
+cell #[code.lang-python doc = nlp.tokenizer.tokens_from_list([u'A', u'token', u'!'])]
+row
+cell Assign pre-existing tags
+cell #[code.lang-python nlp.tagger.tag_from_strings(doc, ['DT', 'NN'])]
+row
+cell Assign named entity annotations from an array
+cell #[code.lang-python doc.from_array([ENT_TYPE, ENT_IOB], values)]
+row
+cell Assign dependency parse annotations from an array
+cell #[code.lang-python doc.from_array([HEAD, DEP], values)]
+h(2, "examples") Examples
+code('python', 'Tokenization').
import spacy
nlp = spacy.load('en')
tokens = [u'A', u'list', u'of', u'strings', u'.']
doc = nlp.tokenizer.tokens_from_list(tokens)
assert len(doc) == len(tokens)
# With this method, we don't get to specify how the corresponding string
# would be spaced, so we have to assume a space before every token.
assert doc.text == u'A list of strings .'
+code('python', 'Tokenization').
import spacy
from spacy.tokens import Doc
nlp = spacy.load('en')
tokens = [u'A', u'list', u'of', u'strings', u'.']
has_space = [True, True, True, False, False]
doc = Doc(nlp.vocab, orth_and_spaces=zip(tokens, has_space))
assert len(doc) == len(tokens)
# Spacing is correct, given by boolean values above.
assert doc.text == u'A list of strings.'
# Here's how it would look with different boolean values.
tokens = [u'A', u'list', u'of', u'strings', u'.']
has_space = [False, True, True, True, False]
doc = Doc(nlp.vocab, orth_and_spaces=zip(tokens, has_space))
assert doc.text == u'Alist of strings .'
+code('python', 'POS Tags').
import spacy
nlp = spacy.load('en')
# Tokenize a string into a Doc, but don't apply the whole pipeline ---
# that is, don't predict the part-of-speech tags, syntactic parse, named
# entities, etc.
doc = nlp.tokenizer(u'A unicode string, untokenized.')
nlp.tagger.tag_from_strings(doc, [u'DT', u'JJ', u'NN', u',', u'VBN', u'.'])
# Now predict dependency parse and named entities. Note that if you assign
# tags in a way that's very unlike the behaviour of the POS tagger model,
# the subsequent models may perform worse. These models use the POS tags
# as features, so if you give them unexpected tags, you may be giving them
# run-time conditions that don't resemble the training data.
nlp.parser(doc)
nlp.entity(doc)
+code('python', 'Dependency Parse').
import spacy
from spacy.attrs import HEAD, DEP
from spacy.symbols import det, nmod, root, punct
from numpy import ndarray
nlp = spacy.load('en')
# Get the Doc object, and apply the pipeline except the dependency parser
doc = nlp(u'A unicode string.', parse=False)
columns = [HEAD, DEP]
values = ndarray(shape=(len(columns), len(doc)), dtype='int32')
# Syntactic parse specified as head offsets
heads = [2, 1, 0, -1]
# Integer IDs for the dependency labels. See the parse in the displaCy
# demo at spacy.io/demos/displacy
labels = [det, nmod, root, punct]
values[0] = heads
values[1] = labels
doc.from_array(columns, values)
+code('python', 'Named Entities').
import spacy
from spacy.attrs import ENT_TYPE, ENT_IOB
from spacy.symbols import PERSON, ORG
from numpy import ndarray
nlp = spacy.load('en')
# Get the Doc object, and apply the pipeline except the entity recognizer
doc = nlp(u'My name is Matt.', entity=False)
columns = [ENT_IOB, ENT_TYPE]
values = ndarray(shape=(len(columns), len(doc)), dtype='int32')
# IOB values are 0=missing, 1=I, 2=O, 3=B
values[0] = [2, 2, 2, 3, 2]
values[1] = [0, 0, 0, PERSON, 0]
doc.from_array(columns, values)
@ -1,89 +0,0 @@
include ../../_includes/_mixins
p.u-text-large spaCy 1.0 introduces dynamic pipelines, so that you can easily create custom workflows. This tutorial describes the feature, and introduces experimental support for dynamic Token attributes. The tutorial also discusses how we can make it easier to use bidirectional LSTMs with spaCy.
p Best practices in NLP are now already pretty different from when I first designed spaCy, even though it's only been two years. The spaCy 1.0 release has a new custom pipeline API to help you use the new hotness.
p Before 1.0, spaCy's pipeline was hard-coded. When you called #[code nlp(text)], spaCy would apply the tokenizer, tagger, parser and named entity recognizer, in sequence. This design assumed that users should subclass the #[code Language] class to customize the pipeline. However, the #[code Language] class has gotten more complicated, and subclassing it now feels like a relatively "serious" thing to do. It feels hard.
p In spaCy 1.0, the order of operations is no longer hard-coded. Instead, the new #[code Language.__call__] does something like this:
+code.
def __call__(self, text):
doc = self.make_doc(text)
for process in self.pipeline:
process(doc)
return doc
p The pipeline can consist of any sequence of callables. They should accept a Doc object, and modify it in-place. You can install the pipeline by passing a callable to the #[code spacy.load()] function, or the constructor of the #[code Language] class:
+code("python", "Basic Example").
import spacy
def arbitrary_fixup_rules(doc):
for token in doc:
if token.text == u'bill' and token.tag_ == u'NNP':
token.tag_ = u'NN'
def custom_pipeline(nlp):
return (nlp.tagger, arbitrary_fixup_rules, nlp.parser, nlp.entity)
nlp = spacy.load('en', pipeline=custom_pipeline)
p The value passed to the #[code pipeline] keyword should be a callable that takes the #[code Language] instance (i.e. #[code nlp]) as an argument. The callable should return a sequence of callables. Each member of the sequence should take a Doc object as its sole positional argument.
+h(2, "experimental-lstm") Experimental: Bidirectional LSTM with custom pipeline
p Probably the most important new technology in Natural Language Processing is the rise of bidirectional LSTM models. These models associate each word with a #[em context-specific] vector. You can also neatly include character level features, so that all relevant aspects of the word are captured. This is pretty much the best way to do feature extraction in NLP at the moment, for almost any task.
p spaCy doesn't feature any pre-trained LSTM models yet, and the details of this API are still being refined. But, because BiLSTMs are proving so important, I wanted to get the proposal up.
p Version 1.0 adds an attribute #[code tensor] to the #[code Doc] object. The #[code tensor] attribute expects a numpy ndarray object, and is publicly writeable. This gives you a place to store the output of the LSTM (or some other real-valued output you want to keep).
+code("python", "Basic Example").
import spacy
from spacy.symbols import LEMMA, TAG
class LSTMModel(object):
def __init__(self, **kwargs):
# Load your weights etc
pass
def __call__(self, doc):
features = doc.to_array([LEMMA, TAG])
doc.tensor = lstm(features)
def custom_pipeline(nlp):
return (nlp.tagger, LSTMModel(), nlp.parser, nlp.entity)
nlp = spacy.load('en', pipeline=custom_pipeline)
p Now, so far we only have the LSTM output as an attribute of the #[code Doc] object. We'd like to be able to do stuff like #[code doc[0].vector], and have that get us the LSTM vector for the token. We can do #[code doc.tensor[doc[0].i]], but I'd like a little more sugar. The details of this part are still experimental — in particular, don't take the names too seriously at this point.
p A relevant implementation detail of spaCy is that the #[code Token] objects are thin proxies, that can be created and destroyed as convenient. The #[code Doc] object owns all the data. This means that we can't simply assign a vector to the #[code Token] objects. Instead, we'll add a hook that gets called by #[code token.vector]. We'll also add space for hooks in other places we might need them.
+aside("Why don't Token and Span own their data?") Well, we want the sequence of tokens to be stored together in memory. That means we really want to have a sequence owned by the #[code Doc] object. But if we have that, then we would have to copy data to the #[code Token] objects. This gets super messy, especially if the tokens should be able to modify their state. The Token therefore proxies to the Doc, to maintain a single source of truth.
p Here's what that looks like:
+code.
def install_vector_hook(doc):
doc.getters_for_tokens['vector'] = lambda token: doc.tensor[token.i]
def custom_pipeline(nlp):
return (nlp.tagger, LSTMModel(), install_vector_hook, nlp.parser, nlp.entity)
nlp = spacy.load('en', pipeline=custom_pipeline)
p The #[code install_vector_hook] function will run after the LSTM. It modifies the #[code Doc], setting a value in a dictionary that the #[code Token] knows to look for. When you access the #[code token.vector] property, the token checks whether there's a special-case listener for that attribute:
+code.
@property
def vector(self):
if 'vector' in self.doc.getters_for_tokens:
return self.doc.getters_for_tokens['vector'](self)
else:
return self.c.lex.vector
p As I said — don't take the names too seriously at this point. But do test out the feature — it should be all working. You should be able to customize the behaviour of a lot of attributes this way already. Possibly we should just make everything on the Token and the Span work this way, but I think it might not be nice to have so much uncertainty about how some values are being calculated. There's such a thing as being too dynamic.