Update to new website
.gitignore
|
@ -107,3 +107,5 @@ website/demos/sense2vec/
|
|||
# Website
|
||||
website/_deploy.sh
|
||||
website/package.json
|
||||
|
||||
website/blog/announcement.jade
|
||||
|
|
|
@ -1,7 +1,11 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 404 ERROR
|
||||
//- ----------------------------------
|
||||
|
||||
include _includes/_mixins
|
||||
|
||||
p.u-text-large.u-text-center Ooops, this page does not exist. Click #[a(href="javascript:history.go(-1)") here] to go back.
|
||||
+landing-header
|
||||
h1.c-landing__title.u-heading-0
|
||||
| Ooops, this page#[br]
|
||||
| does not exist!
|
||||
|
||||
h2.c-landing__title.u-heading-3.u-padding-small
|
||||
a(href="javascript:history.go(-1)") Click here to go back.
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
|
||||
|
||||
# Source files for the spacy.io website and docs
|
||||
# spacy.io website and docs
|
||||
|
||||
The [spacy.io](https://spacy.io) website is implemented in [Jade (aka Pug)](https://www.jade-lang.org), and is built or served by [Harp](https://harpjs.com). Jade is an extensible templating language with a readable syntax that compiles to HTML.
|
||||
The website source makes extensive use of Jade mixins, so that the design system is abstracted away from the content you're
|
||||
writing. You can read more about our approach in our blog post, ["Rebuilding a Website with Modular Markup"](https://explosion.ai/blog/modular-markup).
|
||||
|
||||
|
||||
## Building the site
|
||||
## Viewing the site locally
|
||||
|
||||
```bash
|
||||
sudo npm install --global harp
|
||||
|
@ -17,3 +17,102 @@ harp server
|
|||
```
|
||||
|
||||
This will serve the site on [http://localhost:9000](http://localhost:9000).
|
||||
|
||||
|
||||
## Making changes to the site
|
||||
|
||||
The docs can always use another example or more detail, and they should always be up to date and not misleading. If you see something, say something – we always appreciate a [pull request](https://github.com/explosion/spaCy/pulls). To quickly find the correct file to edit, simply click on the "Suggest edits" button at the bottom of a page.
|
||||
|
||||
### File structure
|
||||
|
||||
While all page content lives in the `.jade` files, article meta (page titles, sidebars etc.) is stored as JSON. Each folder contains a `_data.json` with all required meta for its files.
|
||||
|
||||
For simplicity, all sites linked in the [tutorials](https://spacy.io/docs/usage/tutorials) and [showcase](https://spacy.io/docs/usage/showcase) are also stored as JSON. So in order to edit those pages, there's no need to dig into the Jade files – simply edit the [`_data.json`](website/docs/usage/_data.json).
|
||||
|
||||
### Markup language and conventions
|
||||
|
||||
Jade/Pug is a whitespace-sensitive markup language that compiles to HTML. Indentation is used to nest elements and to structure template logic such as `if`/`else` or `for`, which is mainly used to iterate over objects and arrays in the metadata. It also allows inline JavaScript expressions.
|
||||
|
||||
For an overview of Harp and Jade, see [this blog post](https://ines.io/blog/the-ultimate-guide-static-websites-harp-jade). For more info on the Jade/Pug syntax, check out their [documentation](https://pugjs.org).
|
||||
|
||||
In the [spacy.io](https://spacy.io) source, we use 4 spaces to indent and hard-wrap at 80 characters.
|
||||
|
||||
```pug
|
||||
p This is a very short paragraph. It stays inline.
|
||||
|
||||
p
|
||||
| This is a much longer paragraph. It's hard-wrapped at 80 characters to
|
||||
| make it easier to read on GitHub and in editors that do not have soft
|
||||
| wrapping enabled. To prevent Jade from interpreting each line as a new
|
||||
| element, it's prefixed with a pipe and two spaces. This ensures that no
|
||||
| spaces are dropped – for example, if your editor strips out trailing
|
||||
| whitespace by default. Inline links are added using the inline syntax,
|
||||
| like this: #[+a("https://google.com") Google].
|
||||
```
|
||||
|
||||
Note that for external links, `+a("...")` is used instead of `a(href="...")` – it's a mixin that takes care of adding all required attributes.
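As a rough illustration of what the mixin takes care of, its attribute logic can be sketched in plain JavaScript. This follows the `+a(url, trusted)` definition in `_includes/_mixins.jade`; the function name `externalLinkAttrs` is hypothetical and not part of the site source.

```javascript
// Sketch only: mirrors the attributes the +a mixin sets on the <a> element.
// "externalLinkAttrs" is an illustrative name, not a function in the repo.
function externalLinkAttrs(url, trusted) {
    return {
        href: url,
        // External links always open in a new tab.
        target: "_blank",
        // Untrusted links get rel="noopener nofollow"; trusted ones get none.
        rel: trusted ? "" : "noopener nofollow"
    };
}

console.log(externalLinkAttrs("https://google.com"));
console.log(externalLinkAttrs("https://explosion.ai", true));
```

Passing `true` as the second argument marks a link as trusted, which is how the footer links to the company site are written.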
|
||||
|
||||
### Mixins
|
||||
|
||||
Each file includes a collection of [custom mixins](website/_includes/_mixins.jade) that make it easier to add content components – no HTML or class names required.
|
||||
|
||||
For example:
|
||||
```pug
|
||||
//- Bulleted list
|
||||
|
||||
+list
|
||||
+item This is a list item.
|
||||
+item This is another list item.
|
||||
|
||||
//- Table with header
|
||||
|
||||
+table([ "Header one", "Header two" ])
|
||||
+row
|
||||
+cell Table cell
|
||||
+cell Another one
|
||||
|
||||
+row
|
||||
+cell And one more.
|
||||
+cell And the last one.
|
||||
|
||||
//- Headlines with optional permalinks
|
||||
|
||||
+h(2, "link-id") Headline 2 with link to #link-id
|
||||
```
|
||||
|
||||
Code blocks are implemented using the `+code` mixin, or the `+aside-code` mixin to display them in the sidebar. A `.` is added after the mixin call to preserve whitespace:
|
||||
|
||||
```pug
|
||||
+code("This is a label").
|
||||
import spacy
|
||||
en_nlp = spacy.load('en')
|
||||
en_doc = en_nlp(u'Hello, world. Here are two sentences.')
|
||||
```
|
||||
|
||||
You can find the documentation for the available mixins in [`_includes/_mixins.jade`](website/_includes/_mixins.jade).
|
||||
|
||||
### Linking to the GitHub repo
|
||||
|
||||
Since GitHub links can be long and tricky, you can use the `gh()` function to generate them automatically for spaCy and all repositories owned by [explosion](https://github.com/explosion):
|
||||
|
||||
```pug
|
||||
//- Syntax: gh(repo, [file], [branch])
|
||||
|
||||
+src(gh("spaCy", "spacy/matcher.pyx"))
|
||||
|
||||
//- https://github.com/explosion/spaCy/blob/master/spacy/matcher.pyx
|
||||
|
||||
```
|
||||
|
||||
`+src()` creates a link with a little source icon to indicate it's linking to a code source.
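For reference, the URL construction behind `gh()` can be sketched as standalone JavaScript. This follows the definition in `_includes/_functions.jade`; the constant `GITHUB_USER` stands in for `SOCIAL.github` from the site config (assumed here to be `"explosion"`), and the `"develop"` branch in the last call is purely hypothetical.

```javascript
// Sketch of the gh() helper outside of the Jade templates.
// GITHUB_USER is an assumption standing in for SOCIAL.github.
const GITHUB_USER = "explosion";

function gh(repo, filepath, branch) {
    const base = "https://github.com/" + GITHUB_USER + "/" + repo;
    // Without a file path, link to the repository root.
    if (!filepath) return base;
    // The branch falls back to "master" when it is omitted.
    return base + "/blob/" + (branch || "master") + "/" + filepath;
}

console.log(gh("spaCy"));
console.log(gh("spaCy", "spacy/matcher.pyx"));
console.log(gh("spaCy", "spacy/matcher.pyx", "develop"));
```

Both the file and the branch are optional, so `gh("spaCy")` on its own is enough to link to the repository.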
|
||||
|
||||
### Most common causes of compile errors
|
||||
|
||||
| Problem | Fix |
| --- | --- |
| JSON formatting errors | make sure last elements of objects don't end with commas and/or use a JSON linter |
| unescaped characters like `<` or `>` and sometimes `'` in inline elements | replace with encoded version: `&lt;`, `&gt;` etc. |
| "Cannot read property 'call' of undefined" / "foo is not a function" | make sure mixin names are spelled correctly and the mixins file is included with the correct path |
| "no closing bracket found" | make sure inline elements end with a `]`, like `#[code spacy.load('en')]`; for nested inline elements, make sure they're all on the same line and contain spaces between them (**bad:** `#[+api("doc")#[code Doc]]`) |
|
||||
|
||||
If Harp fails and throws a Jade error, don't take the reported line number at face value – it's often wrong, as the page is compiled from templates and several files.
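Since JSON formatting errors are listed first, a quick parse check can help narrow a failure down to a `_data.json` file before digging into the templates. The following is a minimal sketch using the built-in `JSON.parse`; the helper name `checkJson` and the sample strings are hypothetical.

```javascript
// Hypothetical helper: report whether a string of JSON metadata parses.
function checkJson(text) {
    try {
        JSON.parse(text);
        return { ok: true, error: null };
    } catch (err) {
        // JSON.parse throws a SyntaxError on invalid input, e.g. trailing commas.
        return { ok: false, error: err.message };
    }
}

const validMeta = '{ "title": "404 Error", "landing": true }';
const trailingComma = '{ "title": "404 Error", "landing": true, }';

console.log(checkJson(validMeta));
console.log(checkJson(trailingComma));
```

The exact error message depends on the JavaScript engine, so the sketch only relies on whether parsing succeeds.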
|
||||
|
|
|
@ -2,27 +2,27 @@
|
|||
"index": {
|
||||
"landing": true,
|
||||
"logos": [
|
||||
[
|
||||
["chartbeat", "https://chartbeat.com"],
|
||||
["socrata", "https://www.socrata.com"],
|
||||
["chattermill", "https://chattermill.io"],
|
||||
["cytora", "http://www.cytora.com"],
|
||||
["signaln", "http://signaln.com"],
|
||||
["duedil", "https://www.duedil.com/"],
|
||||
["spyjack", "https://spyjack.io"]
|
||||
],
|
||||
[
|
||||
["keyreply", "https://keyreply.com/"],
|
||||
["dato", "https://dato.com"],
|
||||
["kip", "http://kipthis.com"],
|
||||
["wonderflow", "http://www.wonderflow.co"],
|
||||
["foxtype", "https://foxtype.com"]
|
||||
],
|
||||
[
|
||||
["synapsify", "http://www.gosynapsify.com"],
|
||||
["stitchfix", "https://www.stitchfix.com/"],
|
||||
["wayblazer", "http://wayblazer.com"]
|
||||
]
|
||||
{
|
||||
"chartbeat": "https://chartbeat.com",
|
||||
"cytora": "http://www.cytora.com",
|
||||
"duedil": "https://www.duedil.com",
|
||||
"socrata": "https://www.socrata.com",
|
||||
"indico": "https://indico.io",
|
||||
"signaln": "http://signaln.com"
|
||||
},
|
||||
{
|
||||
"keyreply": "https://keyreply.com",
|
||||
"dato": "https://dato.com",
|
||||
"kip": "http://kipthis.com",
|
||||
"wonderflow": "http://www.wonderflow.co",
|
||||
"foxtype": "https://foxtype.com"
|
||||
},
|
||||
{
|
||||
"synapsify": "http://www.gosynapsify.com",
|
||||
"stitchfix": "https://www.stitchfix.com",
|
||||
"wayblazer": "http://wayblazer.com",
|
||||
"chattermill": "https://chattermill.io"
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
|
@ -32,28 +32,10 @@
|
|||
|
||||
"404": {
|
||||
"title": "404 Error",
|
||||
"asides": false
|
||||
"landing": true
|
||||
},
|
||||
|
||||
"styleguide": {
|
||||
"title" : "Styleguide",
|
||||
"asides": true,
|
||||
|
||||
"sidebar": {
|
||||
"About": [
|
||||
["Introduction", "#section-introduction", "introduction"]
|
||||
],
|
||||
"Design": [
|
||||
["Colors", "#section-colors", "colors"],
|
||||
["Logo", "#section-logo", "logo"],
|
||||
["Typography", "#section-typography", "typography"],
|
||||
["Grid", "#section-grid", "grid"],
|
||||
["Elements", "#section-elements", "elements"],
|
||||
["Components", "#section-components", "components"]
|
||||
],
|
||||
"Code": [
|
||||
["Source", "#section-source", "source"]
|
||||
]
|
||||
}
|
||||
"announcement" : {
|
||||
"title": "Important Announcement"
|
||||
}
|
||||
}
|
||||
|
|
|
@ -1,10 +1,10 @@
|
|||
{
|
||||
"globals": {
|
||||
"title": "spaCy.io",
|
||||
"title": "spaCy",
|
||||
"description": "spaCy is a free open-source library featuring state-of-the-art speed and accuracy and a powerful Python API.",
|
||||
|
||||
"SITENAME": "spaCy",
|
||||
"SLOGAN": "Industrial-strength Natural Language Processing",
|
||||
"SLOGAN": "Industrial-strength Natural Language Processing in Python",
|
||||
"SITE_URL": "https://spacy.io",
|
||||
"EMAIL": "contact@explosion.ai",
|
||||
|
||||
|
@ -12,6 +12,8 @@
|
|||
"COMPANY_URL": "https://explosion.ai",
|
||||
"DEMOS_URL": "https://demos.explosion.ai",
|
||||
|
||||
"SPACY_VERSION": "1.1",
|
||||
|
||||
"SOCIAL": {
|
||||
"twitter": "spacy_io",
|
||||
"github": "explosion",
|
||||
|
@ -21,9 +23,39 @@
|
|||
"SCRIPTS" : [ "main", "prism" ],
|
||||
"DEFAULT_SYNTAX" : "python",
|
||||
"ANALYTICS": "UA-58931649-1",
|
||||
"MAILCHIMP": {
|
||||
"user": "spacy.us12",
|
||||
"id": "83b0498b1e7fa3c91ce68c3f1",
|
||||
"list": "89ad33e698"
|
||||
},
|
||||
|
||||
"NAVIGATION": {
|
||||
"Home": "/",
|
||||
"Docs": "/docs",
|
||||
"Demos": "/docs/usage/showcase",
|
||||
"Blog": "https://explosion.ai/blog"
|
||||
},
|
||||
|
||||
"FOOTER": {
|
||||
"spaCy": {
|
||||
"Usage": "/docs/usage",
|
||||
"API Reference": "/docs/api",
|
||||
"Tutorials": "/docs/usage/tutorials",
|
||||
"Showcase": "/docs/usage/showcase"
|
||||
},
|
||||
"Support": {
|
||||
"Issue Tracker": "https://github.com/explosion/spaCy/issues",
|
||||
"StackOverflow": "http://stackoverflow.com/questions/tagged/spacy",
|
||||
"Reddit usergroup": "https://www.reddit.com/r/spacynlp/",
|
||||
"Gitter chat": "https://gitter.im/explosion/spaCy"
|
||||
},
|
||||
"Connect": {
|
||||
"Twitter": "https://twitter.com/spacy_io",
|
||||
"GitHub": "https://github.com/explosion/spaCy",
|
||||
"Blog": "https://explosion.ai/blog",
|
||||
"Contact": "mailto:contact@explosion.ai"
|
||||
}
|
||||
}
|
||||
|
||||
"SPACY_VERSION": "1.0",
|
||||
"SPACY_STARS": "2500",
|
||||
"GITHUB": { "user": "explosion", "repo": "spacy" }
|
||||
}
|
||||
}
|
||||
|
|
|
@ -1,17 +1,30 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > FOOTER
|
||||
//- ----------------------------------
|
||||
|
||||
include _mixins
|
||||
|
||||
footer.o-footer.o-inline-list.u-pattern.u-text-center.u-text-label.u-text-strong
|
||||
footer.o-footer.u-text.u-border-dotted
|
||||
+grid.o-content
|
||||
each group, label in FOOTER
|
||||
+grid-col("quarter")
|
||||
ul
|
||||
li.u-text-label.u-color-subtle=label
|
||||
|
||||
each url, item in group
|
||||
li
|
||||
+a(url)(target=url.includes("http") ? "_blank" : "")=item
|
||||
|
||||
if SECTION != "docs"
|
||||
+grid-col("quarter")
|
||||
include _newsletter
|
||||
|
||||
if SECTION == "docs"
|
||||
.o-content.o-block.u-border-dotted
|
||||
include _newsletter
|
||||
|
||||
.o-inline-list.u-text-center.u-text-tiny.u-color-subtle
|
||||
span © #{new Date().getFullYear()} #[+a(COMPANY_URL, true)=COMPANY]
|
||||
|
||||
+a(COMPANY_URL, true)
|
||||
+svg("graphics", "explosion", 45).o-icon.u-color-theme.u-grayscale
|
||||
|
||||
+a(COMPANY_URL + "/legal", true) Legal / Imprint
|
||||
a(href="mailto:#{EMAIL}") #[+icon("mail", 16)]
|
||||
|
||||
+a("https://twitter.com/" + SOCIAL.twitter)(aria-label="Twitter")
|
||||
+icon("twitter", 20)
|
||||
|
||||
+a("https://github.com/" + SOCIAL.github + "/spaCy")(aria-label="GitHub")
|
||||
+icon("github", 20)
|
||||
|
|
|
@ -1,6 +1,11 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > FUNCTIONS
|
||||
//- ----------------------------------
|
||||
|
||||
//- More descriptive variables for current.path and current.source
|
||||
|
||||
- CURRENT = current.source
|
||||
- SECTION = current.path[0]
|
||||
- SUBSECTION = current.path[1]
|
||||
|
||||
|
||||
//- Add prefixes to items of an array (for modifier CSS classes)
|
||||
|
||||
|
@ -9,3 +14,10 @@
|
|||
- return prefix + '--' + arg;
|
||||
- }).join(' ');
|
||||
- }
|
||||
|
||||
|
||||
//- Generate GitHub links
|
||||
|
||||
- function gh(repo, filepath, branch) {
|
||||
- return 'https://github.com/' + SOCIAL.github + '/' + repo + (filepath ? '/blob/' + (branch || 'master') + '/' + filepath : '' );
|
||||
- }
|
||||
|
|
|
@ -1,6 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > LOGO
|
||||
//- ----------------------------------
|
||||
|
||||
svg.o-logo(class=(logo_size) ? "o-logo--" + logo_size : "" viewBox="0 0 675 215" width="500")
|
||||
path(d="M83.6 83.3C68.3 81.5 67.2 61 47.5 62.8c-9.5 0-18.4 4-18.4 12.7 0 13.2 20.3 14.4 32.5 17.7 20.9 6.3 41 10.7 41 33.3 0 28.8-22.6 38.8-52.4 38.8-24.9 0-50.2-8.9-50.2-31.8 0-6.4 6.1-11.3 12-11.3 7.5 0 10.1 3.2 12.7 8.4 5.8 10.2 12.3 15.6 28.3 15.6 10.2 0 20.6-3.9 20.6-12.7 0-12.6-12.8-15.3-26.1-18.4-23.5-6.6-43.6-10-46-36.1C-1 34.5 91.7 32.9 97 71.9c.1 7.1-6.5 11.4-13.4 11.4zm110.2-39c32.5 0 51 27.2 51 60.8 0 33.7-17.9 60.8-51 60.8-18.4 0-29.8-7.8-38.1-19.8v44.5c0 13.4-4.3 19.8-14.1 19.8-11.9 0-14.1-7.6-14.1-19.8V61.3c0-10.6 4.4-17 14.1-17 9.1 0 14.1 7.2 14.1 17v3.6c9.2-11.6 19.7-20.6 38.1-20.6zm-7.7 98.4c19.1 0 27.6-17.6 27.6-38.1 0-20.1-8.6-38.1-27.6-38.1-19.8 0-29 16.3-29 38.1 0 21.2 9.2 38.1 29 38.1zM266.9 76c0-23.4 26.9-31.7 52.9-31.7 36.6 0 51.7 10.7 51.7 46v34c0 8.1 5 24.1 5 29 0 7.4-6.8 12-14.1 12-8.1 0-14.1-9.5-18.4-16.3-11.9 9.5-24.5 16.3-43.8 16.3-21.3 0-38.1-12.6-38.1-33.3 0-18.4 13.2-28.9 29-32.5 0 .1 51-12 51-12.1 0-15.7-5.5-22.6-22-22.6-14.5 0-21.9 4-27.5 12.7-4.5 6.6-4 10.6-12.7 10.6-6.9-.1-13-4.9-13-12.1zm43.6 70.2c22.3 0 31.8-11.8 31.8-35.3v-5c-6 2-30.3 8-36.8 9.1-7 1.4-14.1 6.6-14.1 14.9.1 9.1 9.4 16.3 19.1 16.3zM474.5 0c31.5 0 65.7 18.8 65.7 48.8 0 7.7-5.8 14.1-13.4 14.1-10.3 0-11.8-5.5-16.3-13.4-7.6-13.9-16.5-23.3-36.1-23.3-30.2-.2-43.7 25.6-43.7 57.8 0 32.4 11.2 55.8 42.4 55.8 20.7 0 32.2-12 38.1-27.6 2.4-7.1 6.7-14.1 15.6-14.1 7 0 14.1 7.2 14.1 14.8 0 31.8-32.4 53.8-65.8 53.8-36.5 0-57.2-15.4-68.5-41-5.5-12.2-9.1-24.9-9.1-42.4-.1-49.2 28.6-83.3 77-83.3zm180.3 44.3c8 0 12.7 5.2 12.7 13.4 0 3.3-2.6 9.9-3.6 13.4L625.1 173c-8.6 22.1-15.1 37.4-44.5 37.4-14 0-26.1-1.2-26.1-13.4 0-7 5.3-10.6 12.7-10.6 1.4 0 3.6.7 5 .7 2.1 0 3.6.7 5 .7 14.7 0 16.8-15.1 22-25.5l-37.4-92.6c-2.1-5-3.6-8.4-3.6-11.3 0-8.2 6.4-14.1 14.8-14.1 9.5 0 13.3 7.5 15.6 15.6l24.7 73.5L638 65.5c3.9-10.5 4.2-21.2 16.8-21.2z")
|
website/_includes/_mixins-base.jade
Normal file
|
@ -0,0 +1,102 @@
|
|||
//- 💫 MIXINS > BASE
|
||||
|
||||
//- Aside wrapper
|
||||
|
||||
mixin aside-wrapper(label)
|
||||
aside.c-aside
|
||||
.c-aside__content(role="complementary")&attributes(attributes)
|
||||
if label
|
||||
h4.u-text-label.u-text-label--dark=label
|
||||
|
||||
block
|
||||
|
||||
//- Date
|
||||
input - [string] date in the format YYYY-MM-DD
|
||||
|
||||
mixin date(input)
|
||||
- var date = new Date(input)
|
||||
- var months = [ 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December' ]
|
||||
|
||||
time(datetime=JSON.parse(JSON.stringify(date)))&attributes(attributes)=months[date.getMonth()] + ' ' + date.getDate() + ', ' + date.getFullYear()
|
||||
|
||||
|
||||
//- SVG from map
|
||||
|
||||
mixin svg(file, name, width, height)
|
||||
svg(aria-hidden="true" viewBox="0 0 #{width} #{height || width}" width=width height=(height || width))&attributes(attributes)
|
||||
use(xlink:href="/assets/img/#{file}.svg##{name}")
|
||||
|
||||
|
||||
//- Icon
|
||||
|
||||
mixin icon(name, size)
|
||||
+svg("icons", "icon-" + name, size || 20).o-icon&attributes(attributes)
|
||||
|
||||
|
||||
//- Pro/Con/Neutral icon
|
||||
|
||||
mixin procon(icon)
|
||||
- colors = { pro: "green", con: "red" }
|
||||
+icon(icon)(class="u-color-#{colors[icon] || 'subtle'}" aria-label=icon)&attributes(attributes)
|
||||
|
||||
|
||||
//- Headlines Helper Mixin
|
||||
|
||||
mixin headline(level)
|
||||
if level == 1
|
||||
h1.u-heading-1&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 2
|
||||
h2.u-heading-2&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 3
|
||||
h3.u-heading-3&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 4
|
||||
h4.u-heading-4&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 5
|
||||
h5.u-heading-5&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Permalink rendering
|
||||
|
||||
mixin permalink(id)
|
||||
if id
|
||||
a.u-permalink(id=id href="##{id}")
|
||||
+icon("anchor").u-permalink__icon
|
||||
block
|
||||
|
||||
else
|
||||
block
|
||||
|
||||
|
||||
//- Terminal-style code window
|
||||
|
||||
mixin terminal(label)
|
||||
.x-terminal
|
||||
.x-terminal__icons: span
|
||||
.u-padding-small.u-text-label.u-text-center=label
|
||||
|
||||
+code.x-terminal__code
|
||||
block
|
||||
|
||||
|
||||
//- Logo
|
||||
|
||||
mixin logo()
|
||||
+svg("graphics", "spacy", 500).o-logo&attributes(attributes)
|
||||
|
||||
|
||||
//- Landing
|
||||
|
||||
mixin landing-header()
|
||||
header.c-landing
|
||||
.c-landing__wrapper
|
||||
.c-landing__content
|
||||
block
|
|
@ -1,9 +1,255 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > MIXINS
|
||||
//- ----------------------------------
|
||||
|
||||
include _functions
|
||||
include _mixins-base
|
||||
|
||||
include _mixins/_base
|
||||
include _mixins/_components
|
||||
include _mixins/_headlines
|
||||
|
||||
//- Headlines
|
||||
level - [integer] headline level, corresponds to h1, h2, h3 etc.
|
||||
id - [string] unique identifier, creates permalink (optional)
|
||||
|
||||
mixin h(level, id)
|
||||
+headline(level).u-heading&attributes(attributes)
|
||||
+permalink(id)
|
||||
block
|
||||
|
||||
|
||||
//- External links
|
||||
url - [string] link href
|
||||
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
|
||||
info: https://mathiasbynens.github.io/rel-noopener/
|
||||
|
||||
mixin a(url, trusted)
|
||||
a(href=url target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
//- Source link (with added icon for "code")
|
||||
url - [string] link href, can also be gh() function to generate GitHub link
|
||||
see _functions.jade for more info
|
||||
|
||||
mixin src(url)
|
||||
+a(url)
|
||||
block
|
||||
|
||||
| #[+icon("code", 16).u-color-subtle]
|
||||
|
||||
|
||||
//- API link (with added tag and automatically generated path)
|
||||
path - [string] path to API docs page relative to /docs/api/
|
||||
|
||||
mixin api(path)
|
||||
+a("/docs/api/" + path, true)(target="_self").u-no-border.u-inline-block
|
||||
block
|
||||
|
||||
| #[+icon("book", 18).o-help-icon.u-color-subtle]
|
||||
|
||||
|
||||
//- Aside for text
|
||||
label - [string] aside title (optional)
|
||||
|
||||
mixin aside(label)
|
||||
+aside-wrapper(label)
|
||||
.c-aside__text.u-text-small
|
||||
block
|
||||
|
||||
|
||||
//- Aside for code
|
||||
label - [string] aside title (optional or false for no label)
|
||||
language - [string] language for syntax highlighting (default: "python")
|
||||
supports basic relevant languages available for PrismJS
|
||||
|
||||
mixin aside-code(label, language)
|
||||
+aside-wrapper(label)
|
||||
+code(false, language).o-no-block
|
||||
block
|
||||
|
||||
|
||||
//- Link button
|
||||
url - [string] link href
|
||||
trusted - [boolean] if not set / false, rel="noopener nofollow" is added
|
||||
info: https://mathiasbynens.github.io/rel-noopener/
|
||||
...style - all other arguments are added as class names c-button--argument
|
||||
see assets/css/_components/_buttons.sass
|
||||
|
||||
mixin button(url, trusted, ...style)
|
||||
a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Code block
|
||||
label - [string] aside title (optional or false for no label)
|
||||
language - [string] language for syntax highlighting (default: "python")
|
||||
supports basic relevant languages available for PrismJS
|
||||
|
||||
mixin code(label, language)
|
||||
pre.c-code-block.o-block(class="lang-#{(language || DEFAULT_SYNTAX)}")&attributes(attributes)
|
||||
if label
|
||||
h4.u-text-label.u-text-label--dark=label
|
||||
|
||||
code.c-code-block__content
|
||||
block
|
||||
|
||||
|
||||
//- Images / figures
|
||||
url - [string] url or path to image
|
||||
width - [integer] image width in px, for better rendering (default: 500)
|
||||
caption - [string] image caption
|
||||
alt - [string] alternative image text, defaults to caption
|
||||
|
||||
mixin image(url, width, caption, alt)
|
||||
figure.o-block&attributes(attributes)
|
||||
img(src=url alt=(alt || caption) width="#{width || 500}")
|
||||
|
||||
if caption
|
||||
+image-caption=caption
|
||||
|
||||
else
|
||||
block
|
||||
|
||||
//- Image caption
|
||||
|
||||
mixin image-caption()
|
||||
figcaption.u-text-small.u-color-subtle.u-padding-small&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Label
|
||||
|
||||
mixin label()
|
||||
.u-text-label.u-color-subtle&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Tag
|
||||
|
||||
mixin tag()
|
||||
span.u-text-tag.u-text-tag--spaced(aria-hidden="true")
|
||||
block
|
||||
|
||||
|
||||
//- List
|
||||
type - [string] "numbers", "letters", "roman" (bulleted list if none set)
|
||||
start - [integer] start number
|
||||
|
||||
mixin list(type, start)
|
||||
if type
|
||||
ol.c-list.o-block.u-text(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
else
|
||||
ul.c-list.c-list--bullets.o-block.u-text&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- List item (only used within +list)
|
||||
|
||||
mixin item(procon)
|
||||
if procon
|
||||
li&attributes(attributes)
|
||||
+procon(procon).c-list__icon
|
||||
block
|
||||
|
||||
else
|
||||
li.c-list__item&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Table
|
||||
head - [array] table headings (should match number of columns)
|
||||
|
||||
mixin table(head)
|
||||
table.c-table.o-block&attributes(attributes)
|
||||
|
||||
if head
|
||||
+row
|
||||
each column in head
|
||||
th.c-table__head-cell.u-text-label=column
|
||||
|
||||
block
|
||||
|
||||
|
||||
//- Table row (only used within +table)
|
||||
|
||||
mixin row()
|
||||
tr.c-table__row&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Footer table row (only used within +table)
|
||||
|
||||
mixin footrow()
|
||||
tr.c-table__row.c-table__row--foot&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Table cell (only used within +row in +table)
|
||||
|
||||
mixin cell()
|
||||
td.c-table__cell.u-text&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Grid Container
|
||||
|
||||
mixin grid()
|
||||
.o-grid.o-block&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Grid Column (only used within +grid)
|
||||
width - [string] "quarter", "third", "half", "two-thirds", "three-quarters"
|
||||
see $grid in assets/css/_variables.sass
|
||||
|
||||
mixin grid-col(width)
|
||||
.o-grid__col(class="o-grid__col--#{width}")&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Card (only used within +grid)
|
||||
title - [string] card title
|
||||
details - [object] url, image, author, description, tags etc.
|
||||
(see /docs/usage/_data.json)
|
||||
|
||||
mixin card(title, details)
|
||||
+grid-col("half").u-border.u-padding-medium.u-text&attributes(attributes)
|
||||
if details.image
|
||||
+a(details.url).o-block-small
|
||||
img(src=details.image alt=title width="300" role="presentation")
|
||||
|
||||
if title
|
||||
+a(details.url)
|
||||
+h(3)=title
|
||||
|
||||
if details.author
|
||||
.u-text-small.u-color-subtle by #{details.author}
|
||||
|
||||
if details.description || details.tags
|
||||
ul
|
||||
if details.description
|
||||
li=details.description
|
||||
|
||||
if details.tags
|
||||
li
|
||||
each tag in details.tags
|
||||
span.u-text-tag #{tag}
|
||||
|
|
||||
|
||||
block
|
||||
|
||||
|
||||
//- Simpler card list item (only used within +list)
|
||||
title - [string] card title
|
||||
details - [object] url, image, author, description, tags etc.
|
||||
(see /docs/usage/_data.json)
|
||||
|
||||
mixin card-item(title, details)
|
||||
+item&attributes(attributes)
|
||||
+a(details.url)=title
|
||||
|
||||
if details.description
|
||||
br
|
||||
span=details.description
|
||||
|
||||
if details.author
|
||||
br
|
||||
span.u-text-small.u-color-subtle by #{details.author}
|
||||
|
|
|
@ -1,42 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 MIXINS > BASE
|
||||
//- ----------------------------------
|
||||
|
||||
//- External Link
|
||||
|
||||
mixin a(url, trusted)
|
||||
a(href=url target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Sections for content pages
|
||||
id - [string] id, can be headline id as it's being prefixed (optional)
|
||||
block - section content (block and inline elements)
|
||||
|
||||
mixin section(id)
|
||||
section.o-section(id=(id) ? 'section-' + id : '')&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Date
|
||||
input - [string] date in the format YYYY-MM-DD
|
||||
|
||||
mixin date(input)
|
||||
- var date = new Date(input)
|
||||
- var months = [ 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December' ]
|
||||
|
||||
time(datetime=JSON.parse(JSON.stringify(date)))&attributes(attributes)=months[date.getMonth()] + ' ' + date.getDate() + ', ' + date.getFullYear()
|
||||
|
||||
|
||||
//- Grid Container
|
||||
|
||||
mixin grid(...style)
|
||||
.o-grid.o-block(class=prefixArgs(style, "o-grid"))&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Grid Column
|
||||
|
||||
mixin grid-col(...style)
|
||||
.o-grid__col(class=prefixArgs(style, "o-grid__col"))&attributes(attributes)
|
||||
block
|
|
@ -1,112 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 MIXINS > COMPONENTS
|
||||
//- ----------------------------------
|
||||
|
||||
//- Aside
|
||||
|
||||
mixin aside(label)
|
||||
span.c-aside.u-text-small(role="complementary")&attributes(attributes)
|
||||
span.c-aside__label.u-text-label.u-text-strong.u-color-theme=label
|
||||
block
|
||||
|
||||
|
||||
//- Button
|
||||
|
||||
mixin button(url, trusted, ...style)
|
||||
a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target="_blank" rel=(!trusted) ? "noopener nofollow" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Code
|
||||
|
||||
mixin code(language, label, small)
|
||||
pre.c-code-block(class="lang-#{(language || DEFAULT_SYNTAX)} #{small ? '' : 'o-block'}")&attributes(attributes)
|
||||
if label
|
||||
span.c-code-block__label.u-text-label.u-text-strong=label
|
||||
|
||||
code.c-code-block__content(class=small ? "u-code-small" : "u-code-regular")
|
||||
block
|
||||
|
||||
|
||||
//- Icon
|
||||
|
||||
mixin icon(name, size)
|
||||
- var size = size || 20
|
||||
|
||||
svg.o-icon(aria-hidden="true" viewBox="0 0 #{size} #{size}" width=size height=size)&attributes(attributes)
|
||||
use(xlink:href="/assets/img/icons.svg#icon-#{name}")
|
||||
|
||||
|
||||
//- Image for illustration purposes
|
||||
file - [string] file name (in /assets/img)
|
||||
alt - [string] descriptive alt text (optional)
|
||||
caption - [string] image caption (optional)
|
||||
|
||||
mixin image(file, alt, caption)
|
||||
figure.o-block&attributes(attributes)
|
||||
img(src="/assets/img/#{file}" alt=(alt || caption) width="800")
|
||||
|
||||
if caption
|
||||
figcaption.u-text-small=caption
|
||||
|
||||
block
|
||||
|
||||
|
||||
//- Label
|
||||
|
||||
mixin label()
|
||||
.u-text-label.u-text-strong.u-color-theme&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- List
|
||||
|
||||
mixin list(type, start)
|
||||
if type
|
||||
ol.c-list.o-block(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : "")&attributes(attributes)
|
||||
block
|
||||
|
||||
else
|
||||
ul.c-list.c-list--bullets.o-block&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- List item
|
||||
|
||||
mixin item()
|
||||
li.c-list__item.u-text-regular&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Table
|
||||
|
||||
mixin table(head)
|
||||
table.c-table.o-block.has-aside&attributes(attributes)
|
||||
|
||||
if head
|
||||
+row
|
||||
each column in head
|
||||
th.c-table__head-cell.u-text-label.u-text-strong=column
|
||||
|
||||
block
|
||||
|
||||
|
||||
//- Table row
|
||||
|
||||
mixin row(...style)
|
||||
tr.c-table__row(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Table cell
|
||||
|
||||
mixin cell(...style)
|
||||
td.c-table__cell.u-text-regular.has-aside(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Tag
|
||||
|
||||
mixin tag()
|
||||
span.u-text-tag.u-text-label.u-color-theme.u-text-strong.u-padding-small
|
||||
block
|
|
@ -1,49 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 MIXINS > HEADLINES
|
||||
//- ----------------------------------
|
||||
|
||||
//- Headlines Helper Mixin
|
||||
|
||||
mixin headline(level)
|
||||
if level == 1
|
||||
h1.u-heading-1&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 2
|
||||
h2.u-heading-2&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 3
|
||||
h3.u-heading-3&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 4
|
||||
h4.u-heading-4&attributes(attributes)
|
||||
block
|
||||
|
||||
else if level == 5
|
||||
h5.u-heading-5&attributes(attributes)
|
||||
block
|
||||
|
||||
|
||||
//- Permalink rendering
|
||||
|
||||
mixin permalink(id)
|
||||
if id
|
||||
a.u-permalink(id=id href="##{id}")
|
||||
+icon("link").u-permalink__icon
|
||||
block
|
||||
|
||||
else
|
||||
block
|
||||
|
||||
|
||||
//- Headlines
|
||||
|
||||
mixin h(level, id, source)
|
||||
+headline(level)&attributes(attributes)
|
||||
+permalink(id)
|
||||
block
|
||||
|
||||
if source
|
||||
+button(source, false, "secondary").u-text-small.u-float-right Source
|
|
@ -1,26 +1,17 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > TOP NAVIGATION
|
||||
//- ----------------------------------
|
||||
|
||||
include _mixins
|
||||
|
||||
nav.c-nav.u-text-label.js-nav
|
||||
nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : "")
|
||||
a(href='/') #[+logo]
|
||||
|
||||
a(href='/')
|
||||
!=partial("_includes/_logo", { logo_size: 'small' })
|
||||
if SUBSECTION != "index"
|
||||
.u-text-label.u-padding-small=SUBSECTION
|
||||
|
||||
ul.c-nav__menu
|
||||
li.c-nav__menu__item(class=(current.path[0] == 'index') ? "is-active" : "")
|
||||
a(href='/') Home
|
||||
|
||||
li.c-nav__menu__item(class=(current.path[0] == 'docs') ? "is-active" : "")
|
||||
a(href="/docs") Docs
|
||||
each url, item in NAVIGATION
|
||||
li.c-nav__menu__item
|
||||
a(href=url target=url.includes("http") ? "_blank" : "")=item
|
||||
|
||||
li.c-nav__menu__item
|
||||
a(href="https://demos.explosion.ai" target="_blank") Demos
|
||||
|
||||
li.c-nav__menu__item
|
||||
a(href="https://explosion.ai/blog" target="_blank") Blog
|
||||
|
||||
li.c-nav__menu__item
|
||||
a(href="https://github.com/" + SOCIAL.github + "/spaCy" target="_blank") #[+icon("github", 18)] #[span.u-hidden-sm GitHub]
|
||||
+a(gh("spaCy"))(aria-label="GitHub").u-hidden-xs #[+icon("github", 20)]
|
||||
|
|
|
@ -1,20 +1,16 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > NEWSLETTER SIGNUP
|
||||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > NEWSLETTER
|
||||
|
||||
include _mixins
|
||||
ul.o-block
|
||||
li.u-text-label.u-color-subtle Stay in the loop!
|
||||
li Receive updates about new releases, tutorials and more.
|
||||
|
||||
.o-block.u-text-center.u-padding.u-border-top
|
||||
form.o-grid#mc-embedded-subscribe-form(action="//#{MAILCHIMP.user}.list-manage.com/subscribe/post?u=#{MAILCHIMP.id}&id=#{MAILCHIMP.list}" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)
|
||||
|
||||
+label Sign up for the spaCy newsletter
|
||||
h3.u-heading-1 Stay in the loop!
|
||||
p.u-text-large Receive updates about new releases, tutorials and more.
|
||||
|
||||
form#mc-embedded-subscribe-form.o-inline-list(action="https://spacy.us12.list-manage.com/subscribe/post?u=83b0498b1e7fa3c91ce68c3f1&id=89ad33e698" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)
|
||||
input#mce-EMAIL.u-border.u-padding-small.u-text-regular(type="email" name="EMAIL" placeholder="Your email address")
|
||||
|
||||
//- Spam bot protection
|
||||
//- MailChimp spam protection
|
||||
div(style="position: absolute; left: -5000px;" aria-hidden="true")
|
||||
input(type="text" name="b_83b0498b1e7fa3c91ce68c3f1_89ad33e698" tabindex="-1" value="")
|
||||
input(type="text" name="b_#{MAILCHIMP.id}_#{MAILCHIMP.list}" tabindex="-1" value="")
|
||||
|
||||
button#mc-embedded-subscribe.c-button.c-button--primary.u-text-label(type="submit" name="subscribe") Sign up
|
||||
.o-grid-col.u-border.u-padding-small
|
||||
input#mce-EMAIL.u-text(type="email" name="EMAIL" placeholder="Your email")
|
||||
|
||||
button#mc-embedded-subscribe.u-text-label.u-color-theme(type="submit" name="subscribe") Sign up
|
||||
|
|
27
website/_includes/_page-docs.jade
Normal file
|
@ -0,0 +1,27 @@
|
|||
//- 💫 INCLUDES > DOCS PAGE TEMPLATE
|
||||
|
||||
- sidebar_content = (SUBSECTION != "index") ? public.docs[SUBSECTION]._data.sidebar : public.docs._data.sidebar || FOOTER
|
||||
|
||||
include _sidebar
|
||||
|
||||
main.o-main.o-main--sidebar.o-main--aside
|
||||
article.o-content
|
||||
+h(1)=title
|
||||
if tag
|
||||
+tag=tag
|
||||
|
||||
!=yield
|
||||
|
||||
+grid.o-content.u-text
|
||||
+grid-col("half")
|
||||
if next && public.docs[SUBSECTION]._data[next]
|
||||
- data = public.docs[SUBSECTION]._data[next]
|
||||
|
||||
.o-inline-list
|
||||
span #[strong.u-text-label Read next:] #[a(href=next).u-link=data.title]
|
||||
|
||||
+grid-col("half").u-text-right
|
||||
.o-inline-list
|
||||
+button(gh("spacy", "website/" + current.path.join('/') + ".jade"), false, "secondary").u-text-tag Suggest edits #[+icon("code", 14)]
|
||||
|
||||
include _footer
|
|
@ -1,14 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > SCRIPTS
|
||||
//- ----------------------------------
|
||||
|
||||
each script in SCRIPTS
|
||||
script(src="/assets/js/" + script + ".js", type="text/javascript")
|
||||
|
||||
if landing
|
||||
script(async src="https://platform.twitter.com/widgets.js" charset="utf-8")
|
||||
|
||||
if environment == "deploy"
|
||||
script window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');
|
||||
|
||||
script(async src="https://www.google-analytics.com/analytics.js")
|
|
@ -1,13 +1,13 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 INCLUDES > SIDEBAR
|
||||
//- ----------------------------------
|
||||
|
||||
include _mixins
|
||||
|
||||
nav.c-sidebar.js-sidebar
|
||||
.c-sidebar__body.u-text-regular
|
||||
each items, menu in sidebar
|
||||
ul.o-block-small
|
||||
menu.c-sidebar.js-sidebar.u-text
|
||||
if sidebar_content
|
||||
each items, menu in sidebar_content
|
||||
ul.c-sidebar__section.o-block
|
||||
li.u-text-label.u-color-subtle=menu
|
||||
each item in items
|
||||
li: a(href=item[1] data-section=(item[2]) ? "section-" + item[2] : "")=item[0]
|
||||
|
||||
each url, item in items
|
||||
li(class=(CURRENT == url || (CURRENT == "index" && url == "./")) ? "is-active" : "")
|
||||
+a(url)(target=url.includes("http") ? "_blank" : "")=item
|
||||
|
|
|
@ -1,13 +1,19 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 GLOBAL LAYOUT
|
||||
//- ----------------------------------
|
||||
|
||||
include _includes/_mixins
|
||||
|
||||
doctype html
|
||||
|
||||
html(lang="en")
|
||||
title=(current.path[0] == "index") ? SITENAME + " | " + SLOGAN : title + " | " + SITENAME
|
||||
title
|
||||
if SECTION == "docs" && SUBSECTION && SUBSECTION != "index"
|
||||
| #{title} | #{SITENAME} #{SUBSECTION == "api" ? "API" : "Usage"} Documentation
|
||||
|
||||
else if SECTION != "index"
|
||||
| #{title} | #{SITENAME}
|
||||
|
||||
else
|
||||
| #{SITENAME} - #{SLOGAN}
|
||||
|
||||
meta(charset="utf-8")
|
||||
meta(name="viewport" content="width=device-width, initial-scale=1.0")
|
||||
|
@ -19,41 +25,40 @@ html(lang="en")
|
|||
meta(property="og:url" content="#{SITE_URL}/#{current.path.join('/')}")
|
||||
meta(property="og:title" content=title)
|
||||
meta(property="og:description" content=description)
|
||||
meta(property="og:image" content="/assets/img/social.png")
|
||||
meta(property="og:image" content="#{SITE_URL}/assets/img/social#{(SECTION == 'docs') ? '_docs' : ''}.jpg")
|
||||
|
||||
meta(name="twitter:card" content="summary_large_image")
|
||||
meta(name="twitter:site" content="@" + SOCIAL.twitter)
|
||||
meta(name="twitter:title" content=title)
|
||||
meta(name="twitter:description" content=description)
|
||||
meta(name="twitter:image" content="/assets/img/social.jpg")
|
||||
meta(name="twitter:image" content="#{SITE_URL}/assets/img/social#{(SECTION == 'docs') ? '_docs' : ''}.jpg")
|
||||
|
||||
link(rel="shortcut icon" href="/assets/img/favicon.ico")
|
||||
link(rel="icon" type="image/x-icon" href="/assets/img/favicon.ico")
|
||||
link(href="/assets/css/style.css" rel="stylesheet")
|
||||
|
||||
if SUBSECTION == "usage"
|
||||
link(href="/assets/css/style_red.css?v1" rel="stylesheet")
|
||||
|
||||
else
|
||||
link(href="/assets/css/style.css?v1" rel="stylesheet")
|
||||
|
||||
body
|
||||
include _includes/_navigation
|
||||
|
||||
if !landing
|
||||
header.o-header.u-pattern.u-text-center
|
||||
if current.path[1] == "tutorials"
|
||||
h2.u-heading-1.u-text-shadow Tutorials
|
||||
if SECTION == "docs"
|
||||
include _includes/_page-docs
|
||||
|
||||
else
|
||||
+h(1).u-text-shadow=title
|
||||
|
||||
if sidebar
|
||||
include _includes/_sidebar
|
||||
|
||||
main.o-content(class="#{(sidebar) ? 'o-content--sidebar' : '' } #{((current.path[0] == 'docs' && asides != false) || asides) ? 'o-content--asides' : '' } #{(current.path[1] == 'tutorials') ? 'o-content--article' : '' }")
|
||||
if current.path[1] == "tutorials"
|
||||
+h(1)=title
|
||||
|
||||
!=yield
|
||||
|
||||
else
|
||||
!=yield
|
||||
|
||||
main!=yield
|
||||
include _includes/_footer
|
||||
|
||||
include _includes/_scripts
|
||||
each script in SCRIPTS
|
||||
script(src="/assets/js/" + script + ".js?v1", type="text/javascript")
|
||||
|
||||
if environment == "deploy"
|
||||
script
|
||||
| window.ga=window.ga||function(){
|
||||
| (ga.q=ga.q||[]).push(arguments)}; ga.l=+new Date;
|
||||
| ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');
|
||||
|
||||
script(async src="https://www.google-analytics.com/analytics.js")
|
||||
|
|
14
website/announcement.jade
Normal file
|
@ -0,0 +1,14 @@
|
|||
//- 💫 SPACY ANNOUNCEMENT FROM 2016-08-09 (needs to stay for reference)
|
||||
|
||||
include _includes/_mixins
|
||||
|
||||
.o-content.u-padding
|
||||
+h(1)
|
||||
+label #[+date("2016-08-09")]
|
||||
| Dear spaCy users,
|
||||
|
||||
p Unfortunately, we (Henning Peters and Matthew Honnibal) are parting ways. Breaking up is never easy, and it's taken us a while to get our stuff together. Hopefully, you didn't notice anything was up — if you did, we hope you haven't been inconvenienced.
|
||||
|
||||
p Here's how this is going to work: Matt will continue to develop and maintain spaCy and all related projects under his name. Nothing will change for you. Henning will take over our legal structure and start a new business under a new name.
|
||||
|
||||
p Sincerely,#[br] Henning Peters and Matthew Honnibal
|
|
@ -1,6 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > ANIMATIONS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > ANIMATIONS
|
||||
|
||||
//- Fade in
|
||||
|
||||
|
|
|
@ -1,6 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > FONTS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > FONTS
|
||||
|
||||
// Source Sans Pro
|
||||
|
||||
|
|
|
@ -1,6 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > GRID
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > GRID
|
||||
|
||||
//- Grid container
|
||||
|
||||
|
|
|
@ -1,21 +1,14 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > LAYOUT
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > LAYOUT
|
||||
|
||||
//- HTML
|
||||
|
||||
html
|
||||
@include breakpoint(min, lg)
|
||||
font-size: $type-base
|
||||
|
||||
@include breakpoint(max, md)
|
||||
font-size: $type-base * 0.8
|
||||
|
||||
//- Body
|
||||
|
||||
body
|
||||
display: flex
|
||||
flex-flow: row wrap
|
||||
animation: fadeIn 0.25s ease
|
||||
background: $color-back
|
||||
color: $color-front
|
||||
|
@ -24,15 +17,12 @@ body
|
|||
//- Paragraphs
|
||||
|
||||
p
|
||||
@extend .o-block, .u-text-regular, .has-aside
|
||||
|
||||
.o-content--article &:not([class])
|
||||
@extend .u-text-medium
|
||||
@extend .o-block, .u-text
|
||||
|
||||
|
||||
//- Links
|
||||
|
||||
main p a, main table a, main li a, .c-aside a
|
||||
main p a, main table a, main > *:not(footer) li a, .c-aside a
|
||||
@extend .u-link
|
||||
|
||||
|
||||
|
@ -41,4 +31,3 @@ main p a, main table a, main li a, .c-aside a
|
|||
::selection
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
text-shadow: none
|
||||
|
|
|
@ -1,67 +1,80 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > OBJECTS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > OBJECTS
|
||||
|
||||
//- Containers
|
||||
//- Main container
|
||||
|
||||
.o-content
|
||||
flex: 1 1 auto
|
||||
padding: $nav-height 4rem 8rem
|
||||
width: $content-width - $aside-width
|
||||
.o-main
|
||||
padding: $nav-height 0 0 0
|
||||
max-width: 100%
|
||||
min-height: 100vh
|
||||
|
||||
@include breakpoint(min, md)
|
||||
&.o-content--asides
|
||||
padding-left: 5rem
|
||||
padding-right: $aside-width + $aside-padding * 2
|
||||
&.o-main--sidebar
|
||||
margin-left: $sidebar-width
|
||||
|
||||
//- Header
|
||||
&.o-main--aside
|
||||
margin-right: $aside-width
|
||||
position: relative
|
||||
|
||||
.o-header
|
||||
display: flex
|
||||
justify-content: center
|
||||
flex-flow: column nowrap
|
||||
padding: 3rem 5rem
|
||||
margin-top: $nav-height
|
||||
width: 100%
|
||||
min-height: 250px
|
||||
&:after
|
||||
@include position(absolute, top, left, 0, 100%)
|
||||
@include size($aside-width, 100%)
|
||||
content: ""
|
||||
display: block
|
||||
background: $pattern
|
||||
z-index: -1
|
||||
min-height: 100vh
|
||||
|
||||
|
||||
//- Content container
|
||||
|
||||
.o-content
|
||||
padding: 3rem 7.5rem
|
||||
margin: 0 auto
|
||||
width: $content-width
|
||||
max-width: 100%
|
||||
|
||||
@include breakpoint(max, sm)
|
||||
padding: 3rem
|
||||
|
||||
|
||||
//- Footer
|
||||
|
||||
.o-footer
|
||||
position: relative
|
||||
padding: 5rem 0
|
||||
padding: 2.5rem 0
|
||||
overflow: auto
|
||||
width: 100%
|
||||
z-index: 200
|
||||
|
||||
|
||||
//- Blocks
|
||||
|
||||
.o-block
|
||||
margin-bottom: 5rem
|
||||
margin-bottom: 3rem
|
||||
|
||||
.o-block-small
|
||||
margin-bottom: 2rem
|
||||
|
||||
.o-section
|
||||
margin-bottom: 12.5rem
|
||||
.o-no-block
|
||||
margin-bottom: 0
|
||||
|
||||
.o-responsive
|
||||
overflow: auto
|
||||
width: 100%
|
||||
max-width: 100%
|
||||
.o-card
|
||||
background: $color-back
|
||||
border-radius: 2px
|
||||
|
||||
|
||||
//- Icons
|
||||
|
||||
.o-icon
|
||||
vertical-align: middle
|
||||
|
||||
.o-help-icon
|
||||
cursor: help
|
||||
margin: 0 0.5rem 0 0.25rem
|
||||
|
||||
|
||||
//- Inline List
|
||||
|
||||
.o-inline-list > *
|
||||
display: inline
|
||||
margin-bottom: 3rem
|
||||
|
||||
&:not(:last-child)
|
||||
margin-right: 3rem
|
||||
|
@ -70,9 +83,7 @@
|
|||
//- Logo
|
||||
|
||||
.o-logo
|
||||
@include size(100%, auto)
|
||||
@include size($logo-width, auto)
|
||||
fill: currentColor
|
||||
|
||||
@each $name, $size in $logo-sizes
|
||||
&.o-logo--#{$name}
|
||||
width: $size
|
||||
vertical-align: middle
|
||||
margin: 0 0.5rem
|
||||
|
|
|
@ -1,9 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💥 BASE > RESET
|
||||
//- ----------------------------------
|
||||
|
||||
//- adapted from "normalize.css" by Nicolas Gallagher & Jonathan Neal
|
||||
//- https://github.com/necolas/normalize.css
|
||||
//- 💫 CSS > BASE > RESET
|
||||
|
||||
*
|
||||
box-sizing: border-box
|
||||
|
@ -11,12 +6,14 @@
|
|||
margin: 0
|
||||
border: 0
|
||||
outline: 0
|
||||
-webkit-font-smoothing: antialiased
|
||||
|
||||
html
|
||||
font-family: sans-serif
|
||||
text-rendering: optimizeSpeed
|
||||
-ms-text-size-adjust: 100%
|
||||
-webkit-text-size-adjust: 100%
|
||||
-webkit-font-smoothing: antialiased
|
||||
-moz-osx-font-smoothing: grayscale
|
||||
|
||||
body
|
||||
margin: 0
|
||||
|
@ -64,6 +61,7 @@ img
|
|||
max-width: 100%
|
||||
|
||||
svg
|
||||
max-width: 100%
|
||||
color-interpolation-filters: sRGB
|
||||
fill: currentColor
|
||||
|
||||
|
@ -88,17 +86,15 @@ table
|
|||
max-width: 100%
|
||||
border-collapse: collapse
|
||||
|
||||
td,
|
||||
th
|
||||
td, th
|
||||
vertical-align: top
|
||||
|
||||
ul,
|
||||
ol
|
||||
ul, ol
|
||||
list-style: none
|
||||
|
||||
input,
|
||||
button
|
||||
input, button
|
||||
appearance: none
|
||||
|
||||
button
|
||||
background: transparent
|
||||
cursor: pointer
|
||||
|
|
|
@ -1,80 +1,84 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BASE > UTILITIES
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > BASE > UTILITIES
|
||||
|
||||
//- Text
|
||||
|
||||
%text
|
||||
font-family: $font-primary
|
||||
line-height: 1.5
|
||||
|
||||
.u-text-regular
|
||||
@extend %text
|
||||
font-size: 1.6rem
|
||||
|
||||
.u-text-medium
|
||||
@extend %text
|
||||
font-size: 2rem
|
||||
.u-text
|
||||
font: 1.5rem/#{1.55} $font-primary
|
||||
|
||||
.u-text-small
|
||||
@extend %text
|
||||
font-size: 1.2rem
|
||||
font: 1.4rem/#{1.375} $font-primary
|
||||
|
||||
.u-text-large
|
||||
@extend %text
|
||||
font-size: 2.8rem
|
||||
.u-text-tiny
|
||||
font: 1.1rem/#{1.375} $font-primary
|
||||
|
||||
|
||||
//- Labels & Tags
|
||||
|
||||
.u-text-label
|
||||
@extend %text
|
||||
font-size: 1.4rem
|
||||
font-weight: normal
|
||||
font: normal 600 1.4rem/#{1.5} $font-code
|
||||
text-transform: uppercase
|
||||
|
||||
.u-text-strong
|
||||
font-weight: bold
|
||||
&.u-text-label--dark
|
||||
display: inline-block
|
||||
background: $color-dark
|
||||
box-shadow: inset 1px 1px 1px rgba($color-front, 0.25)
|
||||
color: $color-back
|
||||
padding: 0 0.75rem
|
||||
margin: 1.5rem 0 0 2rem
|
||||
border-radius: 2px
|
||||
|
||||
.u-code-regular
|
||||
font: normal normal 1.3rem/#{2} $font-code
|
||||
.u-text-tag
|
||||
display: inline-block
|
||||
font: 600 1.1rem/#{1} $font-code
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
padding: 0.15em 0.25em
|
||||
border-radius: 2px
|
||||
text-transform: uppercase
|
||||
vertical-align: middle
|
||||
|
||||
.u-code-small
|
||||
font: normal normal 0.85em $font-code
|
||||
line-height: inherit
|
||||
|
||||
.u-link
|
||||
color: $color-theme
|
||||
border-bottom: 1px solid
|
||||
&.u-text-tag--spaced
|
||||
margin-left: 0.75em
|
||||
|
||||
|
||||
//- Headings
|
||||
|
||||
.u-heading
|
||||
margin-bottom: 2rem
|
||||
|
||||
@include breakpoint(max, md)
|
||||
word-wrap: break-word
|
||||
|
||||
&:not(:first-child)
|
||||
padding-top: 3.5rem
|
||||
|
||||
.u-heading-0
|
||||
font: normal bold 7rem/#{1} $font-primary
|
||||
|
||||
@each $level, $size in (1: 5.5, 2: 3, 3: 2.6, 4: 2, 5: 1.8)
|
||||
@each $level, $size in $headings
|
||||
.u-heading-#{$level}
|
||||
font: normal bold #{$size}rem/#{1.25} $font-primary
|
||||
margin-bottom: 2rem
|
||||
|
||||
.u-heading-label
|
||||
@extend .u-text-label
|
||||
margin-bottom: 1rem
|
||||
|
||||
|
||||
//- Permalinks
|
||||
//- Links
|
||||
|
||||
.u-link
|
||||
color: $color-theme
|
||||
border-bottom: 1px solid
|
||||
|
||||
.u-permalink
|
||||
position: relative
|
||||
|
||||
&:target
|
||||
display: inline-block
|
||||
padding-top: $nav-height * 1.5
|
||||
padding-top: $nav-height * 1.25
|
||||
|
||||
& + *
|
||||
margin-top: $nav-height * 1.5
|
||||
margin-top: $nav-height * 1.25
|
||||
|
||||
.u-permalink__icon
|
||||
@include position(absolute, bottom, left, 0.25em, -3.25rem)
|
||||
@include size(2rem)
|
||||
@include position(absolute, bottom, left, 0.35em, -2.75rem)
|
||||
@include size(1.5rem)
|
||||
color: $color-subtle
|
||||
|
||||
.u-permalink:hover &
|
||||
|
@ -89,46 +93,56 @@
|
|||
.u-text-center
|
||||
text-align: center
|
||||
|
||||
.u-float-right
|
||||
float: right
|
||||
.u-text-right
|
||||
text-align: right
|
||||
|
||||
.u-padding
|
||||
padding: 5rem
|
||||
|
||||
.u-padding-small
|
||||
padding: 0.5em 0.75em
|
||||
|
||||
.u-padding-medium
|
||||
padding: 2rem
|
||||
padding: 2.5rem
|
||||
|
||||
.u-padding
|
||||
padding: 5rem
|
||||
.u-inline-block
|
||||
display: inline-block
|
||||
|
||||
.u-no-border
|
||||
border: none
|
||||
|
||||
.u-border
|
||||
border: 1px solid $color-subtle
|
||||
border-radius: 3px
|
||||
|
||||
.u-border-top
|
||||
border-top: 1px solid $color-subtle
|
||||
border-radius: 2px
|
||||
|
||||
.u-border-bottom
|
||||
border-bottom: 1px solid $color-subtle
|
||||
border: 1px solid $color-subtle
|
||||
|
||||
.u-color-theme
|
||||
color: $color-theme
|
||||
.u-border-dotted
|
||||
border-top: 1px dotted $color-subtle
|
||||
|
||||
.u-color-subtle
|
||||
color: $color-subtle-dark
|
||||
@each $name, $color in (theme: $color-theme, subtle: $color-subtle-dark, light: $color-back, red: $color-red, green: $color-green, yellow: $color-yellow)
|
||||
.u-color-#{$name}
|
||||
color: $color
|
||||
|
||||
.u-text-shadow
|
||||
text-shadow: 2px 2px $color-theme-dark
|
||||
.u-grayscale
|
||||
filter: grayscale(100%)
|
||||
transition: filter 0.15s ease
|
||||
user-select: none
|
||||
|
||||
&:hover
|
||||
filter: none
|
||||
|
||||
.u-pattern
|
||||
background: $color-theme url("/assets/img/pattern.jpg")
|
||||
color: $color-back
|
||||
background: $pattern
|
||||
|
||||
|
||||
//- Hidden elements
|
||||
|
||||
.u-hidden
|
||||
display: none
|
||||
|
||||
@each $breakpoint in (sm, md)
|
||||
.u-hidden-#{$breakpoint}
|
||||
@each $breakpoint in (xs, sm, md)
|
||||
.u-hidden-#{$breakpoint}.u-hidden-#{$breakpoint}
|
||||
@include breakpoint(max, $breakpoint)
|
||||
display: none
|
||||
|
|
|
@ -1,37 +1,41 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > ASIDES
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > ASIDES
|
||||
|
||||
//- Aside
|
||||
//- Aside container
|
||||
|
||||
.c-aside
|
||||
@include breakpoint(min, md)
|
||||
@include position(absolute, top, left, 0, calc(100% + #{$aside-padding}))
|
||||
border-left: 1px solid $color-subtle
|
||||
opacity: 0.5
|
||||
transition: opacity 0.25s ease
|
||||
padding: 0 $aside-padding
|
||||
width: $aside-width
|
||||
position: relative
|
||||
|
||||
&:hover
|
||||
opacity: 1
|
||||
|
||||
//- Aside content
|
||||
|
||||
.c-aside__content
|
||||
background: $color-front
|
||||
z-index: 10
|
||||
|
||||
@include breakpoint(min, md)
|
||||
@include position(absolute, top, left, -3rem, calc(100% + 5.5rem))
|
||||
width: calc(#{$aside-width} + 2rem)
|
||||
|
||||
// Banner effect
|
||||
|
||||
&:after
|
||||
$triangle-size: 2rem
|
||||
|
||||
@include position(absolute, bottom, left, -$triangle-size / 2, 0)
|
||||
@include size(0)
|
||||
border-color: transparent
|
||||
border-style: solid
|
||||
border-top-color: $color-dark
|
||||
border-width: $triangle-size / 2 0 0 $triangle-size
|
||||
content: ""
|
||||
|
||||
@include breakpoint(max, sm)
|
||||
display: block
|
||||
margin: type(5) 0
|
||||
margin: 2rem 0
|
||||
|
||||
|
||||
//- Aside label
|
||||
//- Aside text
|
||||
|
||||
.c-aside__label
|
||||
display: block
|
||||
margin-bottom: 1rem
|
||||
|
||||
|
||||
// Aside container
|
||||
|
||||
.has-aside
|
||||
position: relative
|
||||
|
||||
&:hover > .c-aside
|
||||
opacity: 1
|
||||
.c-aside__text
|
||||
color: $color-back
|
||||
padding: 1.5rem 2.5rem 3rem 2rem
|
||||
|
|
|
@ -1,23 +1,23 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > BUTTONS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > BUTTONS
|
||||
|
||||
.c-button
|
||||
display: inline-block
|
||||
font-weight: bold
|
||||
padding: 0.5em 0.75em
|
||||
padding: 0.75em 1em
|
||||
border: 2px solid
|
||||
border-radius: 3px
|
||||
transition: opacity 0.25s ease
|
||||
|
||||
&:hover
|
||||
opacity: 0.8
|
||||
border-radius: 2px
|
||||
text-align: center
|
||||
transition: background 0.25s ease
|
||||
|
||||
&.c-button--primary
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
border-color: $color-theme
|
||||
|
||||
&:hover
|
||||
background: $color-theme-dark
|
||||
border-color: $color-theme-dark
|
||||
|
||||
&.c-button--secondary
|
||||
background: $color-back
|
||||
color: $color-theme
|
||||
|
|
|
@ -1,52 +1,40 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > CODE
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > CODE
|
||||
|
||||
//- Code block
|
||||
|
||||
.c-code-block
|
||||
background: $color-subtle-light
|
||||
padding: 1em 0
|
||||
border-left: 5px solid $color-theme
|
||||
background: $color-front
|
||||
color: $color-back
|
||||
padding: 0.75em 0
|
||||
border-radius: 2px
|
||||
overflow: auto
|
||||
width: 100%
|
||||
max-width: 100%
|
||||
white-space: pre
|
||||
direction: ltr
|
||||
|
||||
:not(.o-block)
|
||||
margin-bottom: 2rem
|
||||
|
||||
|
||||
//- Code block content
|
||||
|
||||
.c-code-block__content
|
||||
display: block
|
||||
padding: 2em 2.5em
|
||||
|
||||
|
||||
//- Code block label
|
||||
|
||||
.c-code-block__label
|
||||
display: inline-block
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
padding: 1rem
|
||||
margin-bottom: 1.5rem
|
||||
font: normal normal 1.1rem/#{2} $font-code
|
||||
padding: 1em 2em
|
||||
|
||||
|
||||
//- Inline code
|
||||
|
||||
:not(.c-code-block) > code
|
||||
@extend .u-code-small
|
||||
|
||||
*:not(.c-code-block) > code
|
||||
font: normal 600 0.8em/#{1} $font-code
|
||||
background: $color-subtle-light
|
||||
box-shadow: 1px 1px 0 $color-subtle
|
||||
color: $color-front
|
||||
padding: 0.15em 0.5em
|
||||
margin: 0 0.25em
|
||||
border-radius: 2px
|
||||
text-shadow: 1px 1px 0 $color-back
|
||||
padding: 0.1em 0.5em
|
||||
margin: 0
|
||||
border-radius: 1px
|
||||
|
||||
.c-aside__content &
|
||||
background: $color-dark
|
||||
color: $color-back
|
||||
|
||||
|
||||
//- Syntax Highlighting
|
||||
|
|
20
website/assets/css/_components/_landing.sass
Normal file
|
@ -0,0 +1,20 @@
|
|||
//- 💫 CSS > COMPONENTS > LANDING
|
||||
|
||||
.c-landing
|
||||
background: $color-theme
|
||||
padding-top: 5rem
|
||||
width: 100%
|
||||
|
||||
.c-landing__wrapper
|
||||
background: $pattern
|
||||
padding-bottom: 6rem
|
||||
width: 100%
|
||||
|
||||
.c-landing__content
|
||||
background: $pattern-overlay
|
||||
width: 100%
|
||||
min-height: 573px
|
||||
|
||||
.c-landing__title
|
||||
color: $color-back
|
||||
text-align: center
|
|
@ -1,6 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > LISTS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > LISTS
|
||||
|
||||
//- List Container
|
||||
|
||||
|
@ -17,16 +15,22 @@
|
|||
|
||||
.c-list__item
|
||||
padding-left: 2rem
|
||||
margin-bottom: 1em
|
||||
margin-bottom: 0.5em
|
||||
margin-left: 1.25rem
|
||||
|
||||
&:before
|
||||
content: '\25CF'
|
||||
display: inline-block
|
||||
font-size: 1.25em
|
||||
font-size: 1em
|
||||
font-weight: bold
|
||||
padding-right: 1.25rem
|
||||
margin-left: -3.75rem
|
||||
text-align: right
|
||||
width: 2.5rem
|
||||
counter-increment: li
|
||||
|
||||
|
||||
//- List icon
|
||||
|
||||
.c-list__icon
|
||||
margin-right: 1rem
|
||||
|
|
|
@ -1,11 +1,11 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > MISC
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > MISC
|
||||
|
||||
.x-terminal
|
||||
background: $color-subtle
|
||||
background: $color-subtle-light
|
||||
color: $color-front
|
||||
border-radius: 10px
|
||||
padding: 4px
|
||||
border: 1px dotted $color-subtle
|
||||
border-radius: 5px
|
||||
width: 100%
|
||||
|
||||
.x-terminal__icons
|
||||
|
@ -23,22 +23,20 @@
|
|||
|
||||
&:before
|
||||
content: ""
|
||||
background: #e4514f
|
||||
background: $color-red
|
||||
|
||||
span
|
||||
background: #3ec930
|
||||
background: $color-green
|
||||
|
||||
&:after
|
||||
content: ""
|
||||
background: #f4c025
|
||||
background: $color-yellow
|
||||
|
||||
.x-terminal__code
|
||||
background: $color-front
|
||||
color: $color-back
|
||||
margin: 0
|
||||
border: none
|
||||
border-bottom-left-radius: 10px
|
||||
border-bottom-right-radius: 10px
|
||||
border-bottom-left-radius: 5px
|
||||
border-bottom-right-radius: 5px
|
||||
width: 100%
|
||||
max-width: 100%
|
||||
white-space: pre-wrap
|
||||
|
|
|
@ -1,29 +1,26 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > NAVIGATION
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > NAVIGATION
|
||||
|
||||
.c-nav
|
||||
@include position(absolute, top, left, 0, 0)
|
||||
@include size(100%, $nav-height)
|
||||
align-items: center
|
||||
background: $color-back
|
||||
border-color: $color-back
|
||||
color: $color-theme
|
||||
align-items: center
|
||||
display: flex
|
||||
justify-content: space-between
|
||||
padding: 0 2rem
|
||||
z-index: 10
|
||||
padding: 0 2rem 0 1rem
|
||||
z-index: 20
|
||||
width: 100%
|
||||
border-bottom: 1px solid $color-subtle
|
||||
|
||||
&.c-nav--theme
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
border-bottom: none
|
||||
|
||||
&.is-fixed
|
||||
animation: slideInDown 0.5s ease-in-out
|
||||
position: fixed
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
border-color: $color-theme
|
||||
|
||||
@include breakpoint(min, sm)
|
||||
height: $nav-height * 0.8
|
||||
|
||||
.c-nav__menu
|
||||
@include size(100%)
|
||||
|
@ -36,17 +33,7 @@
|
|||
display: flex
|
||||
align-items: center
|
||||
height: 100%
|
||||
text-transform: uppercase
|
||||
|
||||
&:not(:last-child)
|
||||
margin-right: 1em
|
||||
|
||||
&.is-active
|
||||
position: relative
|
||||
font-weight: bold
|
||||
border-color: inherit
|
||||
|
||||
&:after
|
||||
$triangle: 8px
|
||||
|
||||
@include triangle-down($triangle)
|
||||
@include position(absolute, top, left, 100%, calc(50% - #{$triangle}))
|
||||
|
|
|
@ -1,40 +1,40 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > SIDEBAR
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > SIDEBAR
|
||||
|
||||
//- Sidebar container
|
||||
|
||||
.c-sidebar
|
||||
@include breakpoint(min, md)
|
||||
flex: 0 0 $sidebar-width
|
||||
margin-right: 6rem
|
||||
margin-left: 4rem
|
||||
padding-top: $nav-height
|
||||
width: $sidebar-width
|
||||
background: $color-subtle-light
|
||||
overflow-y: auto
|
||||
|
||||
&.is-fixed .c-sidebar__body
|
||||
@include position(fixed, top, left, $nav-height, 4rem)
|
||||
@include size($sidebar-width, calc(100vh - #{$nav-height}))
|
||||
overflow: auto
|
||||
transition: none
|
||||
@include breakpoint(min, md)
|
||||
@include position(fixed, top, left, 0, 0)
|
||||
@include size($sidebar-width, 100vh)
|
||||
flex: 0 0 $sidebar-width
|
||||
padding: calc(#{$nav-height} + 1.5rem) 2rem 2rem
|
||||
z-index: 10
|
||||
border-right: 1px solid $color-subtle
|
||||
|
||||
@include breakpoint(max, sm)
|
||||
flex: 100%
|
||||
width: 100%
|
||||
|
||||
.c-sidebar__body
|
||||
margin-top: $nav-height
|
||||
display: flex
|
||||
flex-flow: row wrap
|
||||
width: 100%
|
||||
|
||||
& > *
|
||||
flex: 1 1 0
|
||||
padding: 1rem
|
||||
border-bottom: 1px solid $color-subtle
|
||||
|
||||
//- Sidebar section
|
||||
|
||||
.c-sidebar__section
|
||||
@include breakpoint(max, sm)
|
||||
flex: 1 1 0
|
||||
padding: 1.25rem
|
||||
border-bottom: 1px solid $color-subtle
|
||||
margin: 0
|
||||
|
||||
&:not(:last-child)
|
||||
border-right: 1px solid $color-subtle
|
||||
|
||||
.c-sidebar__body
|
||||
.is-active
|
||||
font-weight: bold
|
||||
color: $color-theme
|
||||
|
|
|
@ -1,44 +1,68 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 COMPONENTS > TABLES
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > COMPONENTS > TABLES
|
||||
|
||||
// Shadows adapted from "CSS only Responsive Tables" by David Bushell
|
||||
// http://codepen.io/dbushell/pen/wGaamR
|
||||
|
||||
//- Table Container
|
||||
//- Table container
|
||||
|
||||
.c-table
|
||||
vertical-align: top
|
||||
|
||||
@include breakpoint(max, md)
|
||||
|
||||
//- Table row
|
||||
|
||||
.c-table__row
|
||||
&:nth-child(odd)
|
||||
background: lighten($color-subtle-light, 2)
|
||||
|
||||
&.c-table__row--foot
|
||||
background: $color-subtle-light
|
||||
border-top: 2px solid $color-theme
|
||||
|
||||
.c-table__cell:first-child
|
||||
@extend .u-text-label
|
||||
color: $color-theme
|
||||
|
||||
|
||||
//- Table cell
|
||||
|
||||
.c-table__cell
|
||||
padding: 1rem
|
||||
|
||||
&:not(:last-child)
|
||||
border-right: 1px solid $color-subtle
|
||||
|
||||
|
||||
//- Table head cell
|
||||
|
||||
.c-table__head-cell
|
||||
font-weight: bold
|
||||
color: $color-theme
|
||||
background: $color-back
|
||||
padding: 1rem 0.5rem
|
||||
border-bottom: 2px solid $color-theme
|
||||
|
||||
|
||||
//- Responsive table
|
||||
//- Shadows adapted from "CSS only Responsive Tables" by David Bushell
|
||||
//- http://codepen.io/dbushell/pen/wGaamR
|
||||
|
||||
@include breakpoint(max, md)
|
||||
.c-table
|
||||
@include scroll-shadow-base($color-front)
|
||||
display: inline-block
|
||||
overflow-x: auto
|
||||
width: auto
|
||||
-webkit-overflow-scrolling: touch
|
||||
|
||||
|
||||
//- Table Cell
|
||||
|
||||
.c-table__cell
|
||||
padding: 1rem
|
||||
border: 1px solid $color-subtle
|
||||
|
||||
&.c-table__cell--highlight
|
||||
border: 2px solid $color-theme
|
||||
|
||||
@include breakpoint(max, md)
|
||||
.c-table__cell,
|
||||
.c-table__head-cell
|
||||
&:first-child
|
||||
@include scroll-shadow-cover(left, $color-back)
|
||||
|
||||
&:last-child
|
||||
@include scroll-shadow-cover(right, $color-back)
|
||||
|
||||
.c-table__row--foot .c-table__cell
|
||||
&:first-child
|
||||
@include scroll-shadow-cover(left, lighten($color-subtle-light, 2))
|
||||
|
||||
//- Table Head Cell
|
||||
|
||||
.c-table__head-cell
|
||||
background: $color-theme
|
||||
color: $color-back
|
||||
padding: 1rem
|
||||
border: 1px solid $color-theme
|
||||
&:last-child
|
||||
@include scroll-shadow-cover(right, lighten($color-subtle-light, 2))
|
||||
|
|
|
@ -1,6 +1,4 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 MIXINS
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > MIXINS
|
||||
|
||||
// Helper for position
|
||||
// $position - valid position value (static, absolute, fixed, relative)
|
||||
|
@ -38,18 +36,6 @@
|
|||
@content
|
||||
|
||||
|
||||
// Triangle pointing down
|
||||
// $triangle-size - width of the triangle
|
||||
|
||||
@mixin triangle-down($triangle-size)
|
||||
@include size(0)
|
||||
border-color: transparent
|
||||
border-style: solid
|
||||
border-top-color: inherit
|
||||
border-width: $triangle-size $triangle-size 0 $triangle-size
|
||||
content: ""
|
||||
|
||||
|
||||
// Scroll shadows for responsive tables
|
||||
// adapted from David Bushell, http://codepen.io/dbushell/pen/wGaamR
|
||||
// $scroll-shadow-color - color of shadow
|
||||
|
|
|
@ -1,20 +1,20 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 VARIABLES
|
||||
//- ----------------------------------
|
||||
//- 💫 CSS > VARIABLES
|
||||
|
||||
// Settings and Sizes
|
||||
|
||||
$type-base: 11px
|
||||
|
||||
$nav-height: 55px
|
||||
$content-width: 800px
|
||||
$sidebar-width: 230px
|
||||
$aside-width: 300px
|
||||
$nav-height: 45px
|
||||
$content-width: 1250px
|
||||
$sidebar-width: 200px
|
||||
$aside-width: 500px
|
||||
$aside-padding: 25px
|
||||
|
||||
$logo-sizes: ( large: 500px, medium: 250px, small: 100px, tiny: 65px )
|
||||
$grid: ( third: 3, half: 2, two-thirds: 1.5 )
|
||||
$logo-width: 85px
|
||||
|
||||
$grid: ( quarter: 4, third: 3, half: 2, two-thirds: 1.5, three-quarters: 1.33 )
|
||||
$breakpoints: ( sm: 768px, md: 992px, lg: 1200px )
|
||||
$headings: (1: 3, 2: 2.6, 3: 2, 4: 1.8, 5: 1.5)
|
||||
|
||||
|
||||
// Fonts
|
||||
|
@ -25,13 +25,24 @@ $font-code: 'Source Code Pro', Consolas, 'Andale Mono', Menlo, Monaco, Courier,
|
|||
|
||||
// Colors
|
||||
|
||||
$color-theme: #09a3d5
|
||||
$color-theme-dark: #008ebc
|
||||
$color-back: #fff
|
||||
$color-front: #222
|
||||
$colors: ( blue: #09a3d5, red: #d9515d )
|
||||
|
||||
$color-subtle: #ddd
|
||||
$color-subtle-light: #f6f6f6
|
||||
$color-subtle-dark: #999
|
||||
$color-back: #fff !default
|
||||
$color-front: #1a1e23 !default
|
||||
$color-dark: lighten($color-front, 20) !default
|
||||
|
||||
$syntax-highlighting: ( comment: #999, tag: #3ec930, number: #8130c9, selector: #09a3d5, operator: #e4514f, function: #09a3d5, keyword: #e4514f, regex: #f4c025 )
|
||||
$color-theme: map-get($colors, $theme)
|
||||
$color-theme-dark: darken(map-get($colors, $theme), 5)
|
||||
|
||||
$color-subtle: #ddd !default
|
||||
$color-subtle-light: #f6f6f6 !default
|
||||
$color-subtle-dark: #949e9b !default
|
||||
|
||||
$color-red: #d9515d
|
||||
$color-green: #3ec930
|
||||
$color-yellow: #f4c025
|
||||
|
||||
$syntax-highlighting: ( comment: #949e9b, tag: #3ec930, number: #B084EB, selector: #FFB86C, operator: #FF2C6D, function: #09a3d5, keyword: #45A9F9, regex: #f4c025 )
|
||||
|
||||
$pattern: $color-theme url("/assets/img/pattern_#{$theme}.jpg") center top repeat
|
||||
$pattern-overlay: transparent url("/assets/img/pattern_landing.jpg") center -138px no-repeat
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 STYLE
|
||||
//- ----------------------------------
|
||||
//- 💫 STYLESHEET
|
||||
|
||||
$theme: blue !default
|
||||
|
||||
|
||||
// Variables
|
||||
|
@ -25,6 +25,7 @@
|
|||
@import _components/asides
|
||||
@import _components/buttons
|
||||
@import _components/code
|
||||
@import _components/landing
|
||||
@import _components/lists
|
||||
@import _components/misc
|
||||
@import _components/navigation
|
||||
|
|
4
website/assets/css/style_red.sass
Normal file
|
@ -0,0 +1,4 @@
|
|||
//- 💫 STYLESHEET (RED)
|
||||
|
||||
$theme: red
|
||||
@import style
|
68
website/assets/img/graphics.svg
Normal file
|
@ -0,0 +1,68 @@
|
|||
<svg style="position: absolute; width: 0; height: 0;" width="0" height="0" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||
<defs>
|
||||
<symbol id="brain" viewBox="0 0 300 150">
|
||||
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
|
||||
<title>brain</title>
|
||||
<path stroke-width="4" stroke-miterlimit="10" fill="none" stroke="currentColor" d="M187.2 76.1h-5c-1.6 0-2.9-1.3-2.9-2.9V62.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 3-2.9 3zM221.1 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM221.1 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM221.1 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 63.8h-5c-1.6 0-2.9-1.3-2.9-2.9V50.5c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 88.3h-5c-1.6 0-2.9-1.3-2.9-2.9V74.9c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM263.2 112.7h-5c-1.6 0-2.9-1.3-2.9-2.9V99.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM263.2 39.4h-5c-1.6 0-2.9-1.3-2.9-2.9V26.1c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM191.5 54.3L207.8 34M195.5 61.1l12.3-4M191.5 80.1l16.3 20.4M195.5 73.3l12.3 4.1M236 39.1l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6L243.4 98c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L232 58.8c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1 0-6.3 4.7-3.7 7.8z"
|
||||
/>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M96.1 124.1H63v-11.7c0-12.6-3.7-25-10.7-35.5l-5.9-8.8c-3.2-4.8-4.9-10.4-4.9-16.1 0-22.3 18.1-40.4 40.4-40.4 17.6 0 33.1 11.4 38.5 28.1l10.8 33.8h-11v16.9c0 3.7-3 6.7-6.7 6.7h-12v12.3H77.3V90.2c0-.8-.2-1.6-.5-2.3l-4.5-11.3c-1.7-4.1 1.4-8.6 5.8-8.6 2 0 4-1 5.1-2.7L91.8 53h15.6c0-14-11.3-25.3-25.3-25.3h-.3c-14 0-25.3 11.3-25.3 25.3v1c0 4 3.2 7.2 7.2 7.2 2.4 0 4.6-1.2 6-3.2l11.2-16.8h10.8M139 68.7h29.4"
|
||||
/>
|
||||
</symbol>
|
||||
|
||||
<symbol id="computer" viewBox="0 0 300 150">
|
||||
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
|
||||
<title>computer</title>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M56.2 87.7h-5c-1.6 0-2.9-1.3-2.9-2.9V74.4c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM90.1 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM90.1 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM90.1 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 75.5h-5c-1.6 0-2.9-1.3-2.9-2.9V62.2c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.6-1.3 2.9-2.9 2.9zM132.2 99.9h-5c-1.6 0-2.9-1.3-2.9-2.9V86.6c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9V97c0 1.6-1.3 2.9-2.9 2.9zM132.2 124.4h-5c-1.6 0-2.9-1.3-2.9-2.9V111c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM132.2 51.1h-5c-1.6 0-2.9-1.3-2.9-2.9V37.7c0-1.6 1.3-2.9 2.9-2.9h5c1.6 0 2.9 1.3 2.9 2.9v10.4c0 1.7-1.3 3-2.9 3zM60.5 66l16.3-20.3M64.5 72.8l12.3-4.1M60.5 91.8l16.3 20.4M64.5 85l12.3 4.1M105 50.8l11.4 13.6c1.4 1.7 1.5 4.2.1 6l-15.6 19.7c-1.4 1.8-1.4 4.3.1 6l11.4 13.6c2.6 3.1.4 7.8-3.7 7.8-4 0-6.3-4.7-3.7-7.8l11.4-13.6c1.4-1.7 1.5-4.2.1-6L101 70.5c-1.4-1.8-1.4-4.4.2-6.1l12-13.5c2.7-3.1.6-7.9-3.6-7.9h-.9c-4.1-.1-6.3 4.7-3.7 7.8z"
|
||||
/>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M195.1 42.4h49v40.5h-49z" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M251.9 116.7h-64.6c-2.2 0-4-1.8-4-4V34.6c0-2.2 1.8-4 4-4h64.6c2.2 0 4 1.8 4 4v78.1c0 2.2-1.8 4-4 4z" />
|
||||
<path fill="currentColor" d="M191.8 103.2h6.8v6.8h-6.8zM235.6 91.3v3.4h-21.9v5.1h21.9v3.4h11.9V91.3" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M245.5 130.2h-51.8c-3.9 0-7-3.1-7-7v-6.5h65.8v6.5c0 3.8-3.1 7-7 7zM146 79.6h25.3" />
|
||||
</symbol>
|
||||
|
||||
<symbol id="eye" viewBox="0 0 300 150">
|
||||
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
|
||||
<title>eye</title>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M40.7 37.1h95.7v71.7H40.7z" />
|
||||
<path fill="currentColor" d="M30.4 43.9h10.2v13.7H30.4z" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M30.4 64.4h10.2v13.7H30.4zM30.4 88.3h10.2V102H30.4zM146 59.3h-9.7V45.6h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.2-2.5 5.7-5.7 5.7zM146 96.9h-9.7V83.2h9.7c3.1 0 5.7 2.5 5.7 5.7v2.3c0 3.1-2.5 5.7-5.7 5.7zM59.5 108.8v15.4M117.5 108.8v15.4M40.7 98.3h72V70.6H125M40.7 50.8h53.6M55.3 68.2h10.8v8.7H55.3zM74.7 68.2h10.8v8.7H74.7z"
|
||||
/>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M101.3 77h-3.6c-2 0-3.6-1.6-3.6-3.6v-1.5c0-2 1.6-3.6 3.6-3.6h3.6c2 0 3.6 1.6 3.6 3.6v1.5c0 1.9-1.6 3.6-3.6 3.6zM40.7 88.3h58.8v-7M80.1 88.3v-7M60.7 88.3v-7M80.1 61.7V50.8M60.7 50.8v10.9M104.1 47.8c2.8 5.1-2.4 10.3-7.6 7.6-.7-.4-1.3-1-1.7-1.7-2.8-5.1 2.4-10.3 7.6-7.6.7.4 1.3 1 1.7 1.7z"
|
||||
/>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M104.9 50.8H125V37.1M136.3 90H123" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M269.5 56.4c15.3 30-14.5 59.8-44.5 44.5-4.9-2.5-8.9-6.5-11.4-11.4C198.2 59.5 228 29.7 258 45c4.9 2.5 8.9 6.5 11.5 11.4z" />
|
||||
<path fill="currentColor" d="M249.5 73c-4.4 0-8-3.6-8-8 0-2.3 1-4.3 2.5-5.8-.8-.1-1.6-.2-2.5-.2-7.8 0-14 6.3-14 14 0 7.8 6.3 14 14 14s14-6.3 14-14c0-.9-.1-1.7-.2-2.5-1.4 1.5-3.5 2.5-5.8 2.5z" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M195 73h-36.7" />
|
||||
</symbol>
|
||||
|
||||
<symbol id="bubble" viewBox="0 0 300 150">
|
||||
<!-- by Kemal Sanli: https://dribbble.com/kemal -->
|
||||
<title>bubble</title>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M153.5 69h-32.2M88.2 68.6c1.2 9.2-6.6 17-15.8 15.8-6.3-.8-11.4-5.9-12.2-12.2C59 63 66.8 55.2 76 56.4c6.3.9 11.4 5.9 12.2 12.2z" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M88.2 68.6c1.2 9.2-6.6 17-15.8 15.8-6.3-.8-11.4-5.9-12.2-12.2C59 63 66.8 55.2 76 56.4c6.3.9 11.4 5.9 12.2 12.2z" />
|
||||
<path fill="currentColor" d="M77.7 70.5c-1.9 0-3.5-1.6-3.5-3.5 0-1 .4-1.9 1.1-2.5-.4-.1-.7-.1-1.1-.1-3.4 0-6.2 2.8-6.2 6.2 0 3.4 2.8 6.2 6.2 6.2s6.2-2.8 6.2-6.2c0-.4 0-.7-.1-1.1-.7.5-1.6 1-2.6 1z" />
|
||||
<path d="M43.9 38.3h60.5v62.6H43.9z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" />
|
||||
<path d="M43.9 112.3c0 5.8 4.7 10.5 10.5 10.5h39.4c5.8 0 10.5-4.7 10.5-10.5v-11.4H43.9v11.4zM93.9 20.2H54.5c-5.8 0-10.5 4.7-10.5 10.5v7.6h60.5v-7.6c-.1-5.8-4.8-10.5-10.6-10.5z" fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10"
|
||||
/>
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M79.3 110.3c.8 3.8-2.5 7.1-6.3 6.3-1.9-.4-3.5-2-3.9-3.9-.8-3.8 2.5-7.1 6.3-6.3 1.9.5 3.4 2 3.9 3.9zM69.3 30.1h9.8" />
|
||||
<path fill="none" stroke="currentColor" stroke-width="4" stroke-miterlimit="10" d="M264 41h-93c-2.3 0-4.2 1.9-4.2 4.2v42.3c0 2.3 1.9 4.2 4.2 4.2h7v22.5l22.5-22.5H264c2.3 0 4.2-1.9 4.2-4.2V45.2c0-2.3-1.9-4.2-4.2-4.2z" />
|
||||
<path fill="currentColor" d="M183.4 53.8c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.5.2 1 .6 1.2 1.2zM189.4 52.2h16.9v5.6h-16.9zM211.9 52.2h33.8v5.6h-33.8zM178.1 74.8h5.6v5.6h-5.6zM189.4 74.8h33.8v5.6h-33.8zM240.1 74.8H257v5.6h-16.9zM251.3 52.2h5.6v5.6h-5.6zM178.1 63.5h22.5v5.6h-22.5zM217.5 63.5h12.7v5.6h-12.7zM234.4 63.5h22.5v5.6h-22.5zM209.2 69.1h-.3c-1.5 0-2.7-1.2-2.7-2.7v-.3c0-1.5 1.2-2.7 2.7-2.7h.3c1.5 0 2.7 1.2 2.7 2.7v.3c0 1.5-1.2 2.7-2.7 2.7zM234.1 76.3c1.2 2.5-1.3 5-3.8 3.8-.5-.2-1-.7-1.2-1.2-1.2-2.5 1.3-5 3.8-3.8.6.2 1 .7 1.2 1.2z"
|
||||
/>
|
||||
</symbol>
|
||||
|
||||
<symbol id="spacy" viewBox="0 0 675 215">
|
||||
<title>spacy</title>
|
||||
<path fill="currentColor" d="M83.6 83.3C68.3 81.5 67.2 61 47.5 62.8c-9.5 0-18.4 4-18.4 12.7 0 13.2 20.3 14.4 32.5 17.7 20.9 6.3 41 10.7 41 33.3 0 28.8-22.6 38.8-52.4 38.8-24.9 0-50.2-8.9-50.2-31.8 0-6.4 6.1-11.3 12-11.3 7.5 0 10.1 3.2 12.7 8.4 5.8 10.2 12.3 15.6 28.3 15.6 10.2 0 20.6-3.9 20.6-12.7 0-12.6-12.8-15.3-26.1-18.4-23.5-6.6-43.6-10-46-36.1C-1 34.5 91.7 32.9 97 71.9c.1 7.1-6.5 11.4-13.4 11.4zm110.2-39c32.5 0 51 27.2 51 60.8 0 33.7-17.9 60.8-51 60.8-18.4 0-29.8-7.8-38.1-19.8v44.5c0 13.4-4.3 19.8-14.1 19.8-11.9 0-14.1-7.6-14.1-19.8V61.3c0-10.6 4.4-17 14.1-17 9.1 0 14.1 7.2 14.1 17v3.6c9.2-11.6 19.7-20.6 38.1-20.6zm-7.7 98.4c19.1 0 27.6-17.6 27.6-38.1 0-20.1-8.6-38.1-27.6-38.1-19.8 0-29 16.3-29 38.1 0 21.2 9.2 38.1 29 38.1zM266.9 76c0-23.4 26.9-31.7 52.9-31.7 36.6 0 51.7 10.7 51.7 46v34c0 8.1 5 24.1 5 29 0 7.4-6.8 12-14.1 12-8.1 0-14.1-9.5-18.4-16.3-11.9 9.5-24.5 16.3-43.8 16.3-21.3 0-38.1-12.6-38.1-33.3 0-18.4 13.2-28.9 29-32.5 0 .1 51-12 51-12.1 0-15.7-5.5-22.6-22-22.6-14.5 0-21.9 4-27.5 12.7-4.5 6.6-4 10.6-12.7 10.6-6.9-.1-13-4.9-13-12.1zm43.6 70.2c22.3 0 31.8-11.8 31.8-35.3v-5c-6 2-30.3 8-36.8 9.1-7 1.4-14.1 6.6-14.1 14.9.1 9.1 9.4 16.3 19.1 16.3zM474.5 0c31.5 0 65.7 18.8 65.7 48.8 0 7.7-5.8 14.1-13.4 14.1-10.3 0-11.8-5.5-16.3-13.4-7.6-13.9-16.5-23.3-36.1-23.3-30.2-.2-43.7 25.6-43.7 57.8 0 32.4 11.2 55.8 42.4 55.8 20.7 0 32.2-12 38.1-27.6 2.4-7.1 6.7-14.1 15.6-14.1 7 0 14.1 7.2 14.1 14.8 0 31.8-32.4 53.8-65.8 53.8-36.5 0-57.2-15.4-68.5-41-5.5-12.2-9.1-24.9-9.1-42.4-.1-49.2 28.6-83.3 77-83.3zm180.3 44.3c8 0 12.7 5.2 12.7 13.4 0 3.3-2.6 9.9-3.6 13.4L625.1 173c-8.6 22.1-15.1 37.4-44.5 37.4-14 0-26.1-1.2-26.1-13.4 0-7 5.3-10.6 12.7-10.6 1.4 0 3.6.7 5 .7 2.1 0 3.6.7 5 .7 14.7 0 16.8-15.1 22-25.5l-37.4-92.6c-2.1-5-3.6-8.4-3.6-11.3 0-8.2 6.4-14.1 14.8-14.1 9.5 0 13.3 7.5 15.6 15.6l24.7 73.5L638 65.5c3.9-10.5 4.2-21.2 16.8-21.2z" />
|
||||
</symbol>
|
||||
|
||||
<symbol id="explosion" viewBox="0 0 500 500">
|
||||
<title>explosion</title>
|
||||
<path fill="currentColor" d="M111.7 74.9L91.2 93.1l9.1 10.2 17.8-15.8 7.4 8.4-17.8 15.8 10.1 11.4 20.6-18.2 7.7 8.7-30.4 26.9-41.9-47.3 30.3-26.9 7.6 8.6zM190.8 59.6L219 84.3l-14.4 4.5-20.4-18.2-6.4 26.6-14.4 4.5 8.9-36.4-26.9-24.1 14.3-4.5L179 54.2l5.7-25.2 14.3-4.5-8.2 35.1zM250.1 21.2l27.1 3.4c6.1.8 10.8 3.1 14 7.2 3.2 4.1 4.5 9.2 3.7 15.5-.8 6.3-3.2 11-7.4 14.1-4.1 3.1-9.2 4.3-15.3 3.5L258 63.2l-2.8 22.3-13-1.6 7.9-62.7zm11.5 13l-2.2 17.5 12.6 1.6c5.1.6 9.1-2 9.8-7.6.7-5.6-2.5-9.2-7.6-9.9l-12.6-1.6zM329.1 95.4l23.8 13.8-5.8 10L312 98.8l31.8-54.6 11.3 6.6-26 44.6zM440.5 145c-1.3 8.4-5.9 15.4-13.9 21.1s-16.2 7.7-24.6 6.1c-8.4-1.6-15.3-6.3-20.8-14.1-5.5-7.9-7.6-16-6.4-24.4 1.3-8.5 6-15.5 14-21.1 8-5.6 16.2-7.7 24.5-6 8.4 1.6 15.4 6.3 20.9 14.2 5.5 7.6 7.6 15.7 6.3 24.2zM412 119c-5.1-.8-10.3.6-15.6 4.4-5.2 3.7-8.4 8.1-9.4 13.2-1 5.2.2 10.1 3.5 14.8 3.4 4.8 7.5 7.5 12.7 8.2 5.2.8 10.4-.7 15.6-4.4 5.3-3.7 8.4-8.1 9.4-13.2 1.1-5.1-.1-9.9-3.4-14.7-3.4-4.8-7.6-7.6-12.8-8.3zM471.5 237.9c-2.8 4.8-7.1 7.6-13 8.7l-2.6-13.1c5.3-.9 8.1-5 7.2-11-.9-5.8-4.3-8.8-8.9-8.2-2.3.3-3.7 1.4-4.5 3.3-.7 1.9-1.4 5.2-1.7 10.1-.8 7.5-2.2 13.1-4.3 16.9-2.1 3.9-5.7 6.2-10.9 7-6.3.9-11.3-.5-15.2-4.4-3.9-3.8-6.3-9-7.3-15.7-1.1-7.4-.2-13.7 2.6-18.8 2.8-5.1 7.4-8.2 13.7-9.2l2.6 13c-5.6 1.1-8.7 6.6-7.7 13.4 1 6.6 3.9 9.5 8.6 8.8 4.4-.7 5.7-4.5 6.7-14.1.3-3.5.7-6.2 1.1-8.4.4-2.2 1.2-4.4 2.2-6.8 2.1-4.7 6-7.2 11.8-8.1 5.4-.8 10.3.4 14.5 3.7 4.2 3.3 6.9 8.5 8 15.6.9 6.9-.1 12.6-2.9 17.3zM408.6 293.5l2.4-12.9 62 11.7-2.4 12.9-62-11.7zM419.6 396.9c-8.3 2-16.5.3-24.8-5-8.2-5.3-13.2-12.1-14.9-20.5-1.6-8.4.1-16.6 5.3-24.6 5.2-8.1 11.9-13.1 20.2-15.1 8.4-1.9 16.6-.3 24.9 5 8.2 5.3 13.2 12.1 14.8 20.5 1.7 8.4 0 16.6-5.2 24.7-5.2 8-12 13-20.3 15zm13.4-36.3c-1.2-5.1-4.5-9.3-9.9-12.8s-10.6-4.7-15.8-3.7-9.3 4-12.4 8.9-4.1 9.8-2.8 14.8c1.2 5.1 4.5 9.3 9.9 12.8 5.5 3.5 10.7 4.8 15.8 3.7 5.1-.9 9.2-3.8 12.3-8.7s4.1-9.9 2.9-15zM303.6 416.5l9.6-5.4 43.3 20.4-19.2-34 11.4-6.4 31 55-9.6 5.4-43.4-20.5 19.2 34.1-11.3 
6.4-31-55zM238.2 468.8c-49 0-96.9-17.4-134.8-49-38.3-32-64-76.7-72.5-125.9-2-11.9-3.1-24-3.1-35.9 0-36.5 9.6-72.6 27.9-104.4 2.1-3.6 6.7-4.9 10.3-2.8 3.6 2.1 4.9 6.7 2.8 10.3-16.9 29.5-25.9 63.1-25.9 96.9 0 11.1 1 22.3 2.9 33.4 7.9 45.7 31.8 87.2 67.3 116.9 35.2 29.3 79.6 45.5 125.1 45.5 11.1 0 22.3-1 33.4-2.9 4.1-.7 8 2 8.7 6.1.7 4.1-2 8-6.1 8.7-11.9 2-24 3.1-36 3.1z"/>
|
||||
</symbol>
|
||||
|
||||
<symbol id="matt-signature" viewBox="0 0 500 250">
|
||||
<title>matt-signature</title>
|
||||
<path fill="currentColor" d="M18.6 207c-.3-18.8-.8-37.5-1.4-56.2-.6-18.7-1-37.5-1-56.2v-7.2c0-3.5 0-7 .2-11v-18c.8-2.7 1.8-5 3-6.5 1.6-2 3.6-3 6.4-3 3 0 5.4 1 7.6 2 2.2 2 4 4 5.3 6l36.6 71 1.8 3c1 1 2 3 3 3h1l1 1 1-3 22-76c2-3 3-5 4-8l2-9c1-3 2-6 4-8 1-3 4-5 7-7h2c5 0 8 1 10 4 3 2 4 5 5 9 1 3 2 7 1 12v11l1 7c0 3 0 7 1 12 0 4 1 9 1 14l1 14.2 1 12 .6 6v1l1 7.5 1 11.6 1.4 12 1.4 8 1 4 1.7 5.5 1.7 6c.7 1.7 1 3 1.5 3.6-.5 4-1.5 7-3 9-1 2-4 3-8 3h-6l-3-3c-1-1.4-2-2.3-2-3l-4-14-7.6-58V88c0-3.5-1-7-2-10l-2 1.7-18 74v6c0 2-.2 4-1 6 0 2-1 3.5-3 5-1 1.3-3 2-5 2.2-1 0-2 0-3-1l-3.4-2-3-3c-1-1-1.7-2-2-3l-35-52-5.3-10.6v22c0 10.2.2 20.3.6 30.2.4 10 .6 20 .6 30.2v22c0 2-1 4-3 5.4s-3 3-5 3c-3 0-5 0-7-1-1-1-3-3-4-5zm205-63.2c-1.6 2.7-3.4 6-5.3 9.8l-6.2 12.2c-2 4.3-4 8.6-7 13-2 4.2-5 8.2-8 11.7s-5 6.6-9 9c-3 2.5-6 4-9 4.4-1 0-3-1-4-1l-5-2c-1-1-3-2-4-3s-1-3-1-5c1-18 2-33 4-47s6-27 11-38 12-20 20-27 18-12 29-15l2-1h2c5 0 9 2 11 7s4 12 5 23c1 10 2 24 2 40 1 16 2 36 3 59l1 4v5c0 2.6-1 4.5-2 6s-3 2-5 2c-5 0-8-1.7-10-4s-3-6.6-4-11v-4l-1-9s-1-6.7-1-10l-1-8.5v-1l-.2-6-1-7-.5-8.6-1-1zM218 93.5c-4.7 3.4-9.2 8-13.6 13.7-4.4 5.8-7.5 11.3-9.4 16.8-.8 2.5-1.8 6-2.8 10.4-1 4.4-2 8.8-2.7 13l-2 12-.7 7c.2 0 .4-.2.6-.5l.6-1c10.5-10 18-21 22.2-33 4.6-12 7-25 7.7-39zm72 47c-2.3 0-4.4.6-6.2 1.8-2 1.2-4 1.8-6.6 1.8h-5.4c-.7-1-1.4-1-2.3-2l-2.5-2c-.8 0-1.6-1-2.2-2-.6-1-1-2-1-3 0-2 1-4 3-6 2-1 4.5-3 7.2-4l8.3-3s5-2 6.7-3v-11c0-12-.6-25-1.8-38-1.2-12-1.8-25-1.8-37 0-3 .8-6 2.5-7 1-1 4-1 6-1 3 0 6 1 7 3s2 4 3 7c0 3 1 6 1 9v20l1 18 1 18 1 12 4-1 6-2 6-2 4-1 14-6c4-2.3 9-3.4 14-3.4 3 0 6 1 7 3.5s3 5 3 8c0 2-1 4-3 5l-6 3-46 17-1.5 1s-1 0-1.5 1v8c0 6 0 12 .5 18s1 12.3 2 18.3l3 15c1 5 1.4 10 1.4 15 0 1.4-.6 3.5-1.6 6s-2 4-4.7 4c-5 0-8.7-1.6-11.6-4-3-3-4.3-6.6-4.6-11l-2.2-29-2.7-30h-1zm112 0c-2.4 0-4.5.6-6.3 1.8-2 1.2-4 1.8-6.6 1.8h-5c0-1-1-1-2-2l-2-2c-1 0-1-1-2-2 0-1-1-2-1-3 0-2 1-4 3-6 2-1 5-3 7-4l8-3s5-2 7-3v-11c0-12 0-25-2-38-1-12-1-25-1-37 0-3 1-6 3-7s4-1 7-1c4 0 6 1 8 3s3 4 3 7c1 3 1 6 1 9s0 6 1 8v11l1 18 1 18 
1 12 4-1 6-2 6-2 4-1 14-6c4-2 9-4 14-4 4 0 6 1 8 4s3 5 3 8c0 2-1 4-2 5l-5.3 3-49 13.8-1.5 1s-1 .5-1.5 1V157l1 18.3c0 5 1 10 2 15s1 10 1 15c0 1.5-1 3.6-2 6s-3 4-5 4c-5 0-9-1.5-12-4.2s-5-6-5-11l-3-28.3-3-30.3h-1z"/>
|
||||
</defs>
|
||||
</svg>
|
After: 15 KiB
|
@ -1,29 +1,32 @@
|
|||
<svg style="position: absolute; width: 0; height: 0;" width="0" height="0" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||
<defs>
|
||||
<symbol id="icon-mail" viewBox="0 0 32 32">
|
||||
<title>mail</title>
|
||||
<path class="path1" d="M29 4h-26c-1.657 0-3 1.343-3 3v18c0 1.656 1.343 3 3 3h26c1.657 0 3-1.344 3-3v-18c0-1.657-1.343-3-3-3zM2.741 25.99l-0.731-0.732 8.249-8.248 0.731 0.732-8.249 8.248zM29.259 25.99l-8.249-8.248 0.731-0.732 8.249 8.248-0.731 0.732zM17 19.325v0.675h-2v-0.675l-12.997-12.050 1.272-1.272 12.725 11.798 12.725-11.798 1.272 1.272-12.997 12.050z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-menu" viewBox="0 0 24 24">
|
||||
<title>menu</title>
|
||||
<path class="path1" d="M3 5h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293zM3 17h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293zM3 11h18q0.414 0 0.707 0.293t0.293 0.707-0.293 0.707-0.707 0.293h-18q-0.414 0-0.707-0.293t-0.293-0.707 0.293-0.707 0.707-0.293z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-link" viewBox="0 0 32 32">
|
||||
<title>link</title>
|
||||
<path class="path1" d="M13.757 19.868c-0.416 0-0.832-0.159-1.149-0.476-2.973-2.973-2.973-7.81 0-10.783l6-6c1.44-1.44 3.355-2.233 5.392-2.233s3.951 0.793 5.392 2.233c2.973 2.973 2.973 7.81 0 10.783l-2.743 2.743c-0.635 0.635-1.663 0.635-2.298 0s-0.635-1.663 0-2.298l2.743-2.743c1.706-1.706 1.706-4.481 0-6.187-0.826-0.826-1.925-1.281-3.094-1.281s-2.267 0.455-3.094 1.281l-6 6c-1.706 1.706-1.706 4.481 0 6.187 0.635 0.635 0.635 1.663 0 2.298-0.317 0.317-0.733 0.476-1.149 0.476z"></path>
|
||||
<path class="path2" d="M8 31.625c-2.037 0-3.952-0.793-5.392-2.233-2.973-2.973-2.973-7.81 0-10.783l2.743-2.743c0.635-0.635 1.664-0.635 2.298 0s0.635 1.663 0 2.298l-2.743 2.743c-1.706 1.706-1.706 4.481 0 6.187 0.826 0.826 1.925 1.281 3.094 1.281s2.267-0.455 3.094-1.281l6-6c1.706-1.706 1.706-4.481 0-6.187-0.635-0.635-0.635-1.663 0-2.298s1.663-0.635 2.298 0c2.973 2.973 2.973 7.81 0 10.783l-6 6c-1.44 1.44-3.355 2.233-5.392 2.233z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-github" viewBox="0 0 27 32">
|
||||
<title>github</title>
|
||||
<path class="path1" d="M13.714 2.286q3.732 0 6.884 1.839t4.991 4.991 1.839 6.884q0 4.482-2.616 8.063t-6.759 4.955q-0.482 0.089-0.714-0.125t-0.232-0.536q0-0.054 0.009-1.366t0.009-2.402q0-1.732-0.929-2.536 1.018-0.107 1.83-0.321t1.679-0.696 1.446-1.188 0.946-1.875 0.366-2.688q0-2.125-1.411-3.679 0.661-1.625-0.143-3.643-0.5-0.161-1.446 0.196t-1.643 0.786l-0.679 0.429q-1.661-0.464-3.429-0.464t-3.429 0.464q-0.286-0.196-0.759-0.482t-1.491-0.688-1.518-0.241q-0.804 2.018-0.143 3.643-1.411 1.554-1.411 3.679 0 1.518 0.366 2.679t0.938 1.875 1.438 1.196 1.679 0.696 1.83 0.321q-0.696 0.643-0.875 1.839-0.375 0.179-0.804 0.268t-1.018 0.089-1.17-0.384-0.991-1.116q-0.339-0.571-0.866-0.929t-0.884-0.429l-0.357-0.054q-0.375 0-0.518 0.080t-0.089 0.205 0.161 0.25 0.232 0.214l0.125 0.089q0.393 0.179 0.777 0.679t0.563 0.911l0.179 0.411q0.232 0.679 0.786 1.098t1.196 0.536 1.241 0.125 0.991-0.063l0.411-0.071q0 0.679 0.009 1.58t0.009 0.973q0 0.321-0.232 0.536t-0.714 0.125q-4.143-1.375-6.759-4.955t-2.616-8.063q0-3.732 1.839-6.884t4.991-4.991 6.884-1.839zM5.196 21.982q0.054-0.125-0.125-0.214-0.179-0.054-0.232 0.036-0.054 0.125 0.125 0.214 0.161 0.107 0.232-0.036zM5.75 22.589q0.125-0.089-0.036-0.286-0.179-0.161-0.286-0.054-0.125 0.089 0.036 0.286 0.179 0.179 0.286 0.054zM6.286 23.393q0.161-0.125 0-0.339-0.143-0.232-0.304-0.107-0.161 0.089 0 0.321t0.304 0.125zM7.036 24.143q0.143-0.143-0.071-0.339-0.214-0.214-0.357-0.054-0.161 0.143 0.071 0.339 0.214 0.214 0.357 0.054zM8.054 24.589q0.054-0.196-0.232-0.286-0.268-0.071-0.339 0.125t0.232 0.268q0.268 0.107 0.339-0.107zM9.179 24.679q0-0.232-0.304-0.196-0.286 0-0.286 0.196 0 0.232 0.304 0.196 0.286 0 0.286-0.196zM10.214 24.5q-0.036-0.196-0.321-0.161-0.286 0.054-0.25 0.268t0.321 0.143 0.25-0.25z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-twitter" viewBox="0 0 30 32">
|
||||
<title>twitter</title>
|
||||
<path class="path1" d="M28.929 7.286q-1.196 1.75-2.893 2.982 0.018 0.25 0.018 0.75 0 2.321-0.679 4.634t-2.063 4.437-3.295 3.759-4.607 2.607-5.768 0.973q-4.839 0-8.857-2.589 0.625 0.071 1.393 0.071 4.018 0 7.161-2.464-1.875-0.036-3.357-1.152t-2.036-2.848q0.589 0.089 1.089 0.089 0.768 0 1.518-0.196-2-0.411-3.313-1.991t-1.313-3.67v-0.071q1.214 0.679 2.607 0.732-1.179-0.786-1.875-2.054t-0.696-2.75q0-1.571 0.786-2.911 2.161 2.661 5.259 4.259t6.634 1.777q-0.143-0.679-0.143-1.321 0-2.393 1.688-4.080t4.080-1.688q2.5 0 4.214 1.821 1.946-0.375 3.661-1.393-0.661 2.054-2.536 3.179 1.661-0.179 3.321-0.893z"></path>
|
||||
<symbol id="icon-code" viewBox="0 0 20 20">
|
||||
<title>code</title>
|
||||
<path class="path1" d="M5.719 14.75c-0.236 0-0.474-0.083-0.664-0.252l-5.060-4.498 5.341-4.748c0.412-0.365 1.044-0.33 1.411 0.083s0.33 1.045-0.083 1.412l-3.659 3.253 3.378 3.002c0.413 0.367 0.45 0.999 0.083 1.412-0.197 0.223-0.472 0.336-0.747 0.336zM14.664 14.748l5.341-4.748-5.060-4.498c-0.413-0.367-1.045-0.33-1.411 0.083s-0.33 1.045 0.083 1.412l3.378 3.003-3.659 3.252c-0.413 0.367-0.45 0.999-0.083 1.412 0.197 0.223 0.472 0.336 0.747 0.336 0.236 0 0.474-0.083 0.664-0.252zM9.986 16.165l2-12c0.091-0.545-0.277-1.060-0.822-1.151-0.547-0.092-1.061 0.277-1.15 0.822l-2 12c-0.091 0.545 0.277 1.060 0.822 1.151 0.056 0.009 0.11 0.013 0.165 0.013 0.48 0 0.904-0.347 0.985-0.835z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-reddit" viewBox="0 0 32 32">
|
||||
<title>reddit</title>
|
||||
<path class="path1" d="M19.554 20.839q0.286 0.286 0 0.554-1.107 1.107-3.554 1.107t-3.554-1.107q-0.286-0.268 0-0.554 0.107-0.107 0.268-0.107t0.268 0.107q0.857 0.875 3.018 0.875 2.143 0 3.018-0.875 0.107-0.107 0.268-0.107t0.268 0.107zM14.071 17.607q0 0.661-0.464 1.125t-1.125 0.464-1.134-0.464-0.473-1.125q0-0.679 0.473-1.143t1.134-0.464 1.125 0.473 0.464 1.134zM21.125 17.607q0 0.661-0.473 1.125t-1.134 0.464-1.125-0.464-0.464-1.125 0.464-1.134 1.125-0.473 1.134 0.464 0.473 1.143zM25.607 15.464q0-0.875-0.625-1.5t-1.518-0.625-1.536 0.643q-2.321-1.607-5.554-1.714l1.125-5.054 3.571 0.804q0 0.661 0.464 1.125t1.125 0.464 1.134-0.473 0.473-1.134-0.473-1.134-1.134-0.473q-0.964 0-1.429 0.893l-3.946-0.875q-0.339-0.089-0.446 0.286l-1.232 5.571q-3.214 0.125-5.518 1.732-0.625-0.661-1.554-0.661-0.893 0-1.518 0.625t-0.625 1.5q0 0.625 0.33 1.143t0.884 0.786q-0.107 0.482-0.107 1 0 2.536 2.5 4.339t6.018 1.804q3.536 0 6.036-1.804t2.5-4.339q0-0.571-0.125-1.018 0.536-0.268 0.857-0.777t0.321-1.134zM32 16q0 3.25-1.268 6.214t-3.411 5.107-5.107 3.411-6.214 1.268-6.214-1.268-5.107-3.411-3.411-5.107-1.268-6.214 1.268-6.214 3.411-5.107 5.107-3.411 6.214-1.268 6.214 1.268 5.107 3.411 3.411 5.107 1.268 6.214z"></path>
|
||||
<symbol id="icon-anchor" viewBox="0 0 16 16">
|
||||
<title>anchor</title>
|
||||
<path class="path1" d="M14.779 12.779c-1.471 1.993-4.031 3.245-6.779 3.221-2.748 0.023-5.309-1.229-6.779-3.221l-1.221 1.221v-4h4l-1.1 1.099c0.882 1.46 2.357 2.509 4.1 2.807v-6.047c-1.723-0.446-3-1.997-3-3.858 0-2.209 1.791-4 4-4s4 1.791 4 4c0 1.862-1.277 3.413-3 3.858v6.047c1.742-0.297 3.218-1.347 4.099-2.807l-1.1-1.099h4v4l-1.221-1.221zM10 4c0-1.104-0.895-2-2-2s-2 0.895-2 2c0 1.104 0.895 2 2 2s2-0.896 2-2z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-book" viewBox="0 0 24 24">
|
||||
<title>book</title>
|
||||
<path class="path1" d="M18.984 6.984v-1.969h-9.984v1.969h9.984zM15 15v-2.016h-6v2.016h6zM18.984 11.016v-2.016h-9.984v2.016h9.984zM20.016 2.016c1.078 0 1.969 0.891 1.969 1.969v12c0 1.078-0.891 2.016-1.969 2.016h-12c-1.078 0-2.016-0.938-2.016-2.016v-12c0-1.078 0.938-1.969 2.016-1.969h12zM3.984 6v14.016h14.016v1.969h-14.016c-1.078 0-1.969-0.891-1.969-1.969v-14.016h1.969z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-pro" viewBox="0 0 20 20">
|
||||
<title>pro</title>
|
||||
<path class="path1" d="M10 1.6c-4.639 0-8.4 3.761-8.4 8.4s3.761 8.4 8.4 8.4 8.4-3.761 8.4-8.4c0-4.639-3.761-8.4-8.4-8.4zM15 11h-4v4h-2v-4h-4v-2h4v-4h2v4h4v2z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-con" viewBox="0 0 20 20">
|
||||
<title>con</title>
|
||||
<path class="path1" d="M10 1.6c-4.639 0-8.4 3.761-8.4 8.4s3.761 8.4 8.4 8.4 8.4-3.761 8.4-8.4c0-4.639-3.761-8.4-8.4-8.4zM15 11h-10v-2h10v2z"></path>
|
||||
</symbol>
|
||||
<symbol id="icon-neutral" viewBox="0 0 20 20">
|
||||
<title>neutral</title>
|
||||
<path class="path1" d="M9.999 0.8c-5.081 0-9.199 4.119-9.199 9.201 0 5.080 4.118 9.199 9.199 9.199s9.2-4.119 9.2-9.199c0-5.082-4.119-9.201-9.2-9.201zM10 13.001c-1.657 0-3-1.344-3-3s1.343-3 3-3c1.656 0 3 1.344 3 3s-1.344 3-3 3z"></path>
|
||||
</symbol>
|
||||
</defs>
|
||||
</svg>
|
||||
|
|
Before: 6.0 KiB | After: 4.7 KiB
Before: 4.1 KiB
BIN
website/assets/img/logos/indico.png
Normal file
After: 1.2 KiB
Before: 644 B
Before: 1.2 KiB | After: 1.2 KiB
Before: 217 KiB
BIN
website/assets/img/pattern_blue.jpg
Normal file
After: 225 KiB
BIN
website/assets/img/pattern_landing.jpg
Normal file
After: 182 KiB
BIN
website/assets/img/pattern_red.jpg
Normal file
After: 180 KiB
BIN
website/assets/img/profile_matt.png
Normal file
After: 108 KiB
BIN
website/assets/img/showcase/displacy-ent.jpg
Normal file
After: 32 KiB
BIN
website/assets/img/showcase/displacy.jpg
Normal file
After: 16 KiB
BIN
website/assets/img/showcase/foxtype.jpg
Normal file
After: 27 KiB
BIN
website/assets/img/showcase/indico.jpg
Normal file
After: 33 KiB
BIN
website/assets/img/showcase/kip.jpg
Normal file
After: 28 KiB
BIN
website/assets/img/showcase/laice.jpg
Normal file
After: 9.0 KiB
BIN
website/assets/img/showcase/sense2vec.jpg
Normal file
After: 32 KiB
BIN
website/assets/img/showcase/textanalysis.jpg
Normal file
After: 16 KiB
BIN
website/assets/img/social.jpg
Normal file
After: 364 KiB
Before: 247 KiB
BIN
website/assets/img/social_docs.jpg
Normal file
After: 252 KiB
Before: 504 KiB
|
@ -2,54 +2,24 @@
|
|||
//- 💫 MAIN JAVASCRIPT
|
||||
//- ----------------------------------
|
||||
|
||||
'use strict';
|
||||
|
||||
const $ = document.querySelector.bind(document);
|
||||
const $$ = document.querySelectorAll.bind(document);
|
||||
'use strict'
|
||||
|
||||
{
|
||||
const updateVh = () => Math.max(document.documentElement.clientHeight, window.innerHeight || 0);
|
||||
const nav = document.querySelector('.js-nav')
|
||||
const fixedClass = 'is-fixed'
|
||||
let vh, scrollY = 0, scrollUp = false
|
||||
|
||||
const nav = $('.js-nav');
|
||||
const sidebar = $('.js-sidebar');
|
||||
const vhPadding = 525;
|
||||
|
||||
let vh = updateVh();
|
||||
let scrollY = 0;
|
||||
let scrollUp = false;
|
||||
const updateVh = () => Math.max(document.documentElement.clientHeight, window.innerHeight || 0)
|
||||
|
||||
const updateNav = () => {
|
||||
const vh = updateVh();
|
||||
const newScrollY = (window.pageYOffset || document.scrollTop) - (document.clientTop || 0);
|
||||
scrollUp = newScrollY <= scrollY;
|
||||
scrollY = newScrollY;
|
||||
const vh = updateVh()
|
||||
const newScrollY = (window.pageYOffset || document.scrollTop) - (document.clientTop || 0)
|
||||
scrollUp = newScrollY <= scrollY
|
||||
scrollY = newScrollY
|
||||
|
||||
if(scrollUp && !(isNaN(scrollY) || scrollY <= vh)) nav.classList.add('is-fixed');
|
||||
else if(!scrollUp || (isNaN(scrollY) || scrollY <= vh/2)) nav.classList.remove('is-fixed');
|
||||
if(scrollUp && !(isNaN(scrollY) || scrollY <= vh)) nav.classList.add(fixedClass)
|
||||
else if(!scrollUp || (isNaN(scrollY) || scrollY <= vh/2)) nav.classList.remove(fixedClass)
|
||||
}
|
||||
|
||||
const updateSidebar = () => {
|
||||
const sidebar = $('.js-sidebar');
|
||||
if(sidebar.offsetTop - scrollY <= 0) sidebar.classList.add('is-fixed');
|
||||
else sidebar.classList.remove('is-fixed');
|
||||
|
||||
[...$$('[data-section]')].map(el => {
|
||||
const trigger = el.getAttribute('data-section');
|
||||
|
||||
if(trigger) {
|
||||
const target = $(`#${trigger}`);
|
||||
const offset = parseInt(target.offsetTop);
|
||||
const height = parseInt(target.scrollHeight);
|
||||
|
||||
if((offset - scrollY) <= vh/2 && (offset - scrollY) > -height + vhPadding) {
|
||||
[...$$('[data-section]')].forEach(item => item.classList.remove('is-active'));
|
||||
$(`[data-section="${trigger}"]`).classList.add('is-active');
|
||||
}
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
window.addEventListener('resize', () => vh = updateVh());
|
||||
window.addEventListener('scroll', updateNav);
|
||||
if($('.js-sidebar')) window.addEventListener('scroll', updateSidebar);
|
||||
window.addEventListener('scroll', () => requestAnimationFrame(updateNav))
|
||||
}
|
||||
|
|
|
@ -1,10 +0,0 @@
|
|||
{
|
||||
|
||||
"index": {
|
||||
"title" : "Blog"
|
||||
},
|
||||
|
||||
"announcement" : {
|
||||
"title": "Important Announcement"
|
||||
}
|
||||
}
|
|
@ -1,12 +0,0 @@
|
|||
include ../_includes/_mixins
|
||||
|
||||
.u-padding
|
||||
+label #[+date("2016-08-09")]
|
||||
|
||||
p.u-text-large Dear spaCy users,
|
||||
|
||||
p.u-text-medium Unfortunately, we (Henning Peters and Matthew Honnibal) are parting ways. Breaking up is never easy, and it's taken us a while to get our stuff together. Hopefully, you didn't notice anything was up — if you did, we hope you haven't been inconvenienced.
|
||||
|
||||
p.u-text-medium Here's how this is going to work: Matt will continue to develop and maintain spaCy and all related projects under his name. Nothing will change for you. Henning will take over our legal structure and start a new business under a new name.
|
||||
|
||||
p.u-text-medium Sincerely,#[br] Henning Peters and Matthew Honnibal
|
|
@ -1,5 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 BLOG INDEX (REDIRECT)
|
||||
//- ----------------------------------
|
||||
|
||||
script window.location = '!{SITE_URL}'
|
|
@ -1,167 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > ANNOTATION SPECS
|
||||
//- ----------------------------------
|
||||
|
||||
+section("annotation")
|
||||
+h(2, "annotation").
|
||||
Annotation Specifications
|
||||
|
||||
p.
|
||||
This document describes the target annotations spaCy is trained to predict.
|
||||
This is currently a work in progress. Please ask questions on the
|
||||
#[+a("https://github.com/" + SOCIAL.github + "/spaCy/issues") issue tracker],
|
||||
so that the answers can be integrated here to improve the documentation.
|
||||
|
||||
+section("annotation-tokenization")
|
||||
+h(3, "annotation-tokenization").
|
||||
Tokenization
|
||||
|
||||
p.
|
||||
Tokenization standards are based on the OntoNotes 5 corpus. The
|
||||
tokenizer differs from most by including tokens for significant
|
||||
whitespace. Any sequence of whitespace characters beyond a single
|
||||
space (' ') is included as a token. For instance:
|
||||
|
||||
+code.
|
||||
from spacy.en import English
|
||||
nlp = English(parser=False)
|
||||
tokens = nlp('Some\nspaces  and\ttab characters')
|
||||
print([t.orth_ for t in tokens])
|
||||
|
||||
p Which produces:
|
||||
|
||||
+code.
|
||||
['Some', '\n', 'spaces', ' ', 'and', '\t', 'tab', 'characters']
|
||||
|
||||
p.
|
||||
The whitespace tokens are useful for much the same reason punctuation
|
||||
is – it's often an important delimiter in the text. By preserving it
|
||||
in the token output, we are able to maintain a simple alignment between
|
||||
the tokens and the original string, and we ensure that no information
|
||||
is lost during processing.
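The alignment property described above can be illustrated without spaCy. The following is a minimal sketch, assuming each token may own at most one trailing space and that any other whitespace becomes a token of its own. The helper names `tokenize_with_whitespace` and `detokenize` are hypothetical, and punctuation splitting is deliberately left out, so this is not spaCy's tokenizer.

```python
import re


def tokenize_with_whitespace(text):
    """Split text into (token, has_trailing_space) pairs.

    Assumption for this sketch: one space directly after a word is stored
    as that word's trailing space; any other whitespace is kept as a token.
    """
    tokens = []
    for match in re.finditer(r"\s+|\S+", text):
        chunk = match.group()
        if not chunk.isspace():
            tokens.append([chunk, False])
            continue
        # A whitespace run that follows a word: the first plain space is
        # attached to the word, the remainder (if any) becomes a token.
        if tokens and chunk.startswith(" "):
            tokens[-1][1] = True
            chunk = chunk[1:]
        if chunk:
            tokens.append([chunk, False])
    return [(token, has_space) for token, has_space in tokens]


def detokenize(tokens):
    """Rebuild the original string from (token, has_trailing_space) pairs."""
    return "".join(token + (" " if has_space else "") for token, has_space in tokens)


text = "Some\nspaces  and\ttab characters"
pairs = tokenize_with_whitespace(text)
print([token for token, _ in pairs])
print(detokenize(pairs) == text)
```

Because every character of the input ends up either in a token or in a trailing-space flag, the round trip back to the original string is lossless.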
|
||||
|
||||
+section("annotation-sentence-boundary")
|
||||
+h(3, "annotation-sentence-boundary").
|
||||
Sentence boundary detection
|
||||
|
||||
p.
|
||||
Sentence boundaries are calculated from the syntactic parse tree, so
|
||||
features such as punctuation and capitalisation play an important but
|
||||
non-decisive role in determining the sentence boundaries. Usually
|
||||
this means that the sentence boundaries will at least coincide with
|
||||
clause boundaries, even given poorly punctuated text.
|
||||
|
||||
+section("annotation-pos-tagging")
|
||||
+h(3, "annotation-pos-tagging").
|
||||
Part-of-speech Tagging
|
||||
|
||||
p.
|
||||
The part-of-speech tagger uses the OntoNotes 5 version of the Penn
|
||||
Treebank tag set. We also map the tags to the simpler Google Universal
|
||||
POS Tag set. Details #[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tagger.pyx") here].
|
||||
|
||||
+section("annotation-lemmatization")
|
||||
+h(3, "annotation-lemmatization").
|
||||
Lemmatization
|
||||
|
||||
p A "lemma" is the uninflected form of a word. In English, this means:
|
||||
|
||||
+list
|
||||
+item #[strong Adjectives:] The form like "happy", not "happier" or "happiest"
|
||||
+item #[strong Adverbs:] The form like "badly", not "worse" or "worst"
|
||||
+item #[strong Nouns:] The form like "dog", not "dogs"; like "child", not "children"
|
||||
+item #[strong Verbs:] The form like "write", not "writes", "writing", "wrote" or "written"
|
||||
|
||||
p.
|
||||
The lemmatization data is taken from WordNet. However, we also add a
|
||||
special case for pronouns: all pronouns are lemmatized to the special
|
||||
token #[code -PRON-].
|
||||
|
||||
+section("annotation-dependency")
|
||||
+h(3, "annotation-dependency").
|
||||
Syntactic Dependency Parsing
|
||||
|
||||
p.
|
||||
The parser is trained on data produced by the ClearNLP converter.
|
||||
Details of the annotation scheme can be found
|
||||
#[+a("http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf") here].
|
||||
|
||||
+section("annotation-ner")
|
||||
+h(3, "annotation-ner").
|
||||
Named Entity Recognition
|
||||
|
||||
+table(["Entity Type", "Description"])
|
||||
+row
|
||||
+cell PERSON
|
||||
+cell People, including fictional.
|
||||
|
||||
+row
|
||||
+cell NORP
|
||||
+cell Nationalities or religious or political groups.
|
||||
|
||||
+row
|
||||
+cell FAC
|
||||
+cell Facilities, such as buildings, airports, highways, bridges, etc.
|
||||
|
||||
+row
|
||||
+cell ORG
|
||||
+cell Companies, agencies, institutions, etc.
|
||||
|
||||
+row
|
||||
+cell GPE
|
||||
+cell Countries, cities, states.
|
||||
|
||||
+row
|
||||
+cell LOC
|
||||
+cell Non-GPE locations, mountain ranges, bodies of water.
|
||||
|
||||
+row
|
||||
+cell PRODUCT
|
||||
+cell Vehicles, weapons, foods, etc. (Not services)
|
||||
|
||||
+row
|
||||
+cell EVENT
|
||||
+cell Named hurricanes, battles, wars, sports events, etc.
|
||||
|
||||
+row
|
||||
+cell WORK_OF_ART
|
||||
+cell Titles of books, songs, etc.
|
||||
|
||||
+row
|
||||
+cell LAW
|
||||
+cell Named documents made into laws
|
||||
|
||||
+row
|
||||
+cell LANGUAGE
|
||||
+cell Any named language
|
||||
|
||||
p The following values are also annotated in a style similar to names:
|
||||
|
||||
+table(["Entity Type", "Description"])
|
||||
+row
|
||||
+cell DATE
|
||||
+cell Absolute or relative dates or periods
|
||||
|
||||
+row
|
||||
+cell TIME
|
||||
+cell Times smaller than a day
|
||||
|
||||
+row
|
||||
+cell PERCENT
|
||||
+cell Percentage (including “%”)
|
||||
|
||||
+row
|
||||
+cell MONEY
|
||||
+cell Monetary values, including unit
|
||||
|
||||
+row
|
||||
+cell QUANTITY
|
||||
+cell Measurements, as of weight or distance
|
||||
|
||||
+row
|
||||
+cell ORDINAL
|
||||
+cell "first", "second"
|
||||
|
||||
+row
|
||||
+cell CARDINAL
|
||||
+cell Numerals that do not fall under another type
|
|
@ -1,305 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > DOC
|
||||
//- ----------------------------------
|
||||
|
||||
+section("doc")
|
||||
+h(2, "doc", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/doc.pyx")
|
||||
| #[+tag class] Doc
|
||||
|
||||
p
|
||||
| A sequence of #[code Token] objects. Access sentences and named entities,
|
||||
| export annotations to numpy arrays, losslessly serialize to compressed
|
||||
| binary strings.
|
||||
|
||||
+aside.
|
||||
Internally, the #[code Doc] object holds an array of #[code TokenC] structs.
|
||||
The Python-level #[code Token] and #[code Span] objects are views of this
|
||||
array, i.e. they don't own the data themselves.
|
||||
|
||||
+code("python", "Overview").
|
||||
class Doc:
|
||||
def __init__(self, vocab, orths_and_spaces=None):
|
||||
return self
|
||||
|
||||
def __getitem__(self, int i):
|
||||
return Token()
|
||||
def __getitem__(self, slice i_j):
|
||||
return Span()
|
||||
def __iter__(self):
|
||||
yield Token()
|
||||
def __len__(self):
|
||||
return int
|
||||
|
||||
def __unicode__(self):
|
||||
return unicode
|
||||
def __bytes__(self):
|
||||
return utf8
|
||||
def __repr__(self):
|
||||
return unicode
|
||||
|
||||
@property
|
||||
def text(self):
|
||||
return unicode
|
||||
@property
|
||||
def text_with_ws(self):
|
||||
return unicode
|
||||
|
||||
@property
|
||||
def vector(self):
|
||||
return numpy.ndarray(dtype='float32')
|
||||
@property
|
||||
def vector_norm(self):
|
||||
return float
|
||||
@property
|
||||
def ents(self):
|
||||
yield Span()
|
||||
@property
|
||||
def noun_chunks(self):
|
||||
yield Span()
|
||||
@property
|
||||
def sents(self):
|
||||
yield Span()
|
||||
|
||||
def similarity(self, other):
|
||||
return float
|
||||
|
||||
def merge(self, start_char, end_char, tag, lemma, ent_type):
|
||||
return None
|
||||
|
||||
def to_array(self, attr_ids):
|
||||
return numpy.ndarray(shape=(len(self), len(attr_ids)), dtype='int64')
|
||||
|
||||
def count_by(self, attr_id, exclude=None, counts=None):
|
||||
return dict
|
||||
|
||||
def to_bytes(self):
|
||||
return bytes
|
||||
|
||||
def from_array(self, attrs, array):
|
||||
return None
|
||||
|
||||
def from_bytes(self, data):
|
||||
return self
|
||||
|
||||
@staticmethod
|
||||
def read_bytes(file_):
|
||||
yield bytes
|
||||
|
||||
+section("doc-init")
|
||||
+h(3, "doc-init")
|
||||
| #[+tag method] Doc.__init__
|
||||
|
||||
.has-aside
|
||||
+code("python", "Definition").
|
||||
def __init__(self, vocab, orths_and_spaces=None):
|
||||
return Doc
|
||||
|
||||
+aside("Implementation").
|
||||
This method of constructing a #[code Doc] object is usually only used
|
||||
for deserialization. Standard usage is to construct the document via
|
||||
a call to the language object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell vocab
|
||||
+cell.
|
||||
A Vocabulary object, which must match any models you want to
|
||||
use (e.g. tokenizer, parser, entity recognizer).
|
||||
|
||||
+row
|
||||
+cell orths_and_spaces
|
||||
+cell.
|
||||
A list of tokens in the document as a sequence of
|
||||
#[code (orth_id, has_space)] tuples, where #[code orth_id]
|
||||
is an integer and #[code has_space] is a boolean, indicating
|
||||
whether the token has a trailing space.
|
||||
|
||||
+section("doc-sequenceapi")
|
||||
+h(3, "doc-sequenceapi")
|
||||
| #[+tag Section] Sequence API
|
||||
|
||||
+table(["Example", "Description"])
|
||||
+row
|
||||
+cell #[code doc[i]]
|
||||
+cell.
|
||||
Get the Token object at position i, where i is an integer.
|
||||
Negative indexing is supported, and follows the usual Python
|
||||
semantics, i.e. doc[-2] is doc[len(doc) - 2].
|
||||
|
||||
+row
|
||||
+cell #[code doc[start : end]]
|
||||
+cell.
|
||||
Get a #[code Span] object, starting at position #[code start]
|
||||
and ending at position #[code end], where #[code start] and
|
||||
#[code end] are token indices. For instance,
|
||||
#[code doc[2:5]] produces a span consisting of
|
||||
tokens 2, 3 and 4. Stepped slices (e.g. #[code doc[start : end : step]])
|
||||
are not supported, as #[code Span] objects must be contiguous
|
||||
(cannot have gaps). You can use negative indices and open-ended
|
||||
ranges, which have their normal Python semantics.
|
||||
|
||||
+row
|
||||
+cell #[code for token in doc]
|
||||
+cell.
|
||||
Iterate over Token objects, from which the annotations can
|
||||
be easily accessed. This is the main way of accessing Token
|
||||
objects, which are the main way annotations are accessed from
|
||||
Python. If faster-than-Python speeds are required, you can
|
||||
instead access the annotations as a numpy array, or access the
|
||||
underlying C data directly from Cython.
|
||||
|
||||
+row
|
||||
+cell #[code len(doc)]
|
||||
+cell.
|
||||
The number of tokens in the document.
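The indexing rules in the table above follow ordinary Python sequence semantics. As a rough sketch, the hypothetical `TokenSequence` class below stores plain strings instead of `Token` objects and returns a list instead of a `Span`. It only illustrates the negative-index, open-ended-range and no-stepped-slice behaviour and is not the actual `Doc` implementation.

```python
class TokenSequence:
    """Hypothetical stand-in for a Doc that holds plain token strings."""

    def __init__(self, words):
        self._words = list(words)

    def __len__(self):
        return len(self._words)

    def __iter__(self):
        return iter(self._words)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves negative and open-ended bounds.
            start, end, step = key.indices(len(self))
            if step != 1:
                # A span must be contiguous, so stepped slices are rejected.
                raise ValueError("stepped slices are not supported")
            return self._words[start:end]
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("token index out of range")
        return self._words[key]


doc = TokenSequence("This is a short example sentence .".split())
print(doc[-2] == doc[len(doc) - 2])
print(doc[2:5])
print(doc[:2], doc[-2:])
```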
|
||||
|
||||
+section("doc-spans")
|
||||
+h(3, "doc-spans-sents")
|
||||
| #[+tag property] Doc.sents
|
||||
|
||||
p.
|
||||
Yields sentence #[code Span] objects. Sentence spans have no label.
|
||||
To improve accuracy on informal texts, spaCy calculates sentence
|
||||
boundaries from the syntactic dependency parse. If the parser is disabled,
|
||||
the #[code sents] iterator will be unavailable.
|
||||
|
||||
+code("python", "Example").
|
||||
from spacy.en import English
|
||||
nlp = English()
|
||||
doc = nlp("This is a sentence. Here's another...")
|
||||
assert [s.root.orth_ for s in doc.sents] == ["is", "'s"]
|
||||
|
||||
+h(3, "doc-spans-ents")
|
||||
| #[+tag property] Doc.ents
|
||||
|
||||
p.
|
||||
Yields named-entity #[code Span] objects, if the entity recognizer
|
||||
has been applied to the document. Iterate over the span to get
|
||||
individual Token objects, or access the label:
|
||||
|
||||
+code("python", "Example").
|
||||
from spacy.en import English
|
||||
nlp = English()
|
||||
tokens = nlp(u'Mr. Best flew to New York on Saturday morning.')
|
||||
ents = list(tokens.ents)
|
||||
assert ents[0].label == 346
|
||||
assert ents[0].label_ == 'PERSON'
|
||||
assert ents[0].orth_ == 'Best'
|
||||
assert ents[0].text == 'Mr. Best'
|
||||
|
||||
+h(3, "doc-spans-nounchunks")
|
||||
| #[+tag property] Doc.noun_chunks
|
||||
|
||||
p.
|
||||
Yields base noun-phrase #[code Span] objects, if the document
|
||||
has been syntactically parsed. A base noun phrase, or
|
||||
'NP chunk', is a noun phrase that does not permit other NPs to
|
||||
be nested within it – so no NP-level coordination, no prepositional
|
||||
phrases, and no relative clauses. For example:
|
||||
|
||||
+code("python", "Example").
|
||||
from spacy.en import English
|
||||
nlp = English()
|
||||
doc = nlp(u'The sentence in this example has three noun chunks.')
|
||||
for chunk in doc.noun_chunks:
|
||||
print(chunk.label_, chunk.orth_, '<--', chunk.root.head.orth_)
|
||||
|
||||
+section("doc-exportimport-toarray")
|
||||
+h(3, "doc-exportimport-toarray")
|
||||
| #[+tag method] Doc.to_array
|
||||
|
||||
p.
|
||||
Given a list of M attribute IDs, export the tokens to a numpy
|
||||
#[code ndarray] of shape #[code N*M], where #[code N] is the length
|
||||
of the document. The values will be 32-bit integers.
|
||||
|
||||
+code("python", "Example").
|
||||
from spacy import attrs
|
||||
doc = nlp(text)
|
||||
# All strings mapped to integers, for easy export to numpy
|
||||
np_array = doc.to_array([attrs.LOWER, attrs.POS, attrs.ENT_TYPE, attrs.IS_ALPHA])
|
||||
|
||||
+code("python", "Definition").
|
||||
def to_array(self, attr_ids):
|
||||
return numpy.ndarray(shape=(len(self), len(attr_ids)), dtype='int64')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell attr_ids
|
||||
+cell list of ints
|
||||
+cell.
|
||||
A list of attribute ID ints. Attribute IDs can be imported
|
||||
from #[code spacy.attrs] or #[code spacy.symbols].
|
||||
|
||||
+section("doc-exportimport-countby")
|
||||
+h(4, "doc-exportimport-countby")
|
||||
| #[+tag method] Doc.count_by
|
||||
|
||||
p.
|
||||
Produce a dict of #[code {attribute (int): count (ints)}] frequencies,
|
||||
keyed by the values of the given attribute ID.
|
||||
|
||||
+code("python", "Definition").
|
||||
def count_by(self, attr_id):
|
||||
return dict
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell attr_id
|
||||
+cell int
|
||||
+cell.
|
||||
The attribute ID to key the counts.
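The frequency dict described above can be sketched in plain Python. This is an illustration only: tokens are modelled as dicts mapping attribute IDs to integer values, and the IDs `LOWER` and `POS` and all values below are made up for the example rather than taken from `spacy.attrs`.

```python
from collections import Counter

# Hypothetical attribute IDs, for illustration only.
LOWER, POS = 0, 1


def count_by(tokens, attr_id):
    """Count how often each integer value of attr_id occurs across tokens."""
    return dict(Counter(token[attr_id] for token in tokens))


# Each token is a mapping of attribute ID -> integer value (made-up values).
tokens = [
    {LOWER: 101, POS: 7},
    {LOWER: 102, POS: 8},
    {LOWER: 101, POS: 7},
]
print(count_by(tokens, LOWER))
print(count_by(tokens, POS))
```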
|
||||
|
||||
+section("doc-exportimport-fromarray")
|
||||
+h(4, "doc-exportimport-fromarray")
|
||||
| #[+tag method] Doc.from_array
|
||||
|
||||
p Write to a #[code Doc] object from an N*M array of attributes (N tokens, M attributes).
|
||||
|
||||
+code("python", "Definition").
|
||||
def from_array(self, attrs, array):
|
||||
return None
|
||||
|
||||
+section("doc-exportimport-frombytes")
|
||||
+h(4, "doc-exportimport-frombytes") Doc.from_bytes
|
||||
|
||||
p Deserialize, loading from bytes.
|
||||
|
||||
+code("python", "Definition").
|
||||
def from_bytes(self, byte_string):
|
||||
return Doc
|
||||
|
||||
+section("doc-exportimport-tobytes")
|
||||
+h(4, "doc-exportimport-tobytes")
|
||||
| #[+tag method] Doc.to_bytes
|
||||
|
||||
p Serialize, producing a byte string.
|
||||
|
||||
+code("python", "Definition").
|
||||
def to_bytes(self):
|
||||
return bytes
|
||||
|
||||
+section("doc-exportimport-readbytes")
|
||||
+h(4, "doc-exportimport-readbytes")
|
||||
| #[+tag method] Doc.read_bytes
|
||||
|
||||
p.
|
||||
A static method, used to read serialized #[code Doc] objects from
|
||||
a file. For example:
|
||||
|
||||
+code("python", "Example").
|
||||
from spacy.tokens.doc import Doc
|
||||
loc = 'test_serialize.bin'
|
||||
with open(loc, 'wb') as file_:
|
||||
file_.write(nlp(u'This is a document.').to_bytes())
|
||||
file_.write(nlp(u'This is another.').to_bytes())
|
||||
docs = []
|
||||
with open(loc, 'rb') as file_:
|
||||
for byte_string in Doc.read_bytes(file_):
|
||||
docs.append(Doc(nlp.vocab).from_bytes(byte_string))
|
||||
assert len(docs) == 2
|
||||
|
||||
+code("python", "Definition").
|
||||
@staticmethod
|
||||
def read_bytes(file_):
|
||||
yield bytes
|
|
@ -1,258 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > LANGUAGE
|
||||
//- ----------------------------------
|
||||
|
||||
+section("language")
|
||||
+h(2, "language", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/language.py")
|
||||
| #[+tag class] Language
|
||||
|
||||
p.
|
||||
A pipeline that transforms text strings into annotated spaCy Doc objects. Usually you'll load the Language pipeline once and pass the instance around your program.
|
||||
|
||||
+code("python", "Overview").
|
||||
class Language:
|
||||
Defaults = BaseDefaults
|
||||
|
||||
def __init__(self, path=True, **overrides):
|
||||
self.vocab = Vocab()
|
||||
self.tokenizer = Tokenizer()
|
||||
self.tagger = Tagger()
|
||||
self.parser = DependencyParser()
|
||||
self.entity = EntityRecognizer()
|
||||
self.make_doc = lambda text: Doc()
|
||||
self.pipeline = [self.tagger, self.parser, self.entity]
|
||||
|
||||
def __call__(self, text, **toggle):
|
||||
doc = self.make_doc(text)
|
||||
for process in self.pipeline:
|
||||
if toggle.get(process.name, True):
|
||||
process(doc)
|
||||
return doc
|
||||
|
||||
def pipe(self, texts_iterator, batch_size=1000, n_threads=2, **toggle):
|
||||
docs = (self.make_doc(text) for text in texts_iterator)
|
||||
for process in self.pipeline:
|
||||
if toggle.get(process.name, True):
|
||||
docs = process.pipe(docs, batch_size=batch_size, n_threads=n_threads)
|
||||
for doc in docs:
|
||||
yield doc
|
||||
|
||||
def end_training(self, path=None):
|
||||
return None
|
||||
|
||||
class English(Language):
|
||||
class Defaults(BaseDefaults):
|
||||
pass
|
||||
|
||||
class German(Language):
|
||||
class Defaults(BaseDefaults):
|
||||
pass
|
||||
|
||||
+section("english-init")
|
||||
+h(3, "english-init")
|
||||
| #[+tag method] Language.__init__
|
||||
|
||||
p
|
||||
| Load the pipeline. You can disable components by passing None as a value,
|
||||
| e.g. pass parser=None, vectors=None to save memory if you're not using
|
||||
| those components. You can also pass an object as the value.
|
||||
| Pass a function create_pipeline to use a custom pipeline --- see
|
||||
| the custom pipeline tutorial.
|
||||
|
||||
+aside("Efficiency").
|
||||
Loading takes 10-20 seconds, and the instance consumes 2 to 3
|
||||
gigabytes of memory. Intended use is for one instance to be
|
||||
created for each language per process, but you can create more
|
||||
if you're doing something unusual. You may wish to make the
|
||||
instance a global variable or "singleton".
|
||||
|
||||
+table(["Example", "Description"])
|
||||
+row
|
||||
+cell #[code nlp = English()]
|
||||
+cell Load everything, from default path.
|
||||
|
||||
+row
|
||||
+cell #[code nlp = English(path='my_data')]
|
||||
+cell Load everything, from specified path
|
||||
|
||||
+row
|
||||
+cell #[code nlp = English(path=path_obj)]
|
||||
+cell Load everything, from an object that follows the #[code pathlib.Path] protocol.
|
||||
|
||||
+row
|
||||
+cell #[code nlp = English(parser=False, vectors=False)]
|
||||
+cell Load everything except the parser and the word vectors.
|
||||
|
||||
+row
|
||||
+cell #[code nlp = English(parser=my_parser)]
|
||||
+cell Load everything, and use a custom parser.
|
||||
|
||||
+row
|
||||
+cell #[code nlp = English(create_pipeline=my_pipeline)]
|
||||
+cell Load everything, and use a custom pipeline.
|
||||
|
||||
+code("python", "Definition").
|
||||
def __init__(self, path=True, **overrides):
|
||||
D = self.Defaults
|
||||
self.vocab = Vocab(path=path, parent=self, **D.vocab) \
|
||||
if 'vocab' not in overrides \
|
||||
else overrides['vocab']
|
||||
self.tokenizer = Tokenizer(self.vocab, path=path, **D.tokenizer) \
|
||||
if 'tokenizer' not in overrides \
|
||||
else overrides['tokenizer']
|
||||
self.tagger = Tagger(self.vocab, path=path, **D.tagger) \
|
||||
if 'tagger' not in overrides \
|
||||
else overrides['tagger']
|
||||
self.parser = DependencyParser(self.vocab, path=path, **D.parser) \
|
||||
if 'parser' not in overrides \
|
||||
else overrides['parser']
|
||||
self.entity = EntityRecognizer(self.vocab, path=path, **D.entity) \
|
||||
if 'entity' not in overrides \
|
||||
else overrides['entity']
|
||||
self.matcher = Matcher(self.vocab, path=path, **D.matcher) \
|
||||
if 'matcher' not in overrides \
|
||||
else overrides['matcher']
|
||||
|
||||
if 'make_doc' in overrides:
|
||||
self.make_doc = overrides['make_doc']
|
||||
elif 'create_make_doc' in overrides:
|
||||
self.make_doc = overrides['create_make_doc'](self)
|
||||
else:
|
||||
self.make_doc = lambda text: self.tokenizer(text)
|
||||
if 'pipeline' in overrides:
|
||||
self.pipeline = overrides['pipeline']
|
||||
elif 'create_pipeline' in overrides:
|
||||
self.pipeline = overrides['create_pipeline'](self)
|
||||
else:
|
||||
self.pipeline = [self.tagger, self.parser, self.matcher, self.entity]
|
||||
|
||||
+section("language-call")
|
||||
+h(3, "language-call")
|
||||
| #[+tag method] Language.__call__
|
||||
|
||||
p
|
||||
| The main entry point to spaCy. Takes raw unicode text, and returns
|
||||
| a #[code Doc] object, which can be iterated to access #[code Token]
|
||||
| and #[code Span] objects.
|
||||
|
||||
+aside("Efficiency").
|
||||
spaCy's algorithms are all linear-time, so you can supply
|
||||
documents of arbitrary length, e.g. whole novels.
|
||||
|
||||
+table(["Example", "Description"], "code")
|
||||
+row
|
||||
+cell #[ doc = nlp(u'Some text.')]
|
||||
+cell Apply the full pipeline.
|
||||
+row
|
||||
+cell #[ doc = nlp(u'Some text.', parse=False)]
|
||||
+cell Applies tagger and entity, not parser
|
||||
+row
|
||||
+cell #[ doc = nlp(u'Some text.', entity=False)]
|
||||
+cell Applies tagger and parser, not entity.
|
||||
+row
|
||||
+cell #[ doc = nlp(u'Some text.', tag=False)]
|
||||
+cell Does not apply tagger, entity or parser
|
||||
+row
|
||||
+cell #[ doc = nlp(u'')]
|
||||
+cell Zero-length tokens, not an error
|
||||
+row
|
||||
+cell #[ doc = nlp(b'Some text')]
|
||||
+cell Error: need unicode
|
||||
+row
|
||||
+cell #[ doc = nlp(b'Some text'.decode('utf8'))]
|
||||
+cell Decode bytes into unicode first.
|
||||
|
||||
+code("python", "Definition").
|
||||
def __call__(self, text, tag=True, parse=True, entity=True, matcher=True):
|
||||
return Doc()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell text
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell.
|
||||
The text to be processed. spaCy expects raw unicode text
|
||||
– you don't necessarily need to, say, split it into paragraphs.
|
||||
However, depending on your documents, you might be better
|
||||
off applying custom pre-processing. Non-text formatting,
|
||||
e.g. from HTML mark-up, should be removed before sending
|
||||
the document to spaCy. If your documents have a consistent
|
||||
format, you may be able to improve accuracy by pre-processing.
|
||||
For instance, if the first word of each document is always
|
||||
in upper-case, it may be helpful to normalize it before
|
||||
supplying them to spaCy.
|
||||
|
||||
+row
|
||||
+cell tag
|
||||
+cell #[+a(link_bool) bool]
|
||||
+cell.
|
||||
Whether to apply the part-of-speech tagger. Required for
|
||||
parsing and entity recognition.
|
||||
|
||||
+row
|
||||
+cell parse
|
||||
+cell #[+a(link_bool) bool]
|
||||
+cell.
|
||||
Whether to apply the syntactic dependency parser.
|
||||
|
||||
+row
|
||||
+cell entity
|
||||
+cell #[+a(link_bool) bool]
|
||||
+cell.
|
||||
Whether to apply the named entity recognizer.
|
||||
|
||||
+section("english-pipe")
|
||||
+h(3, "english-pipe")
|
||||
| #[+tag method] Language.pipe
|
||||
|
||||
p
|
||||
| Parse a sequence of texts into a sequence of #[code Doc] objects.
|
||||
| Accepts a generator as input, and produces a generator as output.
|
||||
| Internally, it accumulates a buffer of #[code batch_size]
|
||||
| texts, works on them with #[code n_threads] workers in parallel,
|
||||
| and then yields the #[code Doc] objects one by one.
|
||||
|
||||
+aside("Efficiency").
|
||||
spaCy releases the global interpreter lock around the parser and
|
||||
named entity recognizer, allowing shared-memory parallelism via
|
||||
OpenMP. However, OpenMP is not supported on OSX — so multiple
|
||||
threads will only be used on Linux and Windows.
|
||||
|
||||
+table(["Example", "Description"], "usage")
|
||||
+row
|
||||
+cell #[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/examples/parallel_parse.py") parallel_parse.py]
|
||||
+cell Parse comments from Reddit in parallel.
|
||||
|
||||
+code("python", "Definition").
|
||||
def pipe(self, texts, n_threads=2, batch_size=1000):
|
||||
yield Doc()
|
||||
|
||||
+table(["Arg", "Type", "Description"])
|
||||
+row
|
||||
+cell texts
|
||||
+cell
|
||||
+cell.
|
||||
A sequence of unicode objects. Usually you will want this
|
||||
to be a generator, so that you don't need to have all of
|
||||
your texts in memory.
|
||||
|
||||
+row
|
||||
+cell n_threads
|
||||
+cell #[+a(link_int) int]
|
||||
+cell.
|
||||
The number of worker threads to use. If -1, OpenMP will
|
||||
decide how many to use at run time. Default is 2.
|
||||
|
||||
+row
|
||||
+cell batch_size
|
||||
+cell #[+a(link_int) int]
|
||||
+cell.
|
||||
The number of texts to buffer. Let's say you have a
|
||||
#[code batch_size] of 1,000. The input, #[code texts], is
|
||||
a generator that yields the texts one-by-one. We want to
|
||||
operate on them in parallel. So, we accumulate a work queue.
|
||||
Instead of taking one document from #[code texts] and
|
||||
operating on it, we buffer #[code batch_size] documents,
|
||||
work on them in parallel, and then yield them one-by-one.
|
||||
Higher #[code batch_size] therefore often results in better
|
||||
parallelism, up to a point.
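The buffering behaviour described for `batch_size` can be sketched as follows. This is only an illustration of the buffer-then-yield pattern: it uses Python's `ThreadPoolExecutor` as a stand-in for the OpenMP parallelism mentioned above, the `process` callable is a hypothetical placeholder for the pipeline, and the `n_threads=-1` case is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def pipe(texts, process, batch_size=1000, n_threads=2):
    """Buffer batch_size texts, process each batch in parallel, yield in order."""
    texts = iter(texts)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while True:
            # Pull at most batch_size items from the (possibly lazy) input.
            batch = list(islice(texts, batch_size))
            if not batch:
                break
            # Executor.map returns results in input order.
            for result in pool.map(process, batch):
                yield result


lazy_texts = (text for text in ["first text", "second text", "third text"])
for result in pipe(lazy_texts, str.upper, batch_size=2):
    print(result)
```

Only one batch is held in memory at a time, which is why passing a generator as `texts` keeps memory use bounded while still giving the workers enough items to share.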
|
|
@ -1,194 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > LEXEME
|
||||
//- ----------------------------------
|
||||
|
||||
+section("lexeme")
|
||||
+h(2, "lexeme", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/lexeme.pyx")
|
||||
| #[+tag class] Lexeme
|
||||
|
||||
p.
|
||||
The Lexeme object represents a lexical type, stored in the vocabulary –
|
||||
as opposed to a token, occurring in a document.
|
||||
|
||||
p.
|
||||
Each Token object receives a reference to a lexeme object (specifically,
|
||||
it receives a pointer to a #[code LexemeC] struct). This allows features
|
||||
to be computed and saved once per type, rather than once per token. As
|
||||
job sizes grow, this amounts to substantial efficiency improvements, as
|
||||
the vocabulary size (number of types) will be much smaller than the total
|
||||
number of words processed (number of tokens).
|
||||
|
||||
p.
|
||||
All Lexeme attributes are therefore context independent, as a single lexeme
|
||||
is reused for all usages of that word. Lexemes are keyed by the #[code orth]
|
||||
attribute.
|
||||
|
||||
p.
|
||||
Most Lexeme attributes can be set, with the exception of the primary key,
|
||||
#[code orth]. Assigning to an attribute of the #[code Lexeme] object writes
|
||||
to the underlying struct, so all tokens that are backed by that
|
||||
#[code Lexeme] will inherit the new value.
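The type/token distinction above can be pictured with a small pure-Python model. The classes below are simplified, hypothetical stand-ins (spaCy itself uses C structs and pointers). They only show how one shared entry per word type makes an attribute change visible to every token of that type.

```python
class Lexeme:
    """One entry per word type; context-independent attributes live here."""

    def __init__(self, orth):
        self.orth = orth
        self.is_stop = False


class Vocab:
    """Maps each distinct string to a single shared Lexeme."""

    def __init__(self):
        self._lexemes = {}

    def __getitem__(self, orth):
        if orth not in self._lexemes:
            self._lexemes[orth] = Lexeme(orth)
        return self._lexemes[orth]

    def __len__(self):
        return len(self._lexemes)


class Token:
    """One entry per occurrence; holds a reference to its shared Lexeme."""

    def __init__(self, vocab, orth):
        self.lex = vocab[orth]

    @property
    def is_stop(self):
        return self.lex.is_stop


vocab = Vocab()
tokens = [Token(vocab, word) for word in "the cat saw the dog".split()]

# Writing to the lexeme once updates every token backed by it.
vocab["the"].is_stop = True
print([token.is_stop for token in tokens])
print(len(tokens), "tokens share", len(vocab), "lexemes")
```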
|
||||
|
||||
+code("python", "Overview").
|
||||
class Lexeme:
|
||||
def __init__(self, vocab, key):
|
||||
return self
|
||||
|
||||
int rank
|
||||
|
||||
int orth, lower, shape, prefix, suffix
|
||||
|
||||
unicode orth_, lower_, shape_, prefix_, suffix_
|
||||
|
||||
bool is_alpha, is_ascii, is_lower, is_title, is_punct, is_space, like_url, like_num, like_email, is_oov, is_stop
|
||||
|
||||
float prob
|
||||
int cluster
|
||||
numpy.ndarray[float64] vector
|
||||
bool has_vector
|
||||
|
||||
def set_flag(self, flag_id, value):
|
||||
return None
|
||||
|
||||
def check_flag(self, flag_id):
|
||||
return bool
|
||||
|
||||
def similarity(self, other):
|
||||
return float
|
||||
|
||||
+table(["Example", "Description"])
|
||||
+row
|
||||
+cell #[code.lang-python lexeme = nlp.vocab[string]]
|
||||
+cell Lookup by string
|
||||
+row
|
||||
+cell #[code.lang-python lexeme = vocab[i]]
|
||||
+cell Lookup by integer
|
||||
|
||||
+section("lexeme-stringfeatures")
|
||||
+h(3, "lexeme-stringfeatures").
|
||||
String Features
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell orth / orth_
|
||||
+cell.
|
||||
The form of the word with no string normalization or processing,
|
||||
as it appears in the string, without trailing whitespace.
|
||||
|
||||
+row
|
||||
+cell lower / lower_
|
||||
+cell.
|
||||
The form of the word, but forced to lower-case, i.e.
|
||||
#[code lower = word.orth_.lower()]
|
||||
|
||||
+row
|
||||
+cell shape / shape_
|
||||
+cell.
|
||||
A transform of the word's string, to show orthographic features.
|
||||
The characters a-z are mapped to x, A-Z is mapped to X, 0-9
|
||||
is mapped to d. After these mappings, sequences of 4 or more
|
||||
of the same character are truncated to length 4. Examples:
|
||||
C3Po --> XdXx, favorite --> xxxx, :) --> :)
|
||||
|
||||
+row
|
||||
+cell prefix / prefix_
|
||||
+cell.
|
||||
A length-N substring from the start of the word. Length may
|
||||
vary by language; currently for English n=1, i.e.
|
||||
#[code prefix = word.orth_[:1]]
|
||||
|
||||
+row
|
||||
+cell suffix / suffix_
|
||||
+cell.
|
||||
A length-N substring from the end of the word. Length may vary
|
||||
by language; currently for English n=3, i.e.
|
||||
#[code suffix = word.orth_[-3:]]
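The string features in the table above are simple enough to sketch directly. The functions below are a hedged re-implementation of the rules as stated in the table (character-class mapping, runs truncated to length 4, prefix length 1 and suffix length 3 for English). The function names are hypothetical and this is not spaCy's own code.

```python
def word_shape(text):
    """Map a-z to x, A-Z to X, 0-9 to d, and truncate runs of the same
    mapped character to length 4."""
    shape = []
    for char in text:
        if "a" <= char <= "z":
            mapped = "x"
        elif "A" <= char <= "Z":
            mapped = "X"
        elif "0" <= char <= "9":
            mapped = "d"
        else:
            mapped = char
        # Skip the character if the shape already ends in four of it.
        if shape[-4:] == [mapped] * 4:
            continue
        shape.append(mapped)
    return "".join(shape)


def prefix(text, n=1):
    """Length-n substring from the start of the word (n=1 for English)."""
    return text[:n]


def suffix(text, n=3):
    """Length-n substring from the end of the word (n=3 for English)."""
    return text[-n:]


for word in ["C3Po", "favorite", ":)"]:
    print(word, "-->", word_shape(word), prefix(word), suffix(word))
```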
|
||||
|
||||
+section("lexeme-booleanflags")
|
||||
+h(3, "lexeme-booleanflags")
|
||||
| Boolean Flags
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell is_alpha
|
||||
+cell Equivalent to #[code word.orth_.isalpha()]
|
||||
|
||||
+row
|
||||
+cell is_ascii
|
||||
+cell Equivalent to #[code all(ord(c) < 128 for c in word.orth_)]
|
||||
|
||||
+row
|
||||
+cell is_digit
|
||||
+cell Equivalent to #[code word.orth_.isdigit()]
|
||||
|
||||
+row
|
||||
+cell is_lower
|
||||
+cell Equivalent to #[code word.orth_.islower()]
|
||||
|
||||
+row
|
||||
+cell is_title
|
||||
+cell Equivalent to #[code word.orth_.istitle()]
|
||||
|
||||
+row
|
||||
+cell is_punct
|
||||
+cell Equivalent to #[code word.orth_.ispunct()]
|
||||
|
||||
+row
|
||||
+cell is_space
|
||||
+cell Equivalent to #[code word.orth_.isspace()]
|
||||
|
||||
+row
|
||||
+cell like_url
|
||||
+cell Does the word resemble a URL?
|
||||
|
||||
+row
|
||||
+cell like_num
|
||||
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
|
||||
|
||||
+row
|
||||
+cell like_email
|
||||
+cell Does the word resemble an email?
|
||||
|
||||
+row
|
||||
+cell is_oov
|
||||
+cell Is the word out-of-vocabulary?
|
||||
|
||||
+row
|
||||
+cell is_stop
|
||||
+cell.
|
||||
Is the word part of a "stop list"? Stop lists are used to
|
||||
improve the quality of topic models, by filtering out common,
|
||||
domain-general words.
|
||||
|
||||
+section("lexeme-distributional")
|
||||
+h(3, "lexeme-distributional")
|
||||
| Distributional Features
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell prob
|
||||
+cell.
|
||||
The unigram log-probability of the word, estimated from
|
||||
counts from a large corpus, smoothed using Simple Good Turing
|
||||
estimation.
|
||||
|
||||
+row
|
||||
+cell cluster
|
||||
+cell.
|
||||
The Brown cluster ID of the word. These are often useful features
|
||||
for linear models. If you’re using a non-linear model, particularly
|
||||
a neural net or random forest, consider using the real-valued
|
||||
word representation vector, in #[code Lexeme.vector], instead.
|
||||
|
||||
+row
|
||||
+cell vector
|
||||
+cell.
|
||||
A "word embedding" representation: a dense real-valued vector
|
||||
that supports similarity queries between words. By default,
|
||||
spaCy currently loads vectors produced by the Levy and
|
||||
Goldberg (2014) dependency-based word2vec model.
|
||||
|
||||
+row
|
||||
+cell has_vector
|
||||
+cell.
|
||||
A boolean value indicating whether a word vector is associated with the lexeme.
|
|
@ -1,81 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > MATCHER
|
||||
//- ----------------------------------
|
||||
|
||||
+section("matcher")
|
||||
+h(2, "matcher", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/matcher.pyx")
|
||||
| #[+tag class] Matcher
|
||||
|
||||
p A full example can be found #[a(href="https://github.com/" + SOCIAL.github + "/spaCy/blob/master/examples/matcher_example.py") here].
|
||||
|
||||
+table(["Usage", "Description"])
|
||||
+row
|
||||
+cell #[code.lang-python nlp(doc)]
|
||||
+cell As part of annotation pipeline.
|
||||
|
||||
+row
|
||||
+cell #[code.lang-python nlp.matcher(doc)]
|
||||
+cell Explicit invocation.
|
||||
|
||||
+row
|
||||
+cell #[code.lang-python nlp.matcher.add(u'FooCorp', u'ORG', {}, [[{u'ORTH': u'Foo'}]])]
|
||||
+cell Add a pattern to match.
|
||||
|
||||
+section("matcher-init")
|
||||
+h(3, "matcher-init") __init__(self, vocab, patterns)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell vocab
|
||||
+cell #[code.lang-python spacy.vocab.Vocab]
|
||||
+cell Reference to the shared vocabulary object.
|
||||
|
||||
+row
|
||||
+cell patterns
|
||||
+cell #[code {entity_key: (etype, attrs, specs)}]
|
||||
+cell.
|
||||
Initial patterns to match. See #[code Matcher.add]
|
||||
|
||||
+section("matcher-add")
|
||||
+h(3, "matcher-add") add(self, entity_key, etype, attrs, specs)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell entity_key
|
||||
+cell unicode or int
|
||||
+cell Your arbitrary ID string (or its integer encoding)
|
||||
+row
|
||||
+cell etype
|
||||
+cell unicode or int
|
||||
+cell A pre-registered entity type, e.g. u'PERSON', u'ORG', etc.
|
||||
+row
|
||||
+cell attrs
|
||||
+cell #[code dict]
|
||||
+cell Placeholder for future support of entity attributes.
|
||||
+row
|
||||
+cell specs
|
||||
+cell #[code [[{int: unicode}]]]
|
||||
+cell A list of surface forms, where each surface form is defined as a list of token definitions, and each token definition is a dictionary mapping attribute IDs to attribute values.
|
||||
|
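p The nested shape of #[code specs] (a list of surface forms, each a list of token definitions, each a dictionary of attribute constraints) can be illustrated with a toy matcher. This sketch is not the #[code Matcher] implementation: it assumes tokens are plain dictionaries keyed by attribute name, whereas the real class works on #[code Doc] objects and attribute IDs. The function name #[code find_matches] is hypothetical.

```python
def find_matches(tokens, specs):
    """Return (start, end) pairs where any surface form in `specs` matches."""
    matches = []
    for spec in specs:  # one surface form
        width = len(spec)
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            # Every constraint of every token definition must hold.
            if all(tok.get(attr) == value
                   for tok, token_def in zip(window, spec)
                   for attr, value in token_def.items()):
                matches.append((start, start + width))
    return matches


# Made-up input: four tokens and two surface forms for the same entity.
tokens = [{"ORTH": "I"}, {"ORTH": "like"}, {"ORTH": "Foo"}, {"ORTH": "Corp"}]
specs = [
    [{"ORTH": "Foo"}],                    # one-token surface form
    [{"ORTH": "Foo"}, {"ORTH": "Corp"}],  # two-token surface form
]
matches = find_matches(tokens, specs)
```

p Here #[code matches] holds one entry per surface form that matched, with an exclusive end index as in Python slices.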
||||
+section("matcher-saveload")
|
||||
+h(3, "matcher-saveload")
|
||||
| Save and Load
|
||||
|
||||
+section("matcher-saveload-dump")
|
||||
+h(4, "matcher-saveload-dump") dump(loc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell loc
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell Path to save the gazetteer.json file.
|
||||
|
||||
+section("matcher-saveload-load")
|
||||
+h(4, "matcher-saveload-load") load(loc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell loc
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell.
|
||||
Path to load the gazetteer.json file from.
|
|
@ -1,305 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > SPAN
|
||||
//- ----------------------------------
|
||||
|
||||
+section("span")
|
||||
+h(2, "span", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/span.pyx")
|
||||
| #[+tag class] Span
|
||||
|
||||
p.
|
||||
A slice of a #[code Doc] object, consisting of zero or
|
||||
more tokens. Spans are usually used to represent sentences, named entities,
|
||||
or phrases.
|
||||
|
||||
+aside("Implementation").
|
||||
#[code Span] objects are views — that is, they do not copy the
|
||||
underlying C data. This makes them cheap to construct, as internally they are
|
||||
simply a reference to the #[code Doc] object, a start position, an end
|
||||
position, and a label ID.
|
||||
|
||||
+code("python", "Overview").
|
||||
class Span:
|
||||
doc = Doc
|
||||
start = int
|
||||
end = int
|
||||
label = int
|
||||
|
||||
def __init__(self, doc, start, end, label=0, vector=None, vector_norm=None):
|
||||
return self
|
||||
|
||||
def __len__(self):
|
||||
return int
|
||||
def __getitem__(self, i):
|
||||
return Token()
|
||||
def __iter__(self):
|
||||
yield Token()
|
||||
|
||||
def similarity(self, other):
|
||||
return float
|
||||
|
||||
def merge(self, tag, lemma, ent_type):
|
||||
return None
|
||||
|
||||
@property
|
||||
def label_(self):
|
||||
return unicode
|
||||
|
||||
@property
|
||||
def vector(self):
|
||||
return numpy.ndarray(dtype="float64")
|
||||
@property
|
||||
def vector_norm(self):
|
||||
return float
|
||||
|
||||
@property
|
||||
def text(self):
|
||||
return unicode
|
||||
@property
|
||||
def text_with_ws(self):
|
||||
return unicode
|
||||
@property
|
||||
def orth_(self):
|
||||
return unicode
|
||||
@property
|
||||
def lemma_(self):
|
||||
return unicode
|
||||
|
||||
@property
|
||||
def root(self):
|
||||
return Token()
|
||||
@property
|
||||
def lefts(self):
|
||||
yield Token()
|
||||
@property
|
||||
def rights(self):
|
||||
yield Token()
|
||||
@property
|
||||
def subtree(self):
|
||||
yield Token()
|
||||
|
||||
+section("span-create")
|
||||
+h(3, "span-init")
|
||||
| #[+tag Section] Create a Span
|
||||
|
||||
p.
|
||||
Span instances are usually created via the #[code Doc] object.
|
||||
|
||||
+table(["Example", "Description"])
|
||||
+row
|
||||
+cell #[code.lang-python span = doc[4 : 7]]
|
||||
+cell Produce a span with tokens 4, 5 and 6.
|
||||
+row
|
||||
+cell #[code.lang-python span = Span(doc, start, end, label=spacy.symbols.PERSON)]
|
||||
+cell Calling #[code Span.__init__] directly allows you to set a label.
|
||||
+row
|
||||
+cell #[code.lang-python for entity in doc.ents]
|
||||
+cell See #[a(href="/docs#doc-spans-ents") Doc.ents]
|
||||
+row
|
||||
+cell #[code.lang-python for sentence in doc.sents]
|
||||
+cell See #[a(href="/docs#doc-spans-sents") Doc.sents]
|
||||
+row
|
||||
+cell #[code.lang-python for noun_phrase in doc.noun_chunks]
|
||||
+cell See #[a(href="/docs#doc-spans-nounchunks") Doc.noun_chunks]
|
||||
|
||||
+code("python", "Definition").
|
||||
def __init__(self, doc, start, end, label=0, vector=None, vector_norm=None):
|
||||
return Span()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell doc
|
||||
+cell Doc
|
||||
+cell The parent doc object, to slice from.
|
||||
+row
|
||||
+cell start
|
||||
+cell int
|
||||
+cell The index of the first token in the slice.
|
||||
+row
|
||||
+cell end
|
||||
+cell int
|
||||
+cell The index of the first token #[em outside] the slice (since ranges are exclusive in Python).
|
||||
+row
|
||||
+cell label
|
||||
+cell int or unicode
|
||||
+cell A label for the span. Either a string, or an integer ID, that should refer to a string mapped by the #[code Doc] object's #[code StringStore].
|
||||
+row
|
||||
+cell vector
|
||||
+cell
|
||||
+cell
|
||||
+row
|
||||
+cell vector_norm
|
||||
+cell
|
||||
+cell
|
||||
|
||||
+section("span-merge")
|
||||
+h(3, "span-merge")
|
||||
| #[+tag method] Span.merge
|
||||
|
||||
p.
|
||||
Merge the span into a single token, modifying the underlying
|
||||
#[code.lang-python Doc] object in place.
|
||||
|
||||
+aside("Caveat").
|
||||
Magic is done to allow you to call #[code.lang-python merge()]
|
||||
without invalidating other #[code.lang-python Span] objects.
|
||||
However, it's difficult to ensure all indices are recomputed
|
||||
correctly. Please report any errors encountered on the issue
|
||||
tracker.
|
||||
|
||||
+code("python", "Example").
|
||||
for ent in doc.ents:
|
||||
ent.merge(ent.root.tag_, ent.text, ent.label_)
|
||||
for np in doc.noun_chunks:
|
||||
while len(np) > 1 and np[0].dep_ not in ('advmod', 'amod', 'compound'):
|
||||
np = np[1:]
|
||||
np.merge(np.root.tag_, np.text, np.root.ent_type_)
|
||||
|
||||
+code("python", "Definition").
|
||||
def merge(self, tag, lemma, ent_type):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell tag
|
||||
+cell unicode
|
||||
+cell The fine-grained part-of-speech tag to assign to the new token.
|
||||
+row
|
||||
+cell lemma
|
||||
+cell unicode
|
||||
+cell The lemma string for the new token.
|
||||
+row
|
||||
+cell ent_type
|
||||
+cell unicode
|
||||
+cell The named entity type to assign to the new token.
|
||||
|
||||
+section("span-similarity")
|
||||
+h(3, "span-similarity")
|
||||
| #[+tag method] Span.similarity
|
||||
|
||||
p Estimate the semantic similarity between the span and another #[code Span], #[code Doc], #[code Token] or #[code Lexeme].
|
||||
|
||||
+aside("Algorithm").
|
||||
Similarity is estimated
|
||||
using the cosine metric, between #[code Span.vector] and #[code other.vector].
|
||||
By default, #[code Span.vector] is computed by averaging the vectors
|
||||
of its tokens.
|
||||
|
||||
+code("python", "Example").
|
||||
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
|
||||
apples_sent, boots_sent = doc.sents
|
||||
fruit = doc.vocab[u'fruit']
|
||||
assert apples_sent.similarity(fruit) > boots_sent.similarity(fruit)
|
||||
|
||||
+code("python", "Definition").
|
||||
def similarity(self, other):
|
||||
return float
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell other
|
||||
+cell Token, Span, Doc or Lexeme
|
||||
+cell The other object to judge similarity with.
|
||||
|
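p The algorithm described for #[code Span.similarity] (cosine between an averaged span vector and another vector) can be sketched with plain lists. The vectors below are made up, and treating a zero vector as similarity 0.0 is an assumption of this sketch, not documented behaviour.

```python
import math


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: no similarity for zero vectors
    return dot / (norm_a * norm_b)


token_vectors = [[1.0, 0.0], [0.0, 1.0]]  # made-up 2-d token vectors
span_vector = average(token_vectors)       # plays the role of Span.vector
other_vector = [1.0, 1.0]                  # plays the role of other.vector
similarity = cosine(span_vector, other_vector)
```

p Because cosine ignores vector length, the averaged vector and #[code other_vector] point in the same direction and score close to 1.0.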
||||
+section("span-sequence")
|
||||
+h(3, "span-sequence")
|
||||
| #[+tag section] Span as a Sequence
|
||||
|
||||
p.
|
||||
#[code Span] objects act as a sequence of #[code Token] objects. In
|
||||
this way they mirror the API of the #[code Doc] object.
|
||||
|
||||
+table(["Name", "Description"], "params")
|
||||
+row
|
||||
+cell #[code.lang-python token = span[i]]
|
||||
+cell.
|
||||
Get the #[code Token] object at position #[em i], where
|
||||
#[code i] is an offset within the #[code Span], not the
|
||||
document. That is, if you have #[code.lang-python span = doc[4:6]],
|
||||
then #[code.lang-python span[0].i == 4]
|
||||
|
||||
+row
|
||||
+cell #[code.lang-python for token in span]
|
||||
+cell.
|
||||
Iterate over the #[code Token] objects in the span.
|
||||
|
||||
+row
|
||||
+cell __len__
|
||||
+cell Number of tokens in the span.
|
||||
|
||||
+row
|
||||
+cell text
|
||||
+cell.
|
||||
The text content of the span, obtained from
|
||||
#[code ''.join(token.text_with_ws for token in span)].
|
||||
|
||||
+row
|
||||
+cell start
|
||||
+cell.
|
||||
The start offset of the span, i.e. #[code span[0].i].
|
||||
|
||||
+row
|
||||
+cell end
|
||||
+cell.
|
||||
The end offset of the span, i.e. #[code span[-1].i + 1].
|
||||
|
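p The relationship between span offsets and document offsets in the table above can be modelled with a small view class. #[code ToyToken] and #[code ToySpan] are hypothetical stand-ins that only mimic the indexing behaviour described here; they are not spaCy classes.

```python
from collections import namedtuple

# A stand-in token: `i` is the token's index in the parent document.
ToyToken = namedtuple("ToyToken", ["i", "text"])


class ToySpan:
    """A view over a token list: holds only the parent, start and end."""

    def __init__(self, doc, start, end):
        self.doc = doc
        self.start = start
        self.end = end  # exclusive, as in Python slices

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.doc[self.start + i]  # offset within the span, not the doc

    def __iter__(self):
        return iter(self.doc[self.start:self.end])


words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "dog"]
doc = [ToyToken(i, word) for i, word in enumerate(words)]
span = ToySpan(doc, 4, 6)
```

p In this model #[code span[0].i] is 4 and #[code span[-1].i + 1] equals #[code span.end], mirroring the #[code start] and #[code end] rows of the table.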
||||
+section("span-navigating-parse")
|
||||
+h(3, "span-navigativing-parse")
|
||||
| #[+tag Section] Span and the Syntactic Parse
|
||||
|
||||
p.
|
||||
Span objects allow similar access to the syntactic parse as individual
|
||||
tokens.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell root
|
||||
+cell #[code.lang-python Token]
|
||||
+cell.
|
||||
The word with the shortest path to the root of the sentence is
|
||||
the root of the span.
|
||||
+row
|
||||
+cell lefts
|
||||
+cell #[code.lang-python yield Token]
|
||||
+cell Tokens that are to the left of the span, whose head is within it.
|
||||
+row
|
||||
+cell rights
|
||||
+cell #[code.lang-python yield Token]
|
||||
+cell Tokens that are to the right of the span, whose head is within it.
|
||||
|
||||
+row
|
||||
+cell subtree
|
||||
+cell #[code.lang-python yield Token]
|
||||
+cell.
|
||||
Tokens in the range #[code (start, end+1)], where #[code start]
|
||||
is the index of the leftmost word descended from a token in the
|
||||
span, and #[code end] is the index of the rightmost token descended
|
||||
from a token in the span.
|
||||
|
||||
+section("span-strings")
|
||||
+h(3, "span-strings")
|
||||
| #[+tag section] Span's Strings API
|
||||
|
||||
p.
|
||||
You can access the textual content of the span, and different views of
|
||||
it, with the following properties.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell text_with_ws
|
||||
+cell unicode
|
||||
+cell.
|
||||
The form of the span as it appears in the string, including
|
||||
trailing whitespace. This is useful when you need to use linguistic
|
||||
features to add inline mark-up to the string.
|
||||
|
||||
+row
|
||||
+cell lemma / lemma_
|
||||
+cell int / unicode
|
||||
+cell.
|
||||
Whitespace-concatenated lemmas of each token in the span.
|
||||
|
||||
+row
|
||||
+cell label / label_
|
||||
+cell int / unicode
|
||||
+cell.
|
||||
The span label, used particularly for named entities.
|
|
@ -1,105 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > STRINGSTORE
|
||||
//- ----------------------------------
|
||||
|
||||
+section("stringstore")
|
||||
+h(2, "stringstore", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/strings.pyx")
|
||||
| #[+tag class] StringStore
|
||||
|
||||
p Intern strings, and map them to sequential integer IDs.
|
||||
|
||||
p.
|
||||
Only the integer IDs are held by spaCy's data
|
||||
classes (#[code Doc], #[code Token], #[code Span] and #[code Lexeme])
|
||||
– when you use a string-valued attribute like #[code token.orth_],
|
||||
you access a property that computes #[code token.strings[token.orth]].
|
||||
|
||||
+aside("Efficiency").
|
||||
The mapping table is very efficient, and a small-string optimization
|
||||
is used to maintain a small memory footprint.
|
||||
|
||||
|
||||
+table(["Usage", "Description"])
|
||||
+row
|
||||
+cell #[code string = string_store[int_id]]
|
||||
+cell.
|
||||
Retrieve a string from a given integer ID. If the integer ID
|
||||
is not found, raise #[code IndexError].
|
||||
|
||||
+row
|
||||
+cell #[code int_id = string_store[unicode_string]]
|
||||
+cell.
|
||||
Map a unicode string to an integer ID. If the string is
|
||||
previously unseen, it is interned, and a new ID is returned.
|
||||
|
||||
+row
|
||||
+cell #[code int_id = string_store[utf8_byte_string]]
|
||||
+cell.
|
||||
Byte strings are assumed to be in UTF-8 encoding. Strings
|
||||
encoded with other codecs may fail silently. Given a utf8
|
||||
string, the behaviour is the same as for unicode strings.
|
||||
Internally, strings are stored in UTF-8 format. So if you start
|
||||
with a UTF-8 byte string, it's less efficient to first decode
|
||||
it as unicode, as StringStore will then have to encode it as
|
||||
UTF-8 once again.
|
||||
|
||||
+row
|
||||
+cell #[code n_strings = len(string_store)]
|
||||
+cell.
|
||||
Number of strings in the string-store.
|
||||
|
||||
+row
|
||||
+cell #[code for string in string_store]
|
||||
+cell
|
||||
p.
|
||||
Iterate over strings in the string store, in order, such
|
||||
that the ith string in the sequence has the ID #[code i]:
|
||||
|
||||
+code.code-block-small.no-block.
|
||||
string_store = doc.vocab.strings
|
||||
for i, string in enumerate(string_store):
|
||||
assert i == string_store[string]
|
||||
|
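p The usage patterns in the table above can be mimicked with a small pure-Python class. #[code ToyStringStore] is a hypothetical model of the documented behaviour only; it does not reproduce the real class's storage format or its small-string optimization.

```python
class ToyStringStore:
    """Interns strings and maps them to sequential integer IDs."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string

    def __getitem__(self, key):
        if isinstance(key, int):
            if not 0 <= key < len(self._strings):
                raise IndexError(key)  # unknown integer ID
            return self._strings[key]
        if isinstance(key, bytes):
            key = key.decode("utf8")   # byte strings are assumed to be UTF-8
        if key not in self._ids:       # previously unseen: intern it
            self._ids[key] = len(self._strings)
            self._strings.append(key)
        return self._ids[key]

    def __len__(self):
        return len(self._strings)

    def __iter__(self):
        return iter(self._strings)     # the ith string has the ID i


store = ToyStringStore()
dog_id = store["dog"]
cat_id = store["cat"]
```

p Looking up #[code "dog"] a second time returns the same ID, and #[code store[dog_id]] maps back to the string.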
||||
+section("stringstore-init")
|
||||
+h(3, "stringstore-init")
|
||||
| #[+tag method] StringStore.__init__
|
||||
|
||||
+code("python", "Definition").
|
||||
def __init__(self):
|
||||
return self
|
||||
|
||||
+section("stringstore-dump")
|
||||
+h(3, "stringstore-dump")
|
||||
| #[+tag method] StringStore.dump
|
||||
|
||||
p Save the string-to-int mapping to the given file.
|
||||
|
||||
+code("python", "Definition").
|
||||
def dump(self, file):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell file
|
||||
+cell str
|
||||
+cell.
|
||||
The file to write the data to.
|
||||
|
||||
+section("stringstore-load")
|
||||
+h(3, "stringstore-load")
|
||||
| #[+tag method] StringStore.load
|
||||
|
||||
p Load the strings from the given file.
|
||||
|
||||
+code("python", "Definition").
|
||||
def load(self, file):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell file
|
||||
+cell file
|
||||
+cell.
|
||||
File-like object to load the data from. The format is subject
|
||||
to change; so if you need to read/write compatible files, please
|
||||
find details in the strings.pyx source.
|
|
@ -1,321 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > TOKEN
|
||||
//- ----------------------------------
|
||||
|
||||
+section("token")
|
||||
+h(2, "token", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/tokens/token.pyx")
|
||||
| #[+tag class] Token
|
||||
|
||||
p.
|
||||
A Token represents a single word, punctuation or significant whitespace
|
||||
symbol. Integer IDs are provided for all string features. The (unicode)
|
||||
string is provided by an attribute of the same name followed by an underscore,
|
||||
e.g. #[code token.orth] is an integer ID, #[code token.orth_] is the unicode
|
||||
value. The only exception is the #[code token.text] attribute, which is (unicode)
|
||||
string-typed.
|
||||
|
||||
+section("token-init")
|
||||
+h(3, "token-init")
|
||||
| Token.__init__
|
||||
|
||||
+code("python", "Definition").
|
||||
def __init__(self, vocab, doc, offset):
|
||||
return Token()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell vocab
|
||||
+cell Vocab
|
||||
+cell A Vocab object
|
||||
|
||||
+row
|
||||
+cell doc
|
||||
+cell Doc
|
||||
+cell The parent sequence
|
||||
|
||||
+row
|
||||
+cell offset
|
||||
+cell #[+a(link_int) int]
|
||||
+cell The index of the token within the document
|
||||
|
||||
+section("token-stringfeatures")
|
||||
+h(3, "token-stringfeatures")
|
||||
| String Features
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell lemma / lemma_
|
||||
+cell.
|
||||
The "base" of the word, with no inflectional suffixes, e.g.
|
||||
the lemma of "developing" is "develop", the lemma of "geese"
|
||||
is "goose", etc. Note that #[em derivational] suffixes are
|
||||
not stripped, e.g. the lemma of "institutions" is "institution",
|
||||
not "institute". Lemmatization is performed using the WordNet
|
||||
data, but extended to also cover closed-class words such as
|
||||
pronouns. By default, the WN lemmatizer returns "hi" as the
|
||||
lemma of "his". We assign pronouns the lemma #[code -PRON-].
|
||||
|
||||
+row
|
||||
+cell orth / orth_
|
||||
+cell.
|
||||
The form of the word with no string normalization or processing,
|
||||
as it appears in the string, without trailing whitespace.
|
||||
|
||||
+row
|
||||
+cell lower / lower_
|
||||
+cell.
|
||||
The form of the word, but forced to lower-case, i.e.
|
||||
#[code lower = word.orth_.lower()]
|
||||
|
||||
+row
|
||||
+cell shape / shape_
|
||||
+cell.
|
||||
A transform of the word's string, to show orthographic features.
|
||||
The characters a-z are mapped to x, A-Z is mapped to X, 0-9
|
||||
is mapped to d. After these mappings, sequences of 4 or more
|
||||
of the same character are truncated to length 4. Examples:
|
||||
C3Po --> XdXx, favorite --> xxxx, :) --> :)
|
||||
|
||||
+row
|
||||
+cell prefix / prefix_
|
||||
+cell.
|
||||
A length-N substring from the start of the word. Length may
|
||||
vary by language; currently for English n=1, i.e.
|
||||
#[code prefix = word.orth_[:1]]
|
||||
|
||||
+row
|
||||
+cell suffix / suffix_
|
||||
+cell.
|
||||
A length-N substring from the end of the word. Length may
|
||||
vary by language; currently for English n=3, i.e.
|
||||
#[code suffix = word.orth_[-3:]]
|
||||
|
||||
+section("token-booleanflags")
|
||||
+h(3, "token-booleanflags")
|
||||
| Boolean Flags
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell is_alpha
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.isalpha()]
|
||||
|
||||
+row
|
||||
+cell is_ascii
|
||||
+cell.
|
||||
Equivalent to #[code all(ord(c) < 128 for c in word.orth_)]
|
||||
|
||||
+row
|
||||
+cell is_digit
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.isdigit()]
|
||||
|
||||
+row
|
||||
+cell is_lower
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.islower()]
|
||||
|
||||
+row
|
||||
+cell is_title
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.istitle()]
|
||||
|
||||
+row
|
||||
+cell is_punct
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.ispunct()]
|
||||
|
||||
+row
|
||||
+cell is_space
|
||||
+cell.
|
||||
Equivalent to #[code word.orth_.isspace()]
|
||||
|
||||
+row
|
||||
+cell like_url
|
||||
+cell.
|
||||
Does the word resemble a URL?
|
||||
|
||||
+row
|
||||
+cell like_num
|
||||
+cell.
|
||||
Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
|
||||
|
||||
+row
|
||||
+cell like_email
|
||||
+cell.
|
||||
Does the word resemble an email?
|
||||
|
||||
+row
|
||||
+cell is_oov
|
||||
+cell.
|
||||
Is the word out-of-vocabulary?
|
||||
|
||||
+row
|
||||
+cell is_stop
|
||||
+cell.
|
||||
Is the word part of a "stop list"? Stop lists are used to
|
||||
improve the quality of topic models, by filtering out common,
|
||||
domain-general words.
|
||||
|
||||
+section("token-distributional")
|
||||
+h(3, "token-distributional")
|
||||
| Distributional Features
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell prob
|
||||
+cell.
|
||||
The unigram log-probability of the word, estimated from
|
||||
counts from a large corpus, smoothed using Simple Good Turing
|
||||
estimation.
|
||||
|
||||
+row
|
||||
+cell cluster
|
||||
+cell.
|
||||
The Brown cluster ID of the word. These are often useful features
|
||||
for linear models. If you’re using a non-linear model, particularly
|
||||
a neural net or random forest, consider using the real-valued
|
||||
word representation vector, in #[code Token.vector], instead.
|
||||
|
||||
+row
|
||||
+cell vector
|
||||
+cell.
|
||||
A "word embedding" representation: a dense real-valued vector
|
||||
that supports similarity queries between words. By default,
|
||||
spaCy currently loads vectors produced by the Levy and
|
||||
Goldberg (2014) dependency-based word2vec model.
|
||||
|
||||
+row
|
||||
+cell has_vector
|
||||
+cell.
|
||||
A boolean value indicating whether a word vector is associated with the token.
|
||||
|
||||
+section("token-alignment")
|
||||
+h(3, "token-alignment")
|
||||
| Alignment and Output
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell idx
|
||||
+cell.
|
||||
Start index of the token in the string
|
||||
|
||||
+row
|
||||
+cell len(token)
|
||||
+cell.
|
||||
Length of the token's orth string, in unicode code-points.
|
||||
|
||||
+row
|
||||
+cell unicode(token)
|
||||
+cell.
|
||||
Same as #[code token.orth_].
|
||||
|
||||
+row
|
||||
+cell str(token)
|
||||
+cell.
|
||||
In Python 3, returns #[code token.orth_]. In Python 2, returns
|
||||
#[code token.orth_.encode('utf8')].
|
||||
|
||||
+row
|
||||
+cell text
|
||||
+cell.
|
||||
An alias for #[code token.orth_].
|
||||
|
||||
+row
|
||||
+cell text_with_ws
|
||||
+cell.
|
||||
#[code token.orth_ + token.whitespace_], i.e. the form of the
|
||||
word as it appears in the string, including trailing whitespace. This is
|
||||
useful when you need to use linguistic features to add inline
|
||||
mark-up to the string.
|
||||
|
||||
+row
|
||||
+cell whitespace_
|
||||
+cell.
|
||||
The trailing whitespace that follows the word in the string, or
|
||||
an empty string if there is none.
|
||||
|
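p Since #[code text_with_ws] is #[code token.orth_ + token.whitespace_], joining it over all tokens gives back the original string, and #[code idx] is the running character offset. The sketch below models tokens as made-up #[code (orth, whitespace)] pairs rather than real #[code Token] objects.

```python
# Made-up tokenization of "Hello, world!" as (orth, trailing whitespace) pairs.
pieces = [("Hello", ""), (",", " "), ("world", ""), ("!", "")]

# Joining orth + whitespace reproduces the original text exactly.
text = "".join(orth + ws for orth, ws in pieces)

# The start index of each token is the running length of what came before.
idx = []
offset = 0
for orth, ws in pieces:
    idx.append(offset)
    offset += len(orth) + len(ws)

# Inline mark-up: wrap one token without disturbing the spacing.
marked = "".join(
    ("<b>" + orth + "</b>" if orth == "world" else orth) + ws
    for orth, ws in pieces
)
```

p Keeping the whitespace separate from the word is what lets the mark-up wrap only the word itself.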
||||
+section("token-postags")
|
||||
+h(3, "token-postags")
|
||||
| Part-of-Speech Tags
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell pos / pos_
|
||||
+cell.
|
||||
A coarse-grained, less detailed tag that represents the
|
||||
word-class of the token. The set of #[code .pos] tags is
|
||||
consistent across languages. The available tags are #[code ADJ],
|
||||
#[code ADP], #[code ADV], #[code AUX], #[code CONJ], #[code DET],
|
||||
#[code INTJ], #[code NOUN], #[code NUM], #[code PART],
|
||||
#[code PRON], #[code PROPN], #[code PUNCT], #[code SCONJ],
|
||||
#[code SYM], #[code VERB], #[code X], #[code EOL], #[code SPACE].
|
||||
|
||||
+row
|
||||
+cell tag / tag_
|
||||
+cell.
|
||||
A fine-grained, more detailed tag that represents the
|
||||
word-class and some basic morphological information for the
|
||||
token. These tags are primarily designed to be good features
|
||||
for subsequent models, particularly the syntactic parser.
|
||||
They are language and treebank dependent. The tagger is
|
||||
trained to predict these fine-grained tags, and then a
|
||||
mapping table is used to reduce them to the coarse-grained
|
||||
#[code .pos] tags.
|
||||
|
||||
+section("token-navigating")
|
||||
+h(3, "token-navigating") Navigating the Parse Tree
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell dep / dep_
|
||||
+cell.
|
||||
The syntactic relation type, aka the dependency label, connecting the word to its head.
|
||||
+row
|
||||
+cell head
|
||||
+cell.
|
||||
The immediate syntactic head of the token. If the token is the
|
||||
root of its sentence, it is the token itself, i.e.
|
||||
#[code root_token.head is root_token].
|
||||
|
||||
+row
|
||||
+cell children
|
||||
+cell.
|
||||
An iterator that yields from lefts, and then yields from rights.
|
||||
|
||||
+row
|
||||
+cell subtree
|
||||
+cell.
|
||||
An iterator for the part of the sentence syntactically governed
|
||||
by the word, including the word itself.
|
||||
|
||||
+row
|
||||
+cell left_edge
|
||||
+cell.
|
||||
The leftmost edge of the token's subtree.
|
||||
|
||||
+row
|
||||
+cell right_edge
|
||||
+cell.
|
||||
The rightmost edge of the token's subtree.
|
||||
|
||||
+row
|
||||
+cell nbor(i=1)
|
||||
+cell.
|
||||
Get the #[code i]#[sup th] next / previous neighboring token.
|
||||
|
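p The navigation attributes in the table above can be illustrated with a toy parse stored as a list of head indices. The sentence and its heads are made up, and the helper functions are hypothetical; they only mirror the relationships described for #[code head], #[code children], #[code subtree], #[code left_edge] and #[code right_edge].

```python
words = ["She", "ate", "the", "pizza"]
heads = [1, 1, 3, 1]  # index of each word's head; the root points at itself


def children(i):
    """Immediate dependents of word i: lefts first, then rights."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def subtree(i):
    """Indices governed by word i, including i itself, in document order."""
    result = [i]
    for child in children(i):
        result.extend(subtree(child))
    return sorted(result)


def left_edge(i):
    return subtree(i)[0]


def right_edge(i):
    return subtree(i)[-1]


# The root is the word whose head is itself.
root = [i for i, h in enumerate(heads) if h == i][0]
```

p In this toy parse the root is "ate", and the subtree of "pizza" covers "the pizza".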
||||
+section("token-namedentities")
|
||||
+h(3, "token-namedentities")
|
||||
| Named Entity Recognition
|
||||
|
||||
+table(["Name", "Description"])
|
||||
+row
|
||||
+cell ent_type
|
||||
+cell.
|
||||
If the token is part of an entity, its entity type.
|
||||
|
||||
+row
|
||||
+cell ent_iob
|
||||
+cell.
|
||||
The IOB (inside, outside, begin) entity recognition tag for
|
||||
the token.
|
|
@ -1,154 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > API > VOCAB
|
||||
//- ----------------------------------
|
||||
|
||||
+section("vocab")
|
||||
+h(2, "vocab", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/vocab.pyx")
|
||||
| #[+tag class] Vocab
|
||||
|
||||
p
|
||||
| A look-up table that allows you to access #[code.lang-python Lexeme]
|
||||
| objects. The #[code.lang-python Vocab] instance also provides access to
|
||||
| the #[code.lang-python StringStore], and owns underlying C-data that
|
||||
| is shared between #[code.lang-python Doc] objects.
|
||||
|
||||
+aside('Caveat').
|
||||
You should avoid working with #[code Doc], #[code Token] or #[code Span]
|
||||
objects backed by multiple different #[code Vocab] instances, as
|
||||
they may assume inconsistent string-to-integer encodings. All #[code Doc]
|
||||
objects produced by the same #[code Language] instance will hold
|
||||
a reference to the same #[code Vocab] instance.
|
||||
|
||||
+code("python", "Overview").
|
||||
class Vocab:
|
||||
StringStore strings
|
||||
Morphology morphology
|
||||
dict get_lex_attr
|
||||
int vectors_length
|
||||
|
||||
def __init__(self, get_lex_attr=None, tag_map=None, lemmatizer=None, serializer_freqs=None):
|
||||
return self
|
||||
|
||||
@classmethod
|
||||
def load(cls, data_dir, get_lex_attr):
|
||||
return Vocab()
|
||||
|
||||
@classmethod
|
||||
def from_package(cls, package, get_lex_attr=None, vectors_package=None):
|
||||
return Vocab()
|
||||
|
||||
property serializer:
|
||||
return Packer()
|
||||
|
||||
def __len__(self):
|
||||
return int
|
||||
|
||||
def __contains__(self, string):
|
||||
return bool
|
||||
|
||||
def __getitem__(self, id_or_string):
|
||||
return Lexeme()
|
||||
|
||||
def dump(self, loc):
|
||||
return None
|
||||
|
||||
def load_lexemes(self, loc):
|
||||
return None
|
||||
|
||||
def dump_vectors(self, out_loc):
|
||||
return None
|
||||
|
||||
def load_vectors(self, file_):
|
||||
return int
|
||||
|
||||
def load_vectors_from_bin_loc(self, loc):
|
||||
return int
|
||||
|
||||
+table(["Example", "Description"])
|
||||
+row
|
||||
+cell #[code.lang-python lexeme = vocab[integer_id]]
|
||||
+cell.
|
||||
Get a lexeme by its orth ID.
|
||||
|
||||
+row
|
||||
+cell #[code.lang-python lexeme = vocab[string]]
|
||||
+cell.
|
||||
Get a lexeme by the string corresponding to its orth ID.
|
||||
|
||||
+row
|
||||
+cell #[code.lang-python for lexeme in vocab]
|
||||
+cell.
|
||||
Iterate over #[code Lexeme] objects.
|
||||
+row
|
||||
+cell #[code.lang-python int_id = vocab.strings[u'dog']]
|
||||
+cell.
|
||||
Access the #[code StringStore] via #[code vocab.strings]
|
||||
+row
|
||||
+cell #[code.lang-python nlp.vocab is nlp.tokenizer.vocab]
|
||||
+cell.
|
||||
The same #[code.lang-python Vocab] instance is shared by the pipeline and its tokenizer.
|
||||
|
||||
+section("vocab-dump")
|
||||
+h(3, "vocab-dump")
|
||||
| #[+tag method] Vocab.dump
|
||||
|
||||
+code("python", "Definition").
|
||||
def dump(self, loc):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell loc
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell Path where the vocabulary should be saved.
|
||||
|
||||
+section("vocab-load_lexemes")
|
||||
+h(3, "vocab-load_lexemes")
|
||||
| #[+tag method] Vocab.load_lexemes
|
||||
|
||||
+code("python", "Definition").
|
||||
def load_lexemes(self, loc):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell loc
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell Path to load the lexemes.bin file from.
|
||||
|
||||
+section("vocab-dump_vectors")
|
||||
+h(3, "vocab-dump_vectors")
|
||||
| #[+tag method] Vocab.dump_vectors
|
||||
|
||||
+code("python", "Definition").
|
||||
def dump_vectors(self, loc):
|
||||
return None
|
||||
|
||||
+section("vocab-loadvectors")
|
||||
+h(3, "vocab-loadvectors")
|
||||
| #[+tag method] Vocab.load_vectors
|
||||
|
||||
+code("python", "Definition").
|
||||
def load_vectors(self, file_):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell file_
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell A file-like object, to load word vectors from.
|
||||
|
||||
+section("vocab-loadvectorsfrombinloc")
|
||||
+h(3, "vocab-saveload-loadvectorsfrom")
|
||||
| #[+tag method] Vocab.load_vectors_from_bin_loc
|
||||
|
||||
+code("python", "Definition").
|
||||
def load_vectors_from_bin_loc(self, loc):
|
||||
return None
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell loc
|
||||
+cell #[+a(link_unicode) unicode]
|
||||
+cell.
|
||||
A path to a file, in spaCy's binary word-vectors file format.
|
|
@ -2,29 +2,27 @@
|
|||
"index": {
|
||||
"title" : "Documentation",
|
||||
|
||||
"sidebar": {
|
||||
"Quickstart": [
|
||||
["Getting started", "#getting-started", "getting-started"],
|
||||
["Usage Examples", "#examples", "examples"]
|
||||
],
|
||||
"API": [
|
||||
["Language", "#language", "language"],
|
||||
["Doc", "#doc", "doc"],
|
||||
["Token", "#token", "token"],
|
||||
["Span", "#span", "span"],
|
||||
["Lexeme", "#lexeme", "lexeme"],
|
||||
["Vocab", "#vocab", "vocab"],
|
||||
["StringStore", "#stringstore", "stringstore"],
|
||||
["Matcher", "#matcher", "matcher"]
|
||||
],
|
||||
"More": [
|
||||
["Annotation Specs", "#annotation", "annotation"],
|
||||
["Tutorials", "#tutorials", "tutorials"]
|
||||
],
|
||||
"Feedback": [
|
||||
["Suggest Edits", "https://github.com/spacy-io/spaCy/tree/master/website/docs"],
|
||||
["Github Issue Tracker", "https://github.com/spacy-io/spaCy/issues"]
|
||||
]
|
||||
"sections": {
|
||||
"Usage": {
|
||||
"url": "/docs/usage",
|
||||
"svg": "computer",
|
||||
"description": "How to use spaCy and its features."
|
||||
},
|
||||
"API": {
|
||||
"url": "/docs/api",
|
||||
"svg": "brain",
|
||||
"description": "The detailed reference for spaCy's API."
|
||||
},
|
||||
"Tutorials": {
|
||||
"url": "/docs/usage/tutorials",
|
||||
"svg": "eye",
|
||||
"description": "End-to-end examples, with code you can modify and run."
|
||||
},
|
||||
"Showcase & Demos": {
|
||||
"url": "/docs/usage/showcase",
|
||||
"svg": "bubble",
|
||||
"description": "Demos, libraries and products from the spaCy community."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -1,176 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > QUICKSTART > USAGE EXAMPLES
|
||||
//- ----------------------------------
|
||||
|
||||
+section("examples")
|
||||
+h(2, "examples").
|
||||
Usage Examples
|
||||
|
||||
+h(3, "examples-resources") Load resources and process text
|
||||
|
||||
+code.
|
||||
import spacy
|
||||
en_nlp = spacy.load('en')
|
||||
en_doc = en_nlp(u'Hello, world. Here are two sentences.')
|
||||
de_nlp = spacy.load('de')
de_doc = de_nlp(u'ich bin ein Berliner.')
|
||||
|
||||
+h(3, "multi-threaded") Multi-threaded generator (using OpenMP. No GIL!)
|
||||
|
||||
+code.
|
||||
texts = [u'One document.', u'...', u'Lots of documents']
|
||||
# .pipe streams input, and produces streaming output
|
||||
iter_texts = (texts[i % 3] for i in xrange(100000000))
|
||||
for i, doc in enumerate(nlp.pipe(iter_texts, batch_size=50, n_threads=4)):
|
||||
assert doc.is_parsed
|
||||
if i == 100:
|
||||
break
|
||||
|
||||
+h(3, "examples-tokens-sentences") Get tokens and sentences
|
||||
|
||||
+code.
|
||||
token = doc[0]
|
||||
sentence = next(doc.sents)
|
||||
assert token is sentence[0]
|
||||
assert sentence.text == 'Hello, world.'
|
||||
|
||||
+h(3, "examples-integer-ids") Use integer IDs for any string
|
||||
|
||||
+code.
|
||||
hello_id = nlp.vocab.strings['Hello']
|
||||
hello_str = nlp.vocab.strings[hello_id]
|
||||
|
||||
assert token.orth == hello_id == 3125
|
||||
assert token.orth_ == hello_str == 'Hello'
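The round trip between strings and integer IDs shown above can be illustrated with a toy two-way mapping. This is only a sketch: `ToyStringStore` is a hypothetical class written for this illustration, and it says nothing about how spaCy's own `StringStore` assigns or stores IDs.

```python
class ToyStringStore(object):
    """Hypothetical two-way mapping between strings and integer IDs."""

    def __init__(self):
        self._by_string = {}   # string -> integer ID
        self._by_id = []       # integer ID -> string

    def __getitem__(self, key):
        # An integer key is looked up as an ID and returns the string
        if isinstance(key, int):
            return self._by_id[key]
        # A string key returns its ID, assigning a new one on first use
        if key not in self._by_string:
            self._by_string[key] = len(self._by_id)
            self._by_id.append(key)
        return self._by_string[key]


strings = ToyStringStore()
hello_id = strings['Hello']
hello_str = strings[hello_id]
```

The same `[]` lookup works in both directions, which mirrors the access pattern in the example above.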
|
||||
|
||||
+h(3, "examples-string-views-flags") Get and set string views and flags
|
||||
|
||||
+code.
|
||||
assert token.shape_ == 'Xxxxx'
|
||||
for lexeme in nlp.vocab:
|
||||
if lexeme.is_alpha:
|
||||
lexeme.shape_ = 'W'
|
||||
elif lexeme.is_digit:
|
||||
lexeme.shape_ = 'D'
|
||||
elif lexeme.is_punct:
|
||||
lexeme.shape_ = 'P'
|
||||
else:
|
||||
lexeme.shape_ = 'M'
|
||||
assert token.shape_ == 'W'
|
||||
|
||||
+h(3, "examples-numpy-arrays") Export to numpy arrays
|
||||
|
||||
+code.
|
||||
from spacy.attrs import ORTH, LIKE_URL, IS_OOV
|
||||
|
||||
attr_ids = [ORTH, LIKE_URL, IS_OOV]
|
||||
doc_array = doc.to_array(attr_ids)
|
||||
assert doc_array.shape == (len(doc), len(attr_ids))
|
||||
assert doc[0].orth == doc_array[0, 0]
|
||||
assert doc[1].orth == doc_array[1, 0]
|
||||
assert doc[0].like_url == doc_array[0, 1]
|
||||
assert list(doc_array[:, 1]) == [t.like_url for t in doc]
|
||||
|
||||
+h(3, "examples-word-vectors") Word vectors
|
||||
|
||||
+code.
|
||||
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
|
||||
|
||||
apples = doc[0]
|
||||
oranges = doc[2]
|
||||
boots = doc[6]
|
||||
hippos = doc[8]
|
||||
|
||||
assert apples.similarity(oranges) > boots.similarity(hippos)
|
||||
|
||||
+h(3, "examples-pos-tags") Part-of-speech tags
|
||||
|
||||
+code.
|
||||
from spacy.parts_of_speech import ADV
|
||||
|
||||
def is_adverb(token):
|
||||
return token.pos == ADV
|
||||
|
||||
# These are data-specific, so no constants are provided. You have to look
|
||||
# up the IDs from the StringStore.
|
||||
NNS = nlp.vocab.strings['NNS']
|
||||
NNPS = nlp.vocab.strings['NNPS']
|
||||
def is_plural_noun(token):
|
||||
return token.tag == NNS or token.tag == NNPS
|
||||
|
||||
def print_coarse_pos(token):
|
||||
print(token.pos_)
|
||||
|
||||
def print_fine_pos(token):
|
||||
print(token.tag_)
|
||||
|
||||
+h(3, "examples-dependencies") Syntactic dependencies
|
||||
|
||||
+code.
|
||||
def dependency_labels_to_root(token):
|
||||
'''Walk up the syntactic tree, collecting the arc labels.'''
|
||||
dep_labels = []
|
||||
while token.head is not token:
|
||||
dep_labels.append(token.dep)
|
||||
token = token.head
|
||||
return dep_labels
|
||||
|
||||
+h(3, "examples-entities") Named entities
|
||||
|
||||
+code.
|
||||
def iter_products(docs):
|
||||
for doc in docs:
|
||||
for ent in doc.ents:
|
||||
if ent.label_ == 'PRODUCT':
|
||||
yield ent
|
||||
|
||||
def word_is_in_entity(word):
|
||||
return word.ent_type != 0
|
||||
|
||||
def count_parent_verb_by_person(docs):
|
||||
counts = defaultdict(lambda: defaultdict(int))
|
||||
for doc in docs:
|
||||
for ent in doc.ents:
|
||||
if ent.label_ == 'PERSON' and ent.root.head.pos == VERB:
|
||||
counts[ent.orth_][ent.root.head.lemma_] += 1
|
||||
return counts
|
||||
|
||||
+h(3, "examples-inline") Calculate inline mark-up on original string
|
||||
|
||||
+code.
|
||||
def put_spans_around_tokens(doc, get_classes):
|
||||
'''Given some function to compute class names, put each token in a
|
||||
span element, with the appropriate classes computed.
|
||||
|
||||
All whitespace is preserved, outside of the spans. (Yes, I know HTML
|
||||
won't display it. But the point is no information is lost, so you can
|
||||
calculate what you need, e.g. <br /> tags, <p> tags, etc.)
|
||||
'''
|
||||
output = []
|
||||
template = '<span class="{classes}">{word}</span>{space}'
|
||||
for token in doc:
|
||||
if token.is_space:
|
||||
output.append(token.orth_)
|
||||
else:
|
||||
output.append(
|
||||
template.format(
|
||||
classes=' '.join(get_classes(token)),
|
||||
word=token.orth_,
|
||||
space=token.whitespace_))
|
||||
string = ''.join(output)
|
||||
string = string.replace('\n', '')
|
||||
string = string.replace('\t', ' ')
|
||||
return string
|
||||
|
||||
+h(3, "examples-binary") Efficient binary serialization
|
||||
|
||||
+code.
|
||||
import spacy
|
||||
from spacy.tokens.doc import Doc
|
||||
|
||||
byte_string = doc.to_bytes()
|
||||
open('moby_dick.bin', 'wb').write(byte_string)
|
||||
|
||||
nlp = spacy.load('en')
|
||||
for byte_string in Doc.read_bytes(open('moby_dick.bin', 'rb')):
|
||||
doc = Doc(nlp.vocab)
|
||||
doc.from_bytes(byte_string)
|
|
@ -1,122 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 QUICKSTART > GETTING STARTED
|
||||
//- ----------------------------------
|
||||
|
||||
+section("getting-started")
|
||||
+h(2, "getting-started")
|
||||
| Getting started
|
||||
|
||||
+section("install-spacy")
|
||||
+h(3, "install-spacy")
|
||||
| Install spaCy
|
||||
|
||||
p.
|
||||
spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux,
|
||||
OS X and Windows. The latest spaCy releases are currently only available as source packages over #[+a("https://pypy.python.org/pypi/spacy") pip]. Installation requires a working build environment. See notes on #[a(href="/docs#install-source-ubuntu") Ubuntu],
|
||||
#[a(href="/docs#install-source-osx") OS X] and
|
||||
#[a(href="/docs#install-source-windows") Windows] for details.
|
||||
|
||||
+code("bash", "pip").
|
||||
pip install -U spacy
|
||||
|
||||
p.
|
||||
After installation you need to download a language model. Models for English (#[code en]) and German (#[code de]) are available.
|
||||
|
||||
+code("bash").
|
||||
# English:
|
||||
# - Install tagger, parser, NER and GloVe vectors:
|
||||
python -m spacy.en.download all
|
||||
# - OR install English tagger, parser and NER
|
||||
python -m spacy.en.download parser
|
||||
# - OR install English GloVe vectors
|
||||
python -m spacy.en.download glove
|
||||
# German:
|
||||
# - Install German tagger, parser, NER and word vectors
|
||||
python -m spacy.de.download all
|
||||
# Upgrade/overwrite existing data
|
||||
python -m spacy.en.download --force
|
||||
# Check whether the model was successfully installed
|
||||
python -c "import spacy; spacy.load('en'); print('OK')"
|
||||
|
||||
p.
|
||||
The download command fetches about 1 GB of data, which it installs
|
||||
within the #[code spacy] package directory.
|
||||
|
||||
+section("install-source")
|
||||
+h(3, "install-source")
|
||||
| Compile from source
|
||||
|
||||
p.
|
||||
The other way to install spaCy is to clone its
|
||||
#[a(href="https://github.com/spacy-io/spaCy") GitHub repository] and
|
||||
build it from source. That is the common way if you want to make changes
|
||||
to the code base.
|
||||
|
||||
p.
|
||||
You'll need to make sure that you have a development environment consisting
|
||||
of a Python distribution including header files, a compiler, pip,
|
||||
virtualenv and git installed. The compiler
|
||||
part is the trickiest. How to do that depends on your system. See
|
||||
notes on #[a(href="/docs#install-source-ubuntu") Ubuntu],
|
||||
#[a(href="/docs#install-source-osx") OS X] and
|
||||
#[a(href="/docs#install-source-windows") Windows] for details.
|
||||
|
||||
+code("bash").
|
||||
# make sure you are using recent pip/virtualenv versions
|
||||
python -m pip install -U pip virtualenv
|
||||
|
||||
# find git install instructions at https://git-scm.com/downloads
|
||||
git clone https://github.com/spacy-io/spaCy.git
|
||||
|
||||
cd spaCy
|
||||
virtualenv .env && source .env/bin/activate
|
||||
pip install -r requirements.txt
|
||||
pip install -e .
|
||||
|
||||
p.
|
||||
Compared to a regular install via #[code pip] or #[code conda],
|
||||
#[+a("https://github.com/" + SOCIAL.github + "/spaCy/blob/master/requirements.txt") requirements.txt]
|
||||
additionally installs developer dependencies such as #[code cython].
|
||||
|
||||
+h(4, "install-source-ubuntu")
|
||||
| Ubuntu
|
||||
|
||||
p Install system-level dependencies via #[code apt-get]:
|
||||
|
||||
+code("bash").
|
||||
sudo apt-get install build-essential python-dev git
|
||||
|
||||
+h(4, "install-source-osx")
|
||||
| OS X
|
||||
|
||||
p.
|
||||
Install a recent version of XCode, including the so-called "Command Line Tools". OS X
|
||||
ships with Python and git preinstalled.
|
||||
|
||||
+h(4, "install-source-windows")
|
||||
| Windows
|
||||
|
||||
p.
|
||||
Install a version of Visual Studio Express or higher that matches the version that was
|
||||
used to compile your Python interpreter. For official distributions
|
||||
these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
|
||||
|
||||
+section("run-tests")
|
||||
+h(3, "run-tests")
|
||||
| Run tests
|
||||
|
||||
p.
|
||||
spaCy comes with an extensive test suite. First, find out where spaCy is installed:
|
||||
|
||||
+code("bash").
|
||||
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
|
||||
|
||||
p.
|
||||
Then run #[code pytest] on that directory. The flags #[code --vectors],
|
||||
#[code --slow] and #[code --model] are optional and enable additional tests:
|
||||
|
||||
+code("bash").
|
||||
# make sure you are using a recent pytest version
|
||||
python -m pip install -U pytest
|
||||
|
||||
python -m pytest <spacy-directory> --vectors --model --slow
|
|
@ -1,12 +0,0 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS > TUTORIALS
|
||||
//- ----------------------------------
|
||||
|
||||
+section("tutorials")
|
||||
+h(2, "tutorials") Tutorials
|
||||
|
||||
each post, slug in public.docs.tutorials._data
|
||||
if slug != 'index'
|
||||
a.o-block(href='/docs/tutorials/' + slug)
|
||||
+h(3)=post.title
|
||||
p=post.description
|
103
website/docs/api/_data.json
Normal file
|
@ -0,0 +1,103 @@
|
|||
{
|
||||
"sidebar": {
|
||||
"Introduction": {
|
||||
"Facts & Figures": "./",
|
||||
"Philosophy": "philosophy"
|
||||
},
|
||||
"Classes": {
|
||||
"Doc": "doc",
|
||||
"Token": "token",
|
||||
"Span": "span",
|
||||
"Language": "language",
|
||||
"Tagger": "tagger",
|
||||
"DependencyParser": "dependencyparser",
|
||||
"EntityRecognizer": "entityrecognizer",
|
||||
"Matcher": "matcher",
|
||||
"Lexeme": "lexeme",
|
||||
"Vocab": "vocab",
|
||||
"StringStore": "stringstore",
|
||||
"GoldParse": "goldparse"
|
||||
},
|
||||
"Other": {
|
||||
"Annotation Specs": "annotation"
|
||||
}
|
||||
},
|
||||
|
||||
"index": {
|
||||
"title": "Facts & Figures",
|
||||
"next": "philosophy"
|
||||
},
|
||||
|
||||
"philosophy": {
|
||||
"title": "Philosophy"
|
||||
},
|
||||
|
||||
"language": {
|
||||
"title": "Language",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"doc": {
|
||||
"title": "Doc",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"token": {
|
||||
"title": "Token",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"span": {
|
||||
"title": "Span",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"lexeme": {
|
||||
"title": "Lexeme",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"vocab": {
|
||||
"title": "Vocab",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"stringstore": {
|
||||
"title": "StringStore",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"matcher": {
|
||||
"title": "Matcher",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"dependenyparser": {
|
||||
"title": "DependencyParser",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"entityrecognizer": {
|
||||
"title": "EntityRecognizer",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"dependencyparser": {
|
||||
"title": "DependencyParser",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"tagger": {
|
||||
"title": "Tagger",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"goldparse": {
|
||||
"title": "GoldParse",
|
||||
"tag": "class"
|
||||
},
|
||||
|
||||
"annotation": {
|
||||
"title": "Annotation Specifications"
|
||||
}
|
||||
}
|
148
website/docs/api/annotation.jade
Normal file
|
@ -0,0 +1,148 @@
|
|||
//- 💫 DOCS > API > ANNOTATION SPECS
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p This document describes the target annotations spaCy is trained to predict.
|
||||
|
||||
+h(2, "tokenization") Tokenization
|
||||
|
||||
p
|
||||
| Tokenization standards are based on the
|
||||
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus.
|
||||
| The tokenizer differs from most by including tokens for significant
|
||||
| whitespace. Any sequence of whitespace characters beyond a single space
|
||||
| (#[code ' ']) is included as a token.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.en import English
|
||||
nlp = English(parser=False)
|
||||
tokens = nlp('Some\nspaces and\ttab characters')
|
||||
print([t.orth_ for t in tokens])
|
||||
# ['Some', '\n', 'spaces', ' ', 'and', '\t', 'tab', 'characters']
|
||||
|
||||
p
|
||||
| The whitespace tokens are useful for much the same reason punctuation is
|
||||
| – it's often an important delimiter in the text. By preserving it in the
|
||||
| token output, we are able to maintain a simple alignment between the
|
||||
| tokens and the original string, and we ensure that no information is
|
||||
| lost during processing.
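The alignment property described above can be sketched in plain Python. This is a toy illustration, not spaCy's tokenizer: `toy_tokenize` and `detokenize` are hypothetical helpers that split only on whitespace, store a single following space as a flag on the word, and turn any extra whitespace into its own token.

```python
import re


def toy_tokenize(text):
    """Return (token_text, has_trailing_space) pairs for `text`."""
    tokens = []
    for match in re.finditer(r'\S+|\s+', text):
        chunk = match.group()
        if not chunk.isspace():
            tokens.append([chunk, False])
            continue
        # A single ' ' right after a word becomes that word's trailing space
        previous_is_word = bool(tokens) and not tokens[-1][0].isspace()
        if previous_is_word and chunk.startswith(' '):
            tokens[-1][1] = True
            chunk = chunk[1:]
        # Whatever whitespace is left over is kept as a token of its own
        if chunk:
            tokens.append([chunk, False])
    return [tuple(token) for token in tokens]


def detokenize(tokens):
    """Rebuild the original string from the (text, space) pairs."""
    return ''.join(word + (' ' if space else '') for word, space in tokens)


text = 'Some\nspaces  and\ttab characters'
tokens = toy_tokenize(text)
words = [word for word, _ in tokens]
```

Because every character ends up either in a token or in a trailing-space flag, `detokenize(tokens)` returns the input unchanged.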
|
||||
|
||||
+h(2, "sentence-boundary") Sentence boundary detection
|
||||
|
||||
p
|
||||
| Sentence boundaries are calculated from the syntactic parse tree, so
|
||||
| features such as punctuation and capitalisation play an important but
|
||||
| non-decisive role in determining the sentence boundaries. Usually this
|
||||
| means that the sentence boundaries will at least coincide with clause
|
||||
| boundaries, even given poorly punctuated text.
|
||||
|
||||
+h(2, "pos-tagging") Part-of-speech Tagging
|
||||
|
||||
p
|
||||
| The part-of-speech tagger uses the
|
||||
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] version of
|
||||
| the Penn Treebank tag set. We also map the tags to the simpler Google
|
||||
| Universal POS Tag set. See
|
||||
| #[+src(gh("spaCy", "spacy/tagger.pyx")) tagger.pyx] for details.
|
||||
|
||||
+h(2, "lemmatization") Lemmatization
|
||||
|
||||
p A "lemma" is the uninflected form of a word. In English, this means:
|
||||
|
||||
+list
|
||||
+item #[strong Adjectives]: The form like "happy", not "happier" or "happiest"
|
||||
+item #[strong Adverbs]: The form like "badly", not "worse" or "worst"
|
||||
+item #[strong Nouns]: The form like "dog", not "dogs"; like "child", not "children"
|
||||
+item #[strong Verbs]: The form like "write", not "writes", "writing", "wrote" or "written"
|
||||
|
||||
p
|
||||
| The lemmatization data is taken from
|
||||
| #[+a("https://wordnet.princeton.edu") WordNet]. However, we also add a
|
||||
| special case for pronouns: all pronouns are lemmatized to the special
|
||||
| token #[code -PRON-].
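As a rough illustration of the rules above, the following sketch uses a tiny hand-written lookup table in place of the WordNet data. The table, the tag names and `toy_lemmatize` are all assumptions made for this example, not spaCy's lemmatizer.

```python
# Hypothetical exception table, built from the examples listed above
TOY_LEMMAS = {
    ('happier', 'ADJ'): 'happy',
    ('happiest', 'ADJ'): 'happy',
    ('worse', 'ADV'): 'badly',
    ('dogs', 'NOUN'): 'dog',
    ('children', 'NOUN'): 'child',
    ('wrote', 'VERB'): 'write',
    ('written', 'VERB'): 'write',
}


def toy_lemmatize(word, pos):
    """Look up the uninflected form, with the special case for pronouns."""
    if pos == 'PRON':
        return '-PRON-'
    lower = word.lower()
    # Words without an entry are returned lowercased and otherwise unchanged
    return TOY_LEMMAS.get((lower, pos), lower)
```

Keying the table on the part of speech matters because the same surface form can have different lemmas, for example "worse" as an adverb versus an adjective.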
|
||||
|
||||
+h(2, "dependency-parsing") Syntactic Dependency Parsing
|
||||
|
||||
p
|
||||
| The parser is trained on data produced by the
|
||||
| #[+a("http://www.clearnlp.com") ClearNLP] converter. Details of the
|
||||
| annotation scheme can be found
|
||||
| #[+a("http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf") here].
|
||||
|
||||
+h(2, "named-entities") Named Entity Recognition
|
||||
|
||||
+table(["Entity Type", "Description"])
|
||||
+row
|
||||
+cell #[code PERSON]
|
||||
+cell People, including fictional.
|
||||
|
||||
+row
|
||||
+cell #[code NORP]
|
||||
+cell Nationalities or religious or political groups.
|
||||
|
||||
+row
|
||||
+cell #[code FAC]
|
||||
+cell Facilities, such as buildings, airports, highways, bridges, etc.
|
||||
|
||||
+row
|
||||
+cell #[code ORG]
|
||||
+cell Companies, agencies, institutions, etc.
|
||||
|
||||
+row
|
||||
+cell #[code GPE]
|
||||
+cell Countries, cities, states.
|
||||
|
||||
+row
|
||||
+cell #[code LOC]
|
||||
+cell Non-GPE locations, mountain ranges, bodies of water.
|
||||
|
||||
+row
|
||||
+cell #[code PRODUCT]
|
||||
+cell Vehicles, weapons, foods, etc. (Not services)
|
||||
|
||||
+row
|
||||
+cell #[code EVENT]
|
||||
+cell Named hurricanes, battles, wars, sports events, etc.
|
||||
|
||||
+row
|
||||
+cell #[code WORK_OF_ART]
|
||||
+cell Titles of books, songs, etc.
|
||||
|
||||
+row
|
||||
+cell #[code LAW]
|
||||
+cell Named documents made into laws
|
||||
|
||||
+row
|
||||
+cell #[code LANGUAGE]
|
||||
+cell Any named language
|
||||
|
||||
p The following values are also annotated in a style similar to names:
|
||||
|
||||
+table(["Entity Type", "Description"])
|
||||
+row
|
||||
+cell #[code DATE]
|
||||
+cell Absolute or relative dates or periods
|
||||
|
||||
+row
|
||||
+cell #[code TIME]
|
||||
+cell Times smaller than a day
|
||||
|
||||
+row
|
||||
+cell #[code PERCENT]
|
||||
+cell Percentage (including “%”)
|
||||
|
||||
+row
|
||||
+cell #[code MONEY]
|
||||
+cell Monetary values, including unit
|
||||
|
||||
+row
|
||||
+cell #[code QUANTITY]
|
||||
+cell Measurements, as of weight or distance
|
||||
|
||||
+row
|
||||
+cell #[code ORDINAL]
|
||||
+cell "first", "second"
|
||||
|
||||
+row
|
||||
+cell #[code CARDINAL]
|
||||
+cell Numerals that do not fall under another type
|
135
website/docs/api/dependencyparser.jade
Normal file
|
@ -0,0 +1,135 @@
|
|||
//- 💫 DOCS > API > DEPENDENCYPARSER
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Annotate syntactic dependencies on #[code Doc] objects.
|
||||
|
||||
+h(2, "load") DependencyParser.load
|
||||
+tag classmethod
|
||||
|
||||
p Load the statistical model from the supplied path.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell #[code Path]
|
||||
+cell The path to load from.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared by the documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code require]
|
||||
+cell bool
|
||||
+cell Whether to raise an error if the files are not found.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code DependencyParser]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "init") DependencyParser.__init__
|
||||
+tag method
|
||||
|
||||
p Create a #[code DependencyParser].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared with documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell #[code thinc.linear.AveragedPerceptron]
|
||||
+cell The statistical model.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code DependencyParser]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") DependencyParser.__call__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Apply the dependency parser, setting the heads and dependency relations
|
||||
| onto the #[code Doc] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to be processed.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "pipe") DependencyParser.pipe
|
||||
+tag method
|
||||
|
||||
p Process a stream of documents.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code stream]
|
||||
+cell -
|
||||
+cell The sequence of documents to process.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
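The role of `batch_size` can be sketched with a plain generator. This is a single-threaded illustration under stated assumptions: `toy_pipe` and its `process` argument are hypothetical, and the parallel work that `n_threads` controls is left out entirely.

```python
def toy_pipe(stream, process, batch_size=2):
    """Yield processed items in input order, one working set at a time."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) >= batch_size:
            # The whole working set is processed before anything is yielded
            for result in [process(entry) for entry in batch]:
                yield result
            batch = []
    # Flush the final, possibly incomplete batch
    for result in [process(entry) for entry in batch]:
        yield result


results = list(toy_pipe(iter(['a', 'b', 'c']), str.upper, batch_size=2))
```

Since the function is a generator, the input stream is only consumed as far as the caller iterates, which is what makes very large or unbounded streams workable.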
|
||||
|
||||
+h(2, "update") DependencyParser.update
|
||||
+tag method
|
||||
|
||||
p Update the statistical model.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The example document for the update.
|
||||
|
||||
+row
|
||||
+cell #[code gold]
|
||||
+cell #[code GoldParse]
|
||||
+cell The gold-standard annotations, to calculate the loss.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The loss on this example.
|
||||
|
||||
+h(2, "step_through") DependencyParser.step_through
|
||||
+tag method
|
||||
|
||||
p Set up a stepwise state, to introspect and control the transition sequence.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to step through.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code StepwiseState]
|
||||
+cell A state object, to step through the annotation process.
|
416
website/docs/api/doc.jade
Normal file
|
@ -0,0 +1,416 @@
|
|||
//- 💫 DOCS > API > DOC
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p A container for accessing linguistic annotations.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code mem]
|
||||
+cell #[code Pool]
|
||||
+cell The document's local memory heap, for all C data it owns.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The store of lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code user_data]
|
||||
+cell -
|
||||
+cell A generic storage area, for user custom data.
|
||||
|
||||
+row
|
||||
+cell #[code is_tagged]
|
||||
+cell bool
|
||||
+cell
|
||||
| A flag indicating that the document has been part-of-speech
|
||||
| tagged.
|
||||
|
||||
+row
|
||||
+cell #[code is_parsed]
|
||||
+cell bool
|
||||
+cell A flag indicating that the document has been syntactically parsed.
|
||||
|
||||
+row
|
||||
+cell #[code sentiment]
|
||||
+cell float
|
||||
+cell The document's positivity/negativity score, if available.
|
||||
|
||||
+row
|
||||
+cell #[code user_hooks]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary that allows customisation of the #[code Doc]'s
|
||||
| properties.
|
||||
|
||||
+row
|
||||
+cell #[code user_token_hooks]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary that allows customisation of properties of
|
||||
| #[code Token] children.
|
||||
|
||||
+row
|
||||
+cell #[code user_span_hooks]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary that allows customisation of properties of
|
||||
| #[code Span] children.
|
||||
|
||||
+h(2, "init") Doc.__init__
|
||||
+tag method
|
||||
|
||||
p Construct a #[code Doc] object.
|
||||
|
||||
+aside("Note")
|
||||
| The most common way to get a #[code Doc] object is via the #[code nlp]
|
||||
| object. This method is usually only used for deserialization or preset
|
||||
| tokenization.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A storage container for lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code words]
|
||||
+cell -
|
||||
+cell A list of strings to add to the container.
|
||||
|
||||
+row
|
||||
+cell #[code spaces]
|
||||
+cell -
|
||||
+cell
|
||||
| A list of boolean values indicating whether each word has a
|
||||
| subsequent space. Must have the same length as #[code words], if
|
||||
| specified. Defaults to a sequence of #[code True].
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Doc]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "getitem") Doc.__getitem__
|
||||
+tag method
|
||||
|
||||
p Get a #[code Token] object.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
assert doc[0].text == 'Give'
|
||||
assert doc[-1].text == '.'
|
||||
span = doc[1:3]
|
||||
assert span.text == 'it back'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The index of the token.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The token at #[code doc[i]].
|
||||
|
||||
p Get a #[code Span] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code start_end]
|
||||
+cell tuple
|
||||
+cell The slice of the document to get.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Span]
|
||||
+cell The span at #[code doc[start : end]].
|
||||
|
||||
+h(2, "iter") Doc.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over #[code Token] objects.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A #[code Token] object.
|
||||
|
||||
+h(2, "len") Doc.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of tokens in the document.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of tokens in the document.
|
||||
|
||||
+h(2, "similarity") Doc.similarity
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Make a semantic similarity estimate. The default estimate is cosine
|
||||
| similarity using an average of word vectors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code other]
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
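The default estimate described above, cosine similarity over averaged word vectors, can be written out in a few lines of plain Python. The vectors below are made up for illustration, and `average` and `cosine` are hypothetical helpers rather than spaCy functions.

```python
import math


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = float(len(vectors))
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Each "document" is a list of made-up two-dimensional word vectors
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[2.0, 2.0]]
doc_c = [[1.0, -1.0]]

sim_ab = cosine(average(doc_a), average(doc_b))
sim_ac = cosine(average(doc_a), average(doc_c))
```

Here `doc_a` and `doc_b` average to vectors pointing in the same direction, so they score higher than `doc_a` and `doc_c`, whose averages are orthogonal.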
|
||||
|
||||
+h(2, "to_array") Doc.to_array
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Export the document annotations to a numpy array of shape #[code N*M]
|
||||
| where #[code N] is the length of the document and #[code M] is the number
|
||||
| of attribute IDs to export. The values will be 32-bit integers.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy import attrs
|
||||
doc = nlp(text)
|
||||
# All strings mapped to integers, for easy export to numpy
|
||||
np_array = doc.to_array([attrs.LOWER, attrs.POS,
|
||||
attrs.ENT_TYPE, attrs.IS_ALPHA])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attr_ids]
|
||||
+cell ints
|
||||
+cell A list of attribute ID ints.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code numpy.ndarray[ndim=2, dtype='int32']]
|
||||
+cell
|
||||
| The exported attributes as a 2D numpy array, with one row per
|
||||
| token and one column per attribute.
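The shape of the export can be illustrated without numpy by using nested lists. In this sketch the attribute IDs and the dictionary-based tokens are assumptions made for the example. spaCy defines its own IDs in `spacy.attrs` and returns a real numpy array.

```python
# Hypothetical attribute IDs, standing in for the constants in spacy.attrs
ORTH, IS_ALPHA = 0, 1

# A toy document: one dict of attribute ID -> integer value per token
toy_doc = [
    {ORTH: 101, IS_ALPHA: 1},
    {ORTH: 102, IS_ALPHA: 0},
    {ORTH: 103, IS_ALPHA: 1},
]


def toy_to_array(doc, attr_ids):
    """Return one row per token and one column per requested attribute."""
    return [[token[attr_id] for attr_id in attr_ids] for token in doc]


array = toy_to_array(toy_doc, [ORTH, IS_ALPHA])
is_alpha_column = [row[1] for row in array]
```

The column order follows the order of `attr_ids`, so the same list is needed to interpret the result.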
|
||||
|
||||
+h(2, "count_by") Doc.count_by
|
||||
+tag method
|
||||
|
||||
p Count the frequencies of a given attribute.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attr_id]
|
||||
+cell int
|
||||
+cell The attribute ID
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell dict
|
||||
+cell A dictionary mapping attributes to integer counts.
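Counting by attribute amounts to a frequency table over one column of integer values. A minimal sketch, assuming tokens are represented as plain dicts and using a made-up attribute ID:

```python
from collections import Counter

# Hypothetical attribute ID and toy tokens (attribute ID -> integer value)
LOWER = 0
toy_doc = [{LOWER: 7}, {LOWER: 9}, {LOWER: 7}, {LOWER: 7}]


def toy_count_by(doc, attr_id):
    """Map each attribute value to the number of tokens that carry it."""
    return dict(Counter(token[attr_id] for token in doc))


counts = toy_count_by(toy_doc, LOWER)
```

The keys are the integer attribute values, so they would need to be mapped back to strings through a string store to be readable.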
|
||||
|
||||
+h(2, "from_array") Doc.from_array
|
||||
+tag method
|
||||
|
||||
p Load attributes from a numpy array.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attr_ids]
|
||||
+cell ints
|
||||
+cell A list of attribute ID ints.
|
||||
|
||||
+row
|
||||
+cell #[code values]
|
||||
+cell #[code numpy.ndarray[ndim=2, dtype='int32']]
|
||||
+cell The attribute values to load.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "to_bytes") Doc.to_bytes
|
||||
+tag method
|
||||
|
||||
p Export the document contents to a binary string.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bytes
|
||||
+cell
|
||||
| A losslessly serialized copy of the #[code Doc] including all
|
||||
| annotations.
|
||||
|
||||
+h(2, "from_bytes") Doc.from_bytes
|
||||
+tag method
|
||||
|
||||
p Import the document contents from a binary string.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code byte_string]
|
||||
+cell bytes
|
||||
+cell The string to load from.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Doc]
|
||||
+cell The #[code self] variable.
|
||||
|
||||
+h(2, "merge") Doc.merge
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Retokenize the document, such that the span at
|
||||
| #[code doc.text[start_idx : end_idx]] is merged into a single token. If
|
||||
| #[code start_idx] and #[code end_idx] do not mark start and end token
|
||||
| boundaries, the document remains unchanged.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code start_idx]
|
||||
+cell int
|
||||
+cell The character index of the start of the slice to merge.
|
||||
|
||||
+row
|
||||
+cell #[code end_idx]
|
||||
+cell int
|
||||
+cell The character index after the end of the slice to merge.
|
||||
|
||||
+row
|
||||
+cell #[code **attributes]
|
||||
+cell -
|
||||
+cell
|
||||
| Attributes to assign to the merged token. By default,
|
||||
| attributes are inherited from the syntactic root token of
|
||||
| the span.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell
|
||||
| The newly merged token, or #[code None] if the start and end
|
||||
| indices did not fall at token boundaries.
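p The boundary condition described here can be sketched without spaCy. The example below assumes each token is represented by a hypothetical #[code (start_char, end_char)] pair and only checks whether a character slice lines up with token boundaries; it does not perform the merge itself.

```python
def aligns_with_tokens(token_spans, start_idx, end_idx):
    """Check that a character slice starts and ends on token boundaries."""
    starts = {start for start, _ in token_spans}
    ends = {end for _, end in token_spans}
    return start_idx < end_idx and start_idx in starts and end_idx in ends


# Character spans for the tokens of "Los Angeles is big".
token_spans = [(0, 3), (4, 11), (12, 14), (15, 18)]
can_merge = aligns_with_tokens(token_spans, 0, 11)    # covers "Los Angeles"
cannot_merge = aligns_with_tokens(token_spans, 0, 9)  # cuts "Angeles" in half
print(can_merge, cannot_merge)
```

p In the second call the end index falls inside a token, which corresponds to the case where the document is left unchanged.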
|
||||
|
||||
+h(2, "read_bytes") Doc.read_bytes
|
||||
+tag staticmethod
|
||||
|
||||
p A static method, used to read serialized #[code Doc] objects from a file.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens.doc import Doc
|
||||
loc = 'test_serialize.bin'
|
||||
with open(loc, 'wb') as file_:
|
||||
file_.write(nlp(u'This is a document.').to_bytes())
|
||||
file_.write(nlp(u'This is another.').to_bytes())
|
||||
docs = []
|
||||
with open(loc, 'rb') as file_:
|
||||
for byte_string in Doc.read_bytes(file_):
|
||||
docs.append(Doc(nlp.vocab).from_bytes(byte_string))
|
||||
assert len(docs) == 2
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell file
|
||||
+cell buffer
|
||||
+cell A binary buffer to read the serialized annotations from.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell bytes
|
||||
+cell Binary strings from which documents can be loaded.
|
||||
|
||||
+h(2, "text") Doc.text
|
||||
+tag property
|
||||
|
||||
p A unicode representation of the document text.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell unicode
|
||||
+cell The original verbatim text of the document.
|
||||
|
||||
+h(2, "text_with_ws") Doc.text_with_ws
|
||||
+tag property
|
||||
|
||||
p
|
||||
| An alias of #[code Doc.text], provided for duck-type compatibility with
|
||||
| #[code Span] and #[code Token].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell unicode
|
||||
+cell The original verbatim text of the document.
|
||||
|
||||
+h(2, "sents") Doc.sents
|
||||
+tag property
|
||||
|
||||
p Iterate over the sentences in the document.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Span]
|
||||
+cell Sentences in the document.
|
||||
|
||||
+h(2, "ents") Doc.ents
|
||||
+tag property
|
||||
|
||||
p Iterate over the entities in the document.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Span]
|
||||
+cell Entities in the document.
|
||||
|
||||
+h(2, "noun_chunks") Doc.noun_chunks
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Iterate over the base noun phrases in the document. A base noun phrase,
|
||||
| or "NP chunk", is a noun phrase that does not permit other NPs to be
|
||||
| nested within it.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Span]
|
||||
+cell Noun chunks in the document.
|
||||
|
||||
+h(2, "vector") Doc.vector
|
||||
+tag property
|
||||
|
||||
p
|
||||
| A real-valued meaning representation. Defaults to an average of the
|
||||
| token vectors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A 1D numpy array representing the document's semantics.
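p To make the default concrete, here is a minimal sketch of averaging token vectors element-wise. It uses plain Python lists instead of numpy arrays, and the #[code average_vectors] helper and the two-dimensional sample vectors are made up for illustration.

```python
def average_vectors(vectors):
    """Return the element-wise mean of equally sized vectors."""
    count = len(vectors)
    dims = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dims)]


# Hypothetical two-dimensional token vectors.
token_vectors = [[1.0, 2.0], [3.0, 4.0]]
doc_vector = average_vectors(token_vectors)
print(doc_vector)
```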
|
||||
|
||||
+h(2, "has_vector") Doc.has_vector
|
||||
+tag property
|
||||
|
||||
p
|
||||
| A boolean value indicating whether a word vector is associated with the
|
||||
| object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the document has vector data attached.
|
133
website/docs/api/entityrecognizer.jade
Normal file
|
@ -0,0 +1,133 @@
|
|||
//- 💫 DOCS > API > ENTITYRECOGNIZER
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Annotate named entities on #[code Doc] objects.
|
||||
|
||||
+h(2, "load") EntityRecognizer.load
|
||||
+tag classmethod
|
||||
|
||||
p Load the statistical model from the supplied path.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell #[code Path]
|
||||
+cell The path to load from.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared by the documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code require]
|
||||
+cell bool
|
||||
+cell Whether to raise an error if the files are not found.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code EntityRecognizer]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "init") EntityRecognizer.__init__
|
||||
+tag method
|
||||
|
||||
p Create an #[code EntityRecognizer].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared with documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell #[code thinc.linear.AveragedPerceptron]
|
||||
+cell The statistical model.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code EntityRecognizer]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") EntityRecognizer.__call__
|
||||
+tag method
|
||||
|
||||
p Apply the entity recognizer, setting the NER tags onto the #[code Doc] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to be processed.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "pipe") EntityRecognizer.pipe
|
||||
+tag method
|
||||
|
||||
p Process a stream of documents.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code stream]
|
||||
+cell -
|
||||
+cell The sequence of documents to process.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
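p The idea of accumulating a stream into working sets of #[code batch_size] items while preserving order can be sketched as follows. This is only an illustration of the buffering pattern; the #[code batched] helper is hypothetical, and spaCy's own buffering and threading are not shown.

```python
from itertools import islice


def batched(stream, batch_size):
    """Yield lists of up to batch_size items, preserving stream order."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(batched(range(5), 2))
print(batches)
```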
|
||||
|
||||
+h(2, "update") EntityRecognizer.update
|
||||
+tag method
|
||||
|
||||
p Update the statistical model.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The example document for the update.
|
||||
|
||||
+row
|
||||
+cell #[code gold]
|
||||
+cell #[code GoldParse]
|
||||
+cell The gold-standard annotations, to calculate the loss.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The loss on this example.
|
||||
|
||||
+h(2, "step_through") EntityRecognizer.step_through
|
||||
+tag method
|
||||
|
||||
p Set up a stepwise state, to introspect and control the transition sequence.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to step through.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code StepwiseState]
|
||||
+cell A state object, to step through the annotation process.
|
103
website/docs/api/goldparse.jade
Normal file
|
@ -0,0 +1,103 @@
|
|||
//- 💫 DOCS > API > GOLDPARSE
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Collection for training annotations.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code tags]
|
||||
+cell list
|
||||
+cell The part-of-speech tag annotations.
|
||||
|
||||
+row
|
||||
+cell #[code heads]
|
||||
+cell list
|
||||
+cell The syntactic head annotations.
|
||||
|
||||
+row
|
||||
+cell #[code labels]
|
||||
+cell list
|
||||
+cell The syntactic relation-type annotations.
|
||||
|
||||
+row
|
||||
+cell #[code ents]
|
||||
+cell list
|
||||
+cell The named entity annotations.
|
||||
|
||||
+row
|
||||
+cell #[code cand_to_gold]
|
||||
+cell list
|
||||
+cell The alignment from candidate tokenization to gold tokenization.
|
||||
|
||||
+row
|
||||
+cell #[code gold_to_cand]
|
||||
+cell list
|
||||
+cell The alignment from gold tokenization to candidate tokenization.
|
||||
|
||||
+h(2, "init") GoldParse.__init__
|
||||
+tag method
|
||||
|
||||
p Create a GoldParse.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document the annotations refer to.
|
||||
|
||||
+row
|
||||
+cell #[code words]
|
||||
+cell -
|
||||
+cell A sequence of unicode word strings.
|
||||
|
||||
+row
|
||||
+cell #[code tags]
|
||||
+cell -
|
||||
+cell A sequence of strings, representing tag annotations.
|
||||
|
||||
+row
|
||||
+cell #[code heads]
|
||||
+cell -
|
||||
+cell A sequence of integers, representing syntactic head offsets.
|
||||
|
||||
+row
|
||||
+cell #[code deps]
|
||||
+cell -
|
||||
+cell A sequence of strings, representing the syntactic relation types.
|
||||
|
||||
+row
|
||||
+cell #[code entities]
|
||||
+cell -
|
||||
+cell A sequence of named entity annotations, either as BILUO tag strings, or as #[code (start_char, end_char, label)] tuples, representing the entity positions.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code GoldParse]
|
||||
+cell The newly constructed object.
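p The two entity formats mentioned for #[code entities] are related as sketched below: character-offset tuples can be turned into one BILUO tag per token. The #[code offsets_to_biluo] function is a hypothetical helper written for this example and assumes tokens are given as #[code (start_char, end_char)] pairs; entities that do not line up with token boundaries are skipped.

```python
def offsets_to_biluo(token_spans, entities):
    """Convert (start_char, end_char, label) tuples to BILUO tags."""
    tags = ["O"] * len(token_spans)
    starts = {start: i for i, (start, _) in enumerate(token_spans)}
    ends = {end: i for i, (_, end) in enumerate(token_spans)}
    for start_char, end_char, label in entities:
        first = starts.get(start_char)
        last = ends.get(end_char)
        if first is None or last is None or first > last:
            continue  # the entity does not align with token boundaries
        if first == last:
            tags[first] = "U-" + label  # single-token entity
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    return tags


# Token spans for "I like London and New York".
token_spans = [(0, 1), (2, 6), (7, 13), (14, 17), (18, 21), (22, 26)]
entities = [(7, 13, "GPE"), (18, 26, "GPE")]
tags = offsets_to_biluo(token_spans, entities)
print(tags)
```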
|
||||
|
||||
+h(2, "len") GoldParse.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of gold-standard tokens.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of gold-standard tokens.
|
||||
|
||||
+h(2, "is_projective") GoldParse.is_projective
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Whether the provided syntactic annotations form a projective dependency
|
||||
| tree.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the annotations form a projective tree.
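p A dependency tree is usually called projective when no two arcs cross. The sketch below checks for crossing arcs only, under the assumption that heads are given as absolute token indices and the root points to itself; the #[code has_no_crossing_arcs] function is hypothetical and is not how spaCy computes the property.

```python
def has_no_crossing_arcs(heads):
    """Return True if no two dependency arcs cross each other."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h != i]
    for a_start, a_end in arcs:
        for b_start, b_end in arcs:
            # Two arcs cross when exactly one end of the second arc
            # lies strictly inside the first arc.
            if a_start < b_start < a_end < b_end:
                return False
    return True


projective = has_no_crossing_arcs([1, 1, 1])         # token 1 is the root
non_projective = has_no_crossing_arcs([2, 3, 2, 2])  # arcs 0-2 and 1-3 cross
print(projective, non_projective)
```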
|
239
website/docs/api/index.jade
Normal file
|
@ -0,0 +1,239 @@
|
|||
//- 💫 DOCS > API > FACTS & FIGURES
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
+h(2, "comparison") Feature comparison
|
||||
|
||||
p
|
||||
| Here's a quick comparison of the functionalities offered by spaCy,
|
||||
| #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") SyntaxNet],
|
||||
| #[+a("http://www.nltk.org/py-modindex.html") NLTK] and
|
||||
| #[+a("http://stanfordnlp.github.io/CoreNLP/") CoreNLP].
|
||||
|
||||
+table([ "", "spaCy", "SyntaxNet", "NLTK", "CoreNLP"])
|
||||
+row
|
||||
+cell Easy installation
|
||||
each icon in [ "pro", "con", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Python API
|
||||
each icon in [ "pro", "con", "pro", "con" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Multi-language support
|
||||
each icon in [ "con", "pro", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Tokenization
|
||||
each icon in [ "pro", "pro", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Part-of-speech tagging
|
||||
each icon in [ "pro", "pro", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Sentence segmentation
|
||||
each icon in [ "pro", "pro", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Dependency parsing
|
||||
each icon in [ "pro", "pro", "con", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Entity recognition
|
||||
each icon in [ "pro", "con", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Integrated word vectors
|
||||
each icon in [ "pro", "con", "con", "con" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Sentiment analysis
|
||||
each icon in [ "pro", "con", "pro", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+row
|
||||
+cell Coreference resolution
|
||||
each icon in [ "con", "con", "con", "pro" ]
|
||||
+cell.u-text-center #[+procon(icon)]
|
||||
|
||||
+h(2, "benchmarks") Benchmarks
|
||||
|
||||
p
|
||||
| Two peer-reviewed papers in 2015 confirm that spaCy offers the
|
||||
| #[strong fastest syntactic parser in the world] and that
|
||||
| #[strong its accuracy is within 1% of the best] available. The few
|
||||
| systems that are more accurate are 20× slower or more.
|
||||
|
||||
+aside("About the evaluation")
|
||||
| The first of the evaluations was published by #[strong Yahoo! Labs] and
|
||||
| #[strong Emory University], as part of a survey of current parsing
|
||||
| technologies #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") (Choi et al., 2015)].
|
||||
| Their results and subsequent discussions helped us develop a novel
|
||||
| psychologically-motivated technique to improve spaCy's accuracy, which
|
||||
| we published in joint work with Macquarie University
|
||||
| #[+a("https://aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
|
||||
|
||||
+table([ "System", "Language", "Accuracy", "Speed (wps)"])
|
||||
+row
|
||||
each data in [ "spaCy", "Cython", "91.8", "13,963" ]
|
||||
+cell #[strong=data]
|
||||
+row
|
||||
each data in [ "ClearNLP", "Java", "91.7", "10,271" ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
each data in [ "CoreNLP", "Java", "89.6", "8,602"]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
each data in [ "MATE", "Java", "92.5", "550"]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
each data in [ "Turbo", "C++", "92.4", "349" ]
|
||||
+cell=data
|
||||
|
||||
+h(3, "parse-accuracy") Parse accuracy
|
||||
|
||||
p
|
||||
| In 2016, Google released their
|
||||
| #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") SyntaxNet]
|
||||
| library, setting a new state of the art for syntactic dependency parsing
|
||||
| accuracy. SyntaxNet's algorithm is very similar to spaCy's. The main
|
||||
| difference is that SyntaxNet uses a neural network while spaCy uses a
|
||||
| sparse linear model.
|
||||
|
||||
+aside("Methodology")
|
||||
| #[+a("http://arxiv.org/abs/1603.06042") Andor et al. (2016)] chose
|
||||
| slightly different experimental conditions from
|
||||
| #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al. (2015)],
|
||||
| so the two accuracy tables here do not present directly comparable
|
||||
| figures. We have only evaluated spaCy in the "News" condition following
|
||||
| the SyntaxNet methodology. We don't yet have benchmark figures for the
|
||||
| "Web" and "Questions" conditions.
|
||||
|
||||
+table([ "System", "News", "Web", "Questions" ])
|
||||
+row
|
||||
+cell spaCy
|
||||
each data in [ 92.8, "n/a", "n/a" ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[+a("https://github.com/tensorflow/models/tree/master/syntaxnet") Parsey McParseface]
|
||||
each data in [ 94.15, 89.08, 94.77 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[+a("http://www.cs.cmu.edu/~ark/TurboParser/") Martins et al. (2013)]
|
||||
each data in [ 93.10, 88.23, 94.21 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[+a("http://research.google.com/pubs/archive/38148.pdf") Zhang and McDonald (2014)]
|
||||
each data in [ 93.32, 88.65, 93.37 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[+a("http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf") Weiss et al. (2015)]
|
||||
each data in [ 93.91, 89.29, 94.17 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[strong #[+a("http://arxiv.org/abs/1603.06042") Andor et al. (2016)]]
|
||||
each data in [ 94.44, 90.17, 95.40 ]
|
||||
+cell #[strong=data]
|
||||
|
||||
+h(3, "speed-comparison") Detailed speed comparison
|
||||
|
||||
p
|
||||
| Here we compare the per-document processing time of various spaCy
|
||||
| functionalities against other NLP libraries. We show both absolute
|
||||
| timings (in ms) and relative performance (normalized to spaCy). Lower is
|
||||
| better.
|
||||
|
||||
+aside("Methodology")
|
||||
| #[strong Set up:] 100,000 plain-text documents were streamed from an
|
||||
| SQLite3 database, and processed with an NLP library, to one of three
|
||||
| levels of detail — tokenization, tagging, or parsing. The tasks are
|
||||
| additive: to parse the text you have to tokenize and tag it. The
|
||||
| pre-processing was not subtracted from the times — I report the time
|
||||
| required for the pipeline to complete. I report mean times per document,
|
||||
| in milliseconds.#[br]#[br]
|
||||
| #[strong Hardware]: Intel i7-3770 (2012)#[br]
|
||||
| #[strong Implementation]: #[+src(gh("spacy-benchmarks")) spacy-benchmarks]
|
||||
|
||||
+table
|
||||
+row.u-text-label.u-text-center
|
||||
th.c-table__head-cell
|
||||
th.c-table__head-cell(colspan="3") Absolute (ms per doc)
|
||||
th.c-table__head-cell(colspan="3") Relative (to spaCy)
|
||||
|
||||
+row
|
||||
each column in ["System", "Tokenize", "Tag", "Parse", "Tokenize", "Tag", "Parse"]
|
||||
th.c-table__head-cell.u-text-label=column
|
||||
|
||||
+row
|
||||
+cell #[strong spaCy]
|
||||
each data in [ "0.2ms", "1ms", "19ms"]
|
||||
+cell #[strong=data]
|
||||
|
||||
each data in [ "1x", "1x", "1x" ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
each data in [ "CoreNLP", "2ms", "10ms", "49ms", "10x", "10x", "2.6x"]
|
||||
+cell=data
|
||||
+row
|
||||
each data in [ "ZPar", "1ms", "8ms", "850ms", "5x", "8x", "44.7x" ]
|
||||
+cell=data
|
||||
+row
|
||||
each data in [ "NLTK", "4ms", "443ms", "n/a", "20x", "443x", "n/a" ]
|
||||
+cell=data
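p The relative columns follow from the absolute timings by dividing each value by the corresponding spaCy timing. A small sketch of that normalization, using the figures from the table (the #[code relative_to] helper is introduced here for illustration only):

```python
# Absolute timings (ms per document) for tokenize, tag and parse,
# copied from the table.
absolute_ms = {
    "spaCy": [0.2, 1.0, 19.0],
    "CoreNLP": [2.0, 10.0, 49.0],
    "ZPar": [1.0, 8.0, 850.0],
}


def relative_to(baseline, timings):
    """Express timings as multiples of the baseline, to one decimal."""
    return [round(t / b, 1) for t, b in zip(timings, baseline)]


relative = {
    name: relative_to(absolute_ms["spaCy"], timings)
    for name, timings in absolute_ms.items()
}
print(relative)
```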
|
||||
|
||||
+h(3, "ner") Named entity comparison
|
||||
|
||||
p
|
||||
| #[+a("https://aclweb.org/anthology/W/W16/W16-2703.pdf") Jiang et al. (2016)]
|
||||
| present several detailed comparisons of the named entity recognition
|
||||
| models provided by spaCy, CoreNLP, NLTK and LingPipe. Here we show their
|
||||
| evaluation of person, location and organization accuracy on Wikipedia.
|
||||
|
||||
+aside("Methodology")
|
||||
| Making a meaningful comparison of different named entity recognition
|
||||
| systems is tricky. Systems are often trained on different data, which
|
||||
| usually have slight differences in annotation style. For instance, some
|
||||
| corpora include titles as part of person names, while others don't.
|
||||
| These trivial differences in convention can distort comparisons
|
||||
| significantly. Jiang et al.'s #[em partial overlap] metric goes a long
|
||||
| way to solving this problem.
|
||||
|
||||
+table([ "System", "Precision", "Recall", "F-measure" ])
|
||||
+row
|
||||
+cell spaCy
|
||||
each data in [ 0.7240, 0.6514, 0.6858 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell #[strong CoreNLP]
|
||||
each data in [ 0.7914, 0.7327, 0.7609 ]
|
||||
+cell #[strong=data]
|
||||
|
||||
+row
|
||||
+cell NLTK
|
||||
each data in [ 0.5136, 0.6532, 0.5750 ]
|
||||
+cell=data
|
||||
|
||||
+row
|
||||
+cell LingPipe
|
||||
each data in [ 0.5412, 0.5357, 0.5384 ]
|
||||
+cell=data
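p The F-measure column is consistent with the usual harmonic mean of precision and recall. The sketch below recomputes it from the table's precision and recall values, assuming that the balanced F1 definition is the one used:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table.
spacy_f = f_measure(0.7240, 0.6514)
corenlp_f = f_measure(0.7914, 0.7327)
print(round(spacy_f, 4), round(corenlp_f, 4))
```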
|
138
website/docs/api/language.jade
Normal file
|
@ -0,0 +1,138 @@
|
|||
//- 💫 DOCS > API > LANGUAGE
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p A text processing pipeline.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A container for the lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code tokenizer]
|
||||
+cell #[code Tokenizer]
|
||||
+cell Find word boundaries and create a #[code Doc] object.
|
||||
|
||||
+row
|
||||
+cell #[code tagger]
|
||||
+cell #[code Tagger]
|
||||
+cell Annotate #[code Doc] objects with POS tags.
|
||||
|
||||
+row
|
||||
+cell #[code parser]
|
||||
+cell #[code DependencyParser]
|
||||
+cell Annotate #[code Doc] objects with syntactic dependencies.
|
||||
|
||||
+row
|
||||
+cell #[code entity]
|
||||
+cell #[code EntityRecognizer]
|
||||
+cell Annotate #[code Doc] objects with named entities.
|
||||
|
||||
+row
|
||||
+cell #[code matcher]
|
||||
+cell #[code Matcher]
|
||||
+cell Rule-based sequence matcher.
|
||||
|
||||
+row
|
||||
+cell #[code make_doc]
|
||||
+cell #[code lambda text: Doc]
|
||||
+cell Create a #[code Doc] object from unicode text.
|
||||
|
||||
+row
|
||||
+cell #[code pipeline]
|
||||
+cell -
|
||||
+cell Sequence of annotation functions.
|
||||
|
||||
|
||||
+h(2, "init") Language.__init__
|
||||
+tag method
|
||||
|
||||
p Create or load the pipeline.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **kwargs]
|
||||
+cell -
|
||||
+cell Keyword arguments indicating which defaults to override.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Language]
|
||||
+cell #[code self]
|
||||
|
||||
+h(2, "call") Language.__call__
|
||||
+tag method
|
||||
|
||||
p Apply the pipeline to a single text.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.en import English
|
||||
nlp = English()
|
||||
doc = nlp(u'An example sentence. Another example sentence.')
|
||||
doc[0].orth_, doc[0].head.tag_
|
||||
# ('An', 'NN')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code text]
|
||||
+cell unicode
|
||||
+cell The text to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code tag]
|
||||
+cell bool
|
||||
+cell Whether to apply the part-of-speech tagger.
|
||||
|
||||
+row
|
||||
+cell #[code parse]
|
||||
+cell bool
|
||||
+cell Whether to apply the syntactic dependency parser.
|
||||
|
||||
+row
|
||||
+cell #[code entity]
|
||||
+cell bool
|
||||
+cell Whether to apply the named entity recognizer.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Doc]
|
||||
+cell A container for accessing the linguistic annotations.
|
||||
|
||||
+h(2, "pipe") Language.pipe
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Process texts as a stream, and yield #[code Doc] objects in order.
|
||||
| Supports GIL-free multi-threading.
|
||||
|
||||
+aside-code("Example").
|
||||
texts = [u'One document.', u'...', u'Lots of documents']
|
||||
for doc in nlp.pipe(texts, batch_size=50, n_threads=4):
|
||||
assert doc.is_parsed
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code texts]
|
||||
+cell -
|
||||
+cell A sequence of unicode objects.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of worker threads to use. If #[code -1], OpenMP will
|
||||
| decide how many to use at run time. Default is #[code 2].
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of texts to buffer.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Doc]
|
||||
+cell Containers for accessing the linguistic annotations.
|
239
website/docs/api/lexeme.jade
Normal file
|
@ -0,0 +1,239 @@
|
|||
//- 💫 DOCS > API > LEXEME
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p An entry in the vocabulary.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell
|
||||
|
||||
+row
|
||||
+cell #[code lower]
|
||||
+cell int
|
||||
+cell Lower-case form of the word.
|
||||
|
||||
+row
|
||||
+cell #[code lower_]
|
||||
+cell unicode
|
||||
+cell Lower-case form of the word.
|
||||
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell int
|
||||
+cell Transform of the word's string, to show orthographic features.
|
||||
|
||||
+row
|
||||
+cell #[code shape_]
|
||||
+cell unicode
|
||||
+cell Transform of the word's string, to show orthographic features.
|
||||
|
||||
+row
|
||||
+cell #[code prefix]
|
||||
+cell int
|
||||
+cell Length-N substring from the start of the word. Defaults to #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code prefix_]
|
||||
+cell unicode
|
||||
+cell Length-N substring from the start of the word. Defaults to #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code suffix]
|
||||
+cell int
|
||||
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code suffix_]
|
||||
+cell unicode
|
||||
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code is_alpha]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isalpha()].
|
||||
|
||||
+row
|
||||
+cell #[code is_ascii]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code not any(ord(c) >= 128 for c in word.orth_)].
|
||||
|
||||
+row
|
||||
+cell #[code is_digit]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isdigit()].
|
||||
|
||||
+row
|
||||
+cell #[code is_lower]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.islower()].
|
||||
|
||||
+row
|
||||
+cell #[code is_title]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.istitle()].
|
||||
|
||||
+row
|
||||
+cell #[code is_punct]
|
||||
+cell bool
|
||||
+cell Is the word punctuation?
|
||||
|
||||
+row
|
||||
+cell #[code is_space]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isspace()].
|
||||
|
||||
+row
|
||||
+cell #[code like_url]
|
||||
+cell bool
|
||||
+cell Does the word resemble a URL?
|
||||
|
||||
+row
|
||||
+cell #[code like_num]
|
||||
+cell bool
|
||||
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
|
||||
|
||||
+row
|
||||
+cell #[code like_email]
|
||||
+cell bool
|
||||
+cell Does the word resemble an email address?
|
||||
|
||||
+row
|
||||
+cell #[code is_oov]
|
||||
+cell bool
|
||||
+cell Is the word out-of-vocabulary?
|
||||
|
||||
+row
|
||||
+cell #[code is_stop]
|
||||
+cell bool
|
||||
+cell Is the word part of a "stop list"?
|
||||
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell int
|
||||
+cell Language of the parent vocabulary.
|
||||
+row
|
||||
+cell #[code lang_]
|
||||
+cell unicode
|
||||
+cell Language of the parent vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code prob]
|
||||
+cell float
|
||||
+cell Smoothed log probability estimate of token's type.
|
||||
|
||||
+row
|
||||
+cell #[code sentiment]
|
||||
+cell float
|
||||
+cell A scalar value indicating the positivity or negativity of the token.
|
||||
+row
|
||||
+cell #[code lex_id]
|
||||
+cell int
|
||||
+cell ID of the token's lexical type.
|
||||
|
||||
+row
|
||||
+cell #[code text]
|
||||
+cell unicode
|
||||
+cell Verbatim text content.
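p Several of the string-valued and boolean attributes in this table have simple plain-Python equivalents. The sketch below computes a few of them for a bare string; the #[code lexical_features] function is hypothetical, and the prefix and suffix lengths follow the defaults stated in the table.

```python
def lexical_features(word):
    """Compute plain-Python equivalents of a few lexical attributes."""
    return {
        "lower_": word.lower(),
        "prefix_": word[:1],    # length-1 substring from the start
        "suffix_": word[-3:],   # length-3 substring from the end
        "is_alpha": word.isalpha(),
        "is_ascii": not any(ord(c) >= 128 for c in word),
        "is_digit": word.isdigit(),
        "is_title": word.istitle(),
    }


features = lexical_features(u"Apple")
accented = lexical_features(u"caf\u00e9")
print(features)
```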
|
||||
|
||||
+h(2, "init") Lexeme.__init__
|
||||
+tag method
|
||||
|
||||
p Create a #[code Lexeme] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The parent vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code orth]
|
||||
+cell int
|
||||
+cell The orth id of the lexeme.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Lexeme]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "set_flag") Lexeme.set_flag
|
||||
+tag method
|
||||
|
||||
p Change the value of a boolean flag.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to set.
|
||||
|
||||
+row
|
||||
+cell #[code value]
|
||||
+cell bool
|
||||
+cell The new value of the flag.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "check_flag") Lexeme.check_flag
|
||||
+tag method
|
||||
|
||||
p Check the value of a boolean flag.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to query.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell The value of the flag.
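p One common way to store boolean flags addressed by an integer ID is as bits of a single integer. The sketch below illustrates setting and checking flags under that assumption; the #[code FlagStore] class is hypothetical and is not meant to describe spaCy's internal representation.

```python
class FlagStore(object):
    """Keep boolean flags as bits of one integer, indexed by flag ID."""

    def __init__(self):
        self.bits = 0

    def set_flag(self, flag_id, value):
        if value:
            self.bits |= 1 << flag_id      # switch the bit on
        else:
            self.bits &= ~(1 << flag_id)   # switch the bit off

    def check_flag(self, flag_id):
        return bool(self.bits & (1 << flag_id))


store = FlagStore()
store.set_flag(3, True)
store.set_flag(5, True)
store.set_flag(5, False)
print(store.check_flag(3), store.check_flag(5))
```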
|
||||
|
||||
+h(2, "similarity") Lexeme.similarity
|
||||
+tag method
|
||||
|
||||
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code other]
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
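p For reference, cosine similarity over two vectors can be computed as in the sketch below. It uses plain Python lists rather than numpy arrays, and returning #[code 0.0] for zero-length vectors is a choice made for this example, not documented behavior.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat missing vectors as dissimilar
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```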
|
||||
|
||||
+h(2, "vector") Lexeme.vector
|
||||
+tag property
|
||||
|
||||
p A real-valued meaning representation.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A real-valued meaning representation.
|
||||
|
||||
+h(2, "has_vector") Lexeme.has_vector
|
||||
+tag property
|
||||
|
||||
p A boolean value indicating whether a word vector is associated with the object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether a word vector is associated with the object.
|
179
website/docs/api/matcher.jade
Normal file
|
@ -0,0 +1,179 @@
|
|||
//- 💫 DOCS > API > MATCHER
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Match sequences of tokens, based on pattern rules.
|
||||
|
||||
+h(2, "load") Matcher.load
|
||||
+tag classmethod
|
||||
|
||||
p Load the matcher and patterns from a file path.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell #[code Path]
|
||||
+cell Path to a JSON-formatted patterns file.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary that the documents to match over will refer to.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Matcher]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "init") Matcher.__init__
|
||||
+tag method
|
||||
|
||||
p Create the Matcher.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell
|
||||
| The vocabulary object, which must be shared with the documents
|
||||
| the matcher will operate on.
|
||||
|
||||
+row
|
||||
+cell #[code patterns]
|
||||
+cell dict
|
||||
+cell Patterns to add to the matcher.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Matcher]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") Matcher.__call__
|
||||
+tag method
|
||||
|
||||
p Find all token sequences matching the supplied patterns on the Doc.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to match over.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell list
|
||||
+cell
|
||||
| A list of #[code (entity_key, label_id, start, end)] tuples,
|
||||
| describing the matches. A match tuple describes a
|
||||
| span #[code doc[start:end]]. The #[code label_id] and
|
||||
| #[code entity_key] are both integers.
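p To show how match tuples of this shape relate to token slices, the sketch below runs a deliberately naive exact-sequence matcher over a list of strings. The #[code find_matches] function and the pattern format (an integer entity key mapped to a #[code (label_id, words)] pair) are invented for this example and do not reflect the pattern syntax of #[code Matcher].

```python
def find_matches(tokens, patterns):
    """Return (entity_key, label_id, start, end) tuples for exact matches."""
    matches = []
    for entity_key, (label_id, words) in sorted(patterns.items()):
        width = len(words)
        if width == 0:
            continue  # an empty pattern would match everywhere
        for start in range(len(tokens) - width + 1):
            end = start + width
            if tokens[start:end] == words:
                matches.append((entity_key, label_id, start, end))
    return matches


tokens = ["I", "visited", "New", "York", "and", "York"]
patterns = {1: (10, ["New", "York"]), 2: (10, ["York"])}
matches = find_matches(tokens, patterns)
# Each match describes the slice tokens[start:end].
spans = [tokens[start:end] for _, _, start, end in matches]
print(matches)
```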
|
||||
|
||||
+h(2, "pipe") Matcher.pipe
|
||||
+tag method
|
||||
|
||||
p Match a stream of documents, yielding them in turn.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell -
|
||||
+cell A stream of documents.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel, if the #[code Matcher] implementation supports
|
||||
| multi-threading.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
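p The idea of accumulating #[code batch_size] documents into a working set while still yielding them in order can be sketched in plain Python. The helper names #[code iter_batches] and #[code pipe] below are hypothetical and the per-item #[code process] callback stands in for the matching step; this is not the library's implementation and it ignores #[code n_threads].

```python
from itertools import islice


def iter_batches(stream, batch_size):
    """Yield lists of up to `batch_size` items from `stream`, preserving order."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def pipe(stream, process, batch_size=1000):
    """Process a stream batch by batch and yield the items in their original order."""
    for batch in iter_batches(stream, batch_size):
        for item in batch:
            process(item)  # stand-in for the per-document work
            yield item


seen = []
out = list(pipe(iter(range(5)), seen.append, batch_size=2))
```

p The generator never holds more than one batch in memory, which is the main reason to process a stream this way.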
|
||||
|
||||
+h(2, "add_entity") Matcher.add_entity
|
||||
+tag method
|
||||
|
||||
p Add an entity to the matcher.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code entity_key]
|
||||
+cell unicode / int
|
||||
+cell An ID for the entity.
|
||||
|
||||
+row
|
||||
+cell #[code attrs]
|
||||
+cell -
|
||||
+cell Attributes to associate with the entity.
|
||||
|
||||
+row
|
||||
+cell #[code if_exists]
|
||||
+cell unicode
|
||||
+cell
|
||||
| #[code 'raise'], #[code 'ignore'] or #[code 'update']. Controls
|
||||
| what happens if the entity ID already exists. Defaults to
|
||||
| #[code 'raise'].
|
||||
|
||||
+row
|
||||
+cell #[code acceptor]
|
||||
+cell -
|
||||
+cell Callback function to filter matches of the entity.
|
||||
|
||||
+row
|
||||
+cell #[code on_match]
|
||||
+cell -
|
||||
+cell Callback function to act on matches of the entity.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "add_pattern") Matcher.add_pattern
|
||||
+tag method
|
||||
|
||||
p Add a pattern to the matcher.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code entity_key]
|
||||
+cell unicode / int
|
||||
+cell An ID for the entity.
|
||||
|
||||
+row
|
||||
+cell #[code token_specs]
|
||||
+cell -
|
||||
+cell Description of the pattern to be matched.
|
||||
|
||||
+row
|
||||
+cell #[code label]
|
||||
+cell unicode / int
|
||||
+cell Label to assign to the matched pattern. Defaults to #[code ""].
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "has_entity") Matcher.has_entity
|
||||
+tag method
|
||||
|
||||
p Check whether the matcher has an entity.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code entity_key]
|
||||
+cell unicode / int
|
||||
+cell The entity key to check.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the matcher has the entity.
|
14
website/docs/api/philosophy.jade
Normal file
|
@ -0,0 +1,14 @@
|
|||
//- 💫 DOCS > API > PHILOSOPHY
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Every product needs to know why it exists. Here's what we're trying to do with spaCy and why it's different from other NLP libraries.
|
||||
|
||||
+h(2) 1. No job too big.
|
||||
p Most programs get cheaper to run over time, but NLP programs often get more expensive. The data often grows faster than the hardware improves. For web-scale tasks, Moore's law can't save us — so if we want to read the web, we have to sweat performance.
|
||||
|
||||
+h(2) 2. Take a stand.
|
||||
p Most NLP toolkits position themselves as platforms, rather than libraries. They offer a pluggable architecture, and leave it to the user to arrange the components they offer into a useful system. This is fine for researchers, but for production users, this does too little. Components go out of date quickly, and configuring a good system takes very detailed knowledge. Compatibility problems can be extremely subtle. spaCy is therefore extremely opinionated. The API does not expose any algorithmic details. You're free to configure another pipeline, but the core library eliminates redundancy, and only offers one choice of each component.
|
||||
|
||||
+h(2) 3. Stay current.
|
||||
p There's often significant improvement in NLP models year-on-year. This has been especially true recently, given the success of deep learning models. With spaCy, you should be able to build things you couldn't build yesterday. To deliver on that promise, we need to be giving you the latest stuff.
|
264
website/docs/api/span.jade
Normal file
|
@ -0,0 +1,264 @@
|
|||
//- 💫 DOCS > API > SPAN
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p A slice from a #[code Doc] object.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code start]
|
||||
+cell int
|
||||
+cell The token offset for the start of the span.
|
||||
|
||||
+row
|
||||
+cell #[code end]
|
||||
+cell int
|
||||
+cell The token offset for the end of the span.
|
||||
|
||||
+row
|
||||
+cell #[code start_char]
|
||||
+cell int
|
||||
+cell The character offset for the start of the span.
|
||||
|
||||
+row
|
||||
+cell #[code end_char]
|
||||
+cell int
|
||||
+cell The character offset for the end of the span.
|
||||
|
||||
+row
|
||||
+cell #[code label]
|
||||
+cell int
|
||||
+cell The span's label.
|
||||
|
||||
+row
|
||||
+cell #[code label_]
|
||||
+cell unicode
|
||||
+cell The span's label.
|
||||
|
||||
+row
|
||||
+cell #[code lemma_]
|
||||
+cell unicode
|
||||
+cell The span's lemma.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id]
|
||||
+cell int
|
||||
+cell The integer ID of the named entity the span is an instance of.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id_]
|
||||
+cell unicode
|
||||
+cell The string ID of the named entity the span is an instance of.
|
||||
|
||||
+h(2, "init") Span.__init__
|
||||
+tag method
|
||||
|
||||
p Create a #[code Span] object from the slice #[code doc[start : end]].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code start]
|
||||
+cell int
|
||||
+cell The index of the first token of the span.
|
||||
|
||||
+row
|
||||
+cell #[code end]
|
||||
+cell int
|
||||
+cell The index of the first token after the span.
|
||||
|
||||
+row
|
||||
+cell #[code label]
|
||||
+cell int
|
||||
+cell A label to attach to the span, e.g. for named entities.
|
||||
|
||||
+row
|
||||
+cell #[code vector]
|
||||
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A meaning representation of the span.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Span]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "getitem") Span.__getitem__
|
||||
+tag method
|
||||
|
||||
p Get a #[code Token] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The index of the token within the span.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The token at #[code span[i]].
|
||||
|
||||
p Get a #[code Span] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code start_end]
|
||||
+cell tuple
|
||||
+cell The slice of the span to get.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Span]
|
||||
+cell The span at #[code span[start : end]].
|
||||
|
||||
+h(2, "iter") Span.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over #[code Token] objects.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A #[code Token] object.
|
||||
|
||||
+h(2, "len") Span.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of tokens in the span.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of tokens in the span.
|
||||
|
||||
+h(2, "similarity") Span.similarity
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Make a semantic similarity estimate. The default estimate is cosine
|
||||
| similarity using an average of word vectors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code other]
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
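p The default estimate described above, cosine similarity over averaged word vectors, can be written out in a few lines of plain Python. This is a sketch under assumptions: vectors are plain lists of floats, the helper names #[code average] and #[code cosine] are hypothetical, and returning #[code 0.0] for a zero-length vector is a choice made for the example rather than documented behaviour.

```python
import math


def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two toy "spans", each a list of per-token vectors.
span_a = [[1.0, 0.0], [0.0, 1.0]]  # averages to [0.5, 0.5]
span_b = [[2.0, 2.0]]
score = cosine(average(span_a), average(span_b))
```

p The two averaged vectors point in the same direction, so the score is 1 up to floating-point error, regardless of their lengths.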
|
||||
|
||||
+h(2, "merge") Span.merge
|
||||
+tag method
|
||||
|
||||
p Retokenize the document, such that the span is merged into a single token.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **attributes]
|
||||
+cell -
|
||||
+cell
|
||||
| Attributes to assign to the merged token. By default, attributes
|
||||
| are inherited from the syntactic root token of the span.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The newly merged token.
|
||||
|
||||
+h(2, "text") Span.text
|
||||
+tag property
|
||||
|
||||
p A unicode representation of the span text.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell unicode
|
||||
+cell The original verbatim text of the span.
|
||||
|
||||
+h(2, "text_with_ws") Span.text_with_ws
|
||||
+tag property
|
||||
|
||||
p
|
||||
| The text content of the span with a trailing whitespace character if the
|
||||
| last token has one.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell unicode
|
||||
+cell The text content of the span (with trailing whitespace).
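p The difference between the two text properties can be shown with a toy token list. In the sketch below each token is a hypothetical #[code (text, trailing_whitespace)] pair and both helper functions are illustrative, not the library's code.

```python
# Each token is (text, trailing_whitespace); a stand-in for real token objects.
tokens = [("Hello", " "), ("world", ""), ("!", " "), ("Bye", "")]


def text_with_ws(tokens, start, end):
    """Join the tokens of tokens[start:end], keeping every trailing whitespace."""
    return "".join(text + ws for text, ws in tokens[start:end])


def text(tokens, start, end):
    """Same as text_with_ws, minus the whitespace that trails the last token."""
    joined = text_with_ws(tokens, start, end)
    last_ws = tokens[end - 1][1] if end > start else ""
    return joined[: len(joined) - len(last_ws)]


with_ws = text_with_ws(tokens, 0, 3)
without_ws = text(tokens, 0, 3)
```

p Keeping the trailing whitespace means that concatenating adjacent spans reproduces the original string exactly.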
|
||||
|
||||
+h(2, "sent") Span.sent
|
||||
+tag property
|
||||
|
||||
p The sentence span that this span is a part of.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Span]
|
||||
+cell The sentence this is part of.
|
||||
|
||||
+h(2, "root") Span.root
|
||||
+tag property
|
||||
|
||||
p
|
||||
| The token within the span that's highest in the parse tree. If there's a
|
||||
| tie, the earliest is preferred.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The root token.
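p One way to read "highest in the parse tree" is "fewest steps from the sentence root". The sketch below follows that reading on a toy #[code heads] array in which the root is its own head; the array, the #[code depth] helper and #[code span_root] are all assumptions made for the example.

```python
# heads[i] is the index of token i's head; the sentence root points to itself.
# Tokens: The(0) quick(1) fox(2) jumped(3) over(4) it(5)
heads = [2, 2, 3, 3, 3, 4]


def depth(i):
    """Number of head links between token i and the sentence root."""
    steps = 0
    while heads[i] != i:
        i = heads[i]
        steps += 1
    return steps


def span_root(start, end):
    """Index of the shallowest token in [start, end)."""
    # min() returns the first minimal element, so ties go to the earliest token.
    return min(range(start, end), key=depth)
```

p For the slice #[code [0, 2)] both tokens are equally deep, so the earlier one is returned.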
|
||||
|
||||
+h(2, "lefts") Span.lefts
|
||||
+tag property
|
||||
|
||||
p Tokens that are to the left of the span, whose head is within the span.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A left-child of a token of the span.
|
||||
|
||||
+h(2, "rights") Span.rights
|
||||
+tag property
|
||||
|
||||
p Tokens that are to the right of the span, whose head is within the span.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A right-child of a token of the span.
|
||||
|
||||
+h(2, "subtree") Span.subtree
|
||||
+tag property
|
||||
|
||||
p Tokens that descend from tokens in the span, but fall outside it.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A descendant of a token within the span.
|
107
website/docs/api/stringstore.jade
Normal file
|
@ -0,0 +1,107 @@
|
|||
//- 💫 DOCS > API > STRINGSTORE
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Map strings to and from integer IDs.
|
||||
|
||||
+h(2, "init") StringStore.__init__
|
||||
+tag method
|
||||
|
||||
p Create the #[code StringStore].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code strings]
|
||||
+cell -
|
||||
+cell A sequence of unicode strings to add to the store.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code StringStore]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") StringStore.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of strings in the store.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of strings in the store.
|
||||
|
||||
+h(2, "getitem") StringStore.__getitem__
|
||||
+tag method
|
||||
|
||||
p Retrieve a string from a given integer ID, or vice versa.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string_or_id]
|
||||
+cell bytes / unicode / int
|
||||
+cell The value to encode.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell unicode / int
|
||||
+cell The retrieved value.
|
||||
|
||||
+h(2, "contains") StringStore.__contains__
|
||||
+tag method
|
||||
|
||||
p Check whether a string is in the store.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to check.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the store contains the string.
|
||||
|
||||
+h(2, "iter") StringStore.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over the strings in the store, in order.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell unicode
|
||||
+cell A string in the store.
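p The two-way mapping described by the methods above can be pictured with a small toy class. #[code ToyStringStore] is hypothetical: assigning sequential IDs starting at 0 and interning unseen strings on lookup are assumptions of the sketch, not statements about the real #[code StringStore].

```python
class ToyStringStore:
    """Minimal string <-> integer ID table (illustrative only)."""

    def __init__(self, strings=()):
        self._ids = {}       # string -> ID
        self._strings = []   # ID -> string
        for string in strings:
            self[string]     # looking a string up interns it

    def __len__(self):
        return len(self._strings)

    def __getitem__(self, string_or_id):
        if isinstance(string_or_id, int):
            return self._strings[string_or_id]
        if string_or_id not in self._ids:
            # Assumption: unseen strings get the next free ID.
            self._ids[string_or_id] = len(self._strings)
            self._strings.append(string_or_id)
        return self._ids[string_or_id]

    def __contains__(self, string):
        return string in self._ids

    def __iter__(self):
        return iter(self._strings)


store = ToyStringStore(["apple", "orange"])
```

p Looking up a string returns its ID and looking up an ID returns the string, so #[code store[store["apple"]]] gives back #[code "apple"].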
|
||||
|
||||
+h(2, "dump") StringStore.dump
|
||||
+tag method
|
||||
|
||||
p Save the strings to a JSON file.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code file]
|
||||
+cell buffer
|
||||
+cell The file to save the strings to.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "load") StringStore.load
|
||||
+tag method
|
||||
|
||||
p Load the strings from a JSON file.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code file]
|
||||
+cell buffer
|
||||
+cell The file from which to load the strings.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
117
website/docs/api/tagger.jade
Normal file
|
@ -0,0 +1,117 @@
|
|||
//- 💫 DOCS > API > TAGGER
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p Annotate part-of-speech tags on #[code Doc] objects.
|
||||
|
||||
+h(2, "load") Tagger.load
|
||||
+tag classmethod
|
||||
|
||||
p Load the statistical model from the supplied path.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell #[code Path]
|
||||
+cell The path to load from.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared by the documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code require]
|
||||
+cell bool
|
||||
+cell Whether to raise an error if the files are not found.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Tagger]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "init") Tagger.__init__
|
||||
+tag method
|
||||
|
||||
p Create a #[code Tagger].
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocabulary. Must be shared with documents to be processed.
|
||||
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell #[code thinc.linear.AveragedPerceptron]
|
||||
+cell The statistical model.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Tagger]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") Tagger.__call__
|
||||
+tag method
|
||||
|
||||
p Apply the tagger, setting the POS tags onto the #[code Doc] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The tokens to be tagged.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "pipe") Tagger.pipe
|
||||
+tag method
|
||||
|
||||
p Tag a stream of documents.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code stream]
|
||||
+cell -
|
||||
+cell The sequence of documents to tag.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel.
|
||||
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
|
||||
|
||||
+h(2, "update") Tagger.update
|
||||
+tag method
|
||||
|
||||
p Update the statistical model, with tags supplied for the given document.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The example document for the update.
|
||||
|
||||
+row
|
||||
+cell #[code gold]
|
||||
+cell #[code GoldParse]
|
||||
+cell Manager for the gold-standard tags.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell Number of tags predicted correctly.
|
460
website/docs/api/token.jade
Normal file
|
@ -0,0 +1,460 @@
|
|||
//- 💫 DOCS > API > TOKEN
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p An individual token — i.e. a word, punctuation symbol, whitespace, etc.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocab object of the parent #[code Doc].
|
||||
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The index of the token within the parent document.
|
||||
+row
|
||||
+cell #[code ent_type]
|
||||
+cell int
|
||||
+cell Named entity type.
|
||||
+row
|
||||
+cell #[code ent_type_]
|
||||
+cell unicode
|
||||
+cell Named entity type.
|
||||
|
||||
+row
|
||||
+cell #[code ent_iob]
|
||||
+cell int
|
||||
+cell
|
||||
| IOB code of named entity tag.
|
||||
| #[code 1="I", 2="O", 3="B"]. #[code 0] means no tag is assigned.
|
||||
|
||||
+row
|
||||
+cell #[code ent_iob_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| IOB code of named entity tag. #[code "B"]
|
||||
| means the token begins an entity, #[code "I"] means it is inside an
|
||||
| entity, #[code "O"] means it is outside an entity, and
|
||||
| #[code ""] means no entity tag is set.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id]
|
||||
+cell int
|
||||
+cell ID of the entity the token is an instance of, if any.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id_]
|
||||
+cell unicode
|
||||
+cell ID of the entity the token is an instance of, if any.
|
||||
|
||||
+row
|
||||
+cell #[code lemma]
|
||||
+cell int
|
||||
+cell
|
||||
| Base form of the word, with no inflectional suffixes.
|
||||
|
||||
+row
|
||||
+cell #[code lemma_]
|
||||
+cell unicode
|
||||
+cell Base form of the word, with no inflectional suffixes.
|
||||
|
||||
+row
|
||||
+cell #[code lower]
|
||||
+cell int
|
||||
+cell Lower-case form of the word.
|
||||
|
||||
+row
|
||||
+cell #[code lower_]
|
||||
+cell unicode
|
||||
+cell Lower-case form of the word.
|
||||
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell int
|
||||
+cell Transform of the word's string, to show orthographic features.
|
||||
|
||||
+row
|
||||
+cell #[code shape_]
|
||||
+cell unicode
|
||||
+cell A transform of the word's string, to show orthographic features.
|
||||
|
||||
+row
|
||||
+cell #[code prefix]
|
||||
+cell int
|
||||
+cell Integer ID of a length-N substring from the start of the
|
||||
| word. Defaults to #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code prefix_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| A length-N substring from the start of the word. Defaults to
|
||||
| #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code suffix]
|
||||
+cell int
|
||||
+cell
|
||||
| Length-N substring from the end of the word. Defaults to #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code suffix_]
|
||||
+cell unicode
|
||||
+cell Length-N substring from the end of the word. Defaults to #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code is_alpha]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isalpha()].
|
||||
|
||||
+row
|
||||
+cell #[code is_ascii]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code not any(ord(c) >= 128 for c in word.orth_)].
|
||||
|
||||
+row
|
||||
+cell #[code is_digit]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isdigit()].
|
||||
|
||||
+row
|
||||
+cell #[code is_lower]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.islower()].
|
||||
|
||||
+row
|
||||
+cell #[code is_title]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.istitle()].
|
||||
|
||||
+row
|
||||
+cell #[code is_punct]
|
||||
+cell bool
|
||||
+cell Is the word punctuation?
|
||||
|
||||
+row
|
||||
+cell #[code is_space]
|
||||
+cell bool
|
||||
+cell Equivalent to #[code word.orth_.isspace()].
|
||||
|
||||
+row
|
||||
+cell #[code like_url]
|
||||
+cell bool
|
||||
+cell Does the word resemble a URL?
|
||||
|
||||
+row
|
||||
+cell #[code like_num]
|
||||
+cell bool
|
||||
+cell Does the word represent a number? e.g. “10.9”, “10”, “ten”, etc.
|
||||
|
||||
+row
|
||||
+cell #[code like_email]
|
||||
+cell bool
|
||||
+cell Does the word resemble an email address?
|
||||
|
||||
+row
|
||||
+cell #[code is_oov]
|
||||
+cell bool
|
||||
+cell Is the word out-of-vocabulary?
|
||||
|
||||
+row
|
||||
+cell #[code is_stop]
|
||||
+cell bool
|
||||
+cell Is the word part of a "stop list"?
|
||||
|
||||
+row
|
||||
+cell #[code pos]
|
||||
+cell int
|
||||
+cell Coarse-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code pos_]
|
||||
+cell unicode
|
||||
+cell Coarse-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code tag]
|
||||
+cell int
|
||||
+cell Fine-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code tag_]
|
||||
+cell unicode
|
||||
+cell Fine-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code dep]
|
||||
+cell int
|
||||
+cell Syntactic dependency relation.
|
||||
|
||||
+row
|
||||
+cell #[code dep_]
|
||||
+cell unicode
|
||||
+cell Syntactic dependency relation.
|
||||
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell int
|
||||
+cell Language of the parent document's vocabulary.
|
||||
+row
|
||||
+cell #[code lang_]
|
||||
+cell unicode
|
||||
+cell Language of the parent document's vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code prob]
|
||||
+cell float
|
||||
+cell Smoothed log probability estimate of token's type.
|
||||
|
||||
+row
|
||||
+cell #[code idx]
|
||||
+cell int
|
||||
+cell The character offset of the token within the parent document.
|
||||
|
||||
+row
|
||||
+cell #[code sentiment]
|
||||
+cell float
|
||||
+cell A scalar value indicating the positivity or negativity of the token.
|
||||
|
||||
+row
|
||||
+cell #[code lex_id]
|
||||
+cell int
|
||||
+cell ID of the token's lexical type.
|
||||
|
||||
+row
|
||||
+cell #[code text]
|
||||
+cell unicode
|
||||
+cell Verbatim text content.
|
||||
+row
|
||||
+cell #[code text_with_ws]
|
||||
+cell unicode
|
||||
+cell Text content, with trailing space character if present.
|
||||
|
||||
+row
|
||||
+cell #[code whitespace]
|
||||
+cell int
|
||||
+cell Trailing space character if present.
|
||||
+row
|
||||
+cell #[code whitespace_]
|
||||
+cell unicode
|
||||
+cell Trailing space character if present.
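p To make the integer IOB codes more concrete, the sketch below groups a sequence of #[code ent_iob]-style values into token ranges. It assumes the mapping 1 = "I", 2 = "O", 3 = "B" and 0 = unset; the #[code entity_spans] helper is hypothetical and ignores entity types.

```python
# Assumed mapping from integer IOB codes to their string form.
IOB_NAMES = {0: "", 1: "I", 2: "O", 3: "B"}


def entity_spans(iob_codes):
    """Group integer IOB codes into (start, end) token ranges, end exclusive."""
    spans = []
    start = None
    for i, code in enumerate(iob_codes):
        tag = IOB_NAMES[code]
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag == "I":
            if start is None:
                start = i  # tolerate an "I" without a preceding "B"
        else:
            # "O" or unset: close any open entity.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(iob_codes)))
    return spans


# O B I O B B I
spans = entity_spans([2, 3, 1, 2, 3, 3, 1])
```

p A "B" code directly after another entity starts a new range, which is what lets the scheme separate two adjacent entities.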
|
||||
|
||||
|
||||
+h(2, "init") Token.__init__
|
||||
+tag method
|
||||
|
||||
p Construct a #[code Token] object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A storage container for lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code offset]
|
||||
+cell int
|
||||
+cell The index of the token within the document.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") Token.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of unicode characters in the token.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of unicode characters in the token.
|
||||
|
||||
|
||||
+h(2, "check_flag") Token.check_flag
|
||||
+tag method
|
||||
|
||||
p Check the value of a boolean flag.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to check.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the flag is set.
|
||||
|
||||
+h(2, "nbor") Token.nbor
|
||||
+tag method
|
||||
|
||||
p Get a neighboring token.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The relative position of the token to get. Defaults to #[code 1].
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The token at position #[code self.doc[self.i+i]]
|
||||
|
||||
+h(2, "similarity") Token.similarity
|
||||
+tag method
|
||||
|
||||
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell other
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
|
||||
|
||||
+h(2, "is_ancestor") Token.is_ancestor
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Check whether this token is a parent, grandparent, etc. of another
|
||||
| in the dependency tree.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell descendant
|
||||
+cell #[code Token]
|
||||
+cell Another token.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether this token is the ancestor of the descendant.
|
||||
|
||||
|
||||
+h(2, "vector") Token.vector
|
||||
+tag property
|
||||
|
||||
p A real-valued meaning representation.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A 1D numpy array representing the token's semantics.
|
||||
|
||||
+h(2, "has_vector") Token.has_vector
|
||||
+tag property
|
||||
|
||||
p
|
||||
| A boolean value indicating whether a word vector is associated with the
|
||||
| object.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the token has vector data attached.
|
||||
|
||||
+h(2, "head") Token.head
|
||||
+tag property
|
||||
|
||||
p The syntactic parent, or "governor", of this token.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The head.
|
||||
|
||||
+h(2, "conjuncts") Token.conjuncts
|
||||
+tag property
|
||||
|
||||
p A sequence of coordinated tokens, including the token itself.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A coordinated token.
|
||||
|
||||
+h(2, "children") Token.children
|
||||
+tag property
|
||||
|
||||
p A sequence of the token's immediate syntactic children.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A child token such that #[code child.head==self].
|
||||
|
||||
+h(2, "subtree") Token.subtree
|
||||
+tag property
|
||||
|
||||
p A sequence of all the token's syntactic descendants.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell A descendant token such that #[code self.is_ancestor(descendant)].
|
||||
|
||||
+h(2, "left_edge") Token.left_edge
|
||||
+tag property
|
||||
|
||||
p The leftmost token of this token's syntactic descendants.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The first token such that #[code self.is_ancestor(token)].
|
||||
|
||||
+h(2, "right_edge") Token.right_edge
|
||||
+tag property
|
||||
|
||||
p The rightmost token of this token's syntactic descendants.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Token]
|
||||
+cell The last token such that #[code self.is_ancestor(token)].
|
||||
|
||||
+h(2, "ancestors") Token.ancestors
|
||||
+tag property
|
||||
|
||||
p A sequence of this token's syntactic ancestors.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Token]
|
||||
+cell
|
||||
| A sequence of ancestor tokens such that
|
||||
| #[code ancestor.is_ancestor(self)].
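p The tree-navigation properties above are all derived from each token's head. The sketch below reproduces the relationships on a toy #[code heads] array, under the assumption that the root is its own head; the function names mirror the documented properties but are plain Python helpers written for this example.

```python
# heads[i] is the index of token i's syntactic head; the root points to itself.
# Tokens: The(0) quick(1) fox(2) jumped(3) over(4) it(5)
heads = [2, 2, 3, 3, 3, 4]


def ancestors(i):
    """Heads of token i, from its parent up to the root."""
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain


def is_ancestor(i, descendant):
    """True if token i is a parent, grandparent, etc. of `descendant`."""
    return i in ancestors(descendant)


def children(i):
    """Immediate dependents of token i, in document order."""
    return [j for j, head in enumerate(heads) if head == i and j != i]


def descendants(i):
    """All tokens below token i in the tree, in document order."""
    return [j for j in range(len(heads)) if is_ancestor(i, j)]
```

p With these helpers, every #[code a] in #[code ancestors(i)] satisfies #[code is_ancestor(a, i)], mirroring the relationship stated in the table.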
|
278
website/docs/api/vocab.jade
Normal file
|
@ -0,0 +1,278 @@
|
|||
//- 💫 DOCS > API > VOCAB
|
||||
|
||||
include ../../_includes/_mixins
|
||||
|
||||
p
|
||||
| A look-up table that allows you to access #[code Lexeme] objects. The
|
||||
| #[code Vocab] instance also provides access to the #[code StringStore],
|
||||
| and owns underlying C-data that is shared between #[code Doc] objects.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code strings]
|
||||
+cell #[code StringStore]
|
||||
+cell A table managing the string-to-int mapping.
|
||||
|
||||
+row
|
||||
+cell #[code vectors_length]
|
||||
+cell int
|
||||
+cell The dimensionality of the word vectors, if present.
|
||||
|
||||
+h(2, "load") Vocab.load
|
||||
+tag classmethod
|
||||
|
||||
p Load the vocabulary from a path.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell #[code Path]
|
||||
+cell The path to load from.
|
||||
|
||||
+row
|
||||
+cell #[code lex_attr_getters]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping attribute IDs to functions to compute them.
|
||||
| Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code lemmatizer]
|
||||
+cell -
|
||||
+cell A lemmatizer. Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code tag_map]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping fine-grained tags to coarse-grained
|
||||
| parts-of-speech, and optionally morphological attributes.
|
||||
|
||||
+row
|
||||
+cell #[code oov_prob]
|
||||
+cell float
|
||||
+cell The default probability for out-of-vocabulary words.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Vocab]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "init") Vocab.__init__
|
||||
+tag method
|
||||
|
||||
p Create the vocabulary.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code lex_attr_getters]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping attribute IDs to functions to compute them.
|
||||
| Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code lemmatizer]
|
||||
+cell -
|
||||
+cell A lemmatizer. Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code tag_map]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping fine-grained tags to coarse-grained
|
||||
| parts-of-speech, and optionally morphological attributes.
|
||||
|
||||
+row
|
||||
+cell #[code oov_prob]
|
||||
+cell float
|
||||
+cell The default probability for out-of-vocabulary words.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Vocab]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") Vocab.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of lexemes in the vocabulary.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The number of lexemes in the vocabulary.
|
||||
|
||||
+h(2, "getitem") Vocab.__getitem__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Retrieve a lexeme, given an int ID or a unicode string. If a previously
|
||||
| unseen unicode string is given, a new lexeme is created and stored.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code id_or_string]
|
||||
+cell int / unicode
|
||||
+cell The integer ID of a word, or its unicode string.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code Lexeme]
|
||||
+cell The lexeme indicated by the given ID.
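p The lookup behaviour described for #[code Vocab.__getitem__] can be sketched in plain Python. This is only an illustration: #[code MiniVocab] and its dictionary entries are hypothetical stand-ins, and the real #[code Vocab] stores #[code Lexeme] objects and assigns IDs differently.

```python
class MiniVocab(object):
    """Hypothetical stand-in for Vocab, not spaCy's implementation."""

    def __init__(self):
        self._by_string = {}  # string -> entry
        self._by_id = []      # integer ID -> entry

    def __len__(self):
        return len(self._by_id)

    def __contains__(self, string):
        return string in self._by_string

    def __iter__(self):
        return iter(self._by_id)

    def __getitem__(self, id_or_string):
        # Integer IDs look up an existing entry.
        if isinstance(id_or_string, int):
            return self._by_id[id_or_string]
        # A previously unseen string creates and stores a new entry.
        if id_or_string not in self._by_string:
            entry = {"id": len(self._by_id), "text": id_or_string}
            self._by_string[id_or_string] = entry
            self._by_id.append(entry)
        return self._by_string[id_or_string]


vocab = MiniVocab()
seen_before = u"apple" in vocab
apple = vocab[u"apple"]
```

p Note that a string lookup has a side effect here: it grows the vocabulary, which is why #[code __contains__] is the safer way to test for membership.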
|
||||
|
||||
+h(2, "iter") Vocab.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over the lexemes in the vocabulary.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+footrow
|
||||
+cell yield
|
||||
+cell #[code Lexeme]
|
||||
+cell An entry in the vocabulary.
|
||||
|
||||
+h(2, "contains") Vocab.__contains__
|
||||
+tag method
|
||||
|
||||
p Check whether the string has an entry in the vocabulary.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The ID string.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell bool
|
||||
+cell Whether the string has an entry in the vocabulary.
|
||||
|
||||
+h(2, "resize_vectors") Vocab.resize_vectors
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Set #[code vectors_length] to a new size, and allocate more memory for
|
||||
| the #[code Lexeme] vectors if necessary. The memory will be zeroed.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code new_size]
|
||||
+cell int
|
||||
+cell The new size of the vectors.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "add_flag") Vocab.add_flag
|
||||
+tag method
|
||||
|
||||
p Set a new boolean flag on words in the vocabulary.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_getter]
|
||||
+cell callable
|
||||
+cell A function #[code f(unicode) -> bool], to get the flag value.
|
||||
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell
|
||||
| An integer between 1 and 63 (inclusive), specifying the bit at
|
||||
| which the flag will be stored. If #[code -1], the lowest
|
||||
| available bit will be chosen.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The integer ID by which the flag value can be checked.
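p The bit allocation described for #[code flag_id] can be sketched as follows. This is a minimal illustration under stated assumptions: #[code FlagRegistry], #[code compute_flags] and #[code check_flag] are hypothetical names, not part of spaCy.

```python
class FlagRegistry(object):
    """Hypothetical helper that hands out bit positions 1..63 for flags."""

    def __init__(self):
        self.getters = {}  # flag_id -> function f(unicode) -> bool

    def add_flag(self, flag_getter, flag_id=-1):
        if flag_id == -1:
            # Choose the lowest bit that has no getter assigned yet.
            free = [i for i in range(1, 64) if i not in self.getters]
            if not free:
                raise ValueError("No free flag bits left")
            flag_id = free[0]
        elif not 1 <= flag_id <= 63:
            raise ValueError("flag_id must be between 1 and 63, or -1")
        self.getters[flag_id] = flag_getter
        return flag_id

    def compute_flags(self, string):
        # Pack the value of every registered flag into a single integer.
        flags = 0
        for flag_id, getter in self.getters.items():
            if getter(string):
                flags |= 1 << flag_id
        return flags


def check_flag(flags, flag_id):
    # Test the bit at position flag_id.
    return bool(flags & (1 << flag_id))


registry = FlagRegistry()
IS_SHOUTY = registry.add_flag(lambda string: string.isupper())
IS_NUMBER = registry.add_flag(lambda string: string.isdigit(), flag_id=5)
```

p Storing all flags in one integer keeps the per-word cost constant, which is presumably why the number of available bits is capped.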
|
||||
|
||||
+h(2, "dump") Vocab.dump
|
||||
+tag method
|
||||
|
||||
p Save the lexemes binary data to the given location.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code loc]
|
||||
+cell #[code Path]
|
||||
+cell The path to save to.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "load_lexemes") Vocab.load_lexemes
|
||||
+tag method
|
||||
|
||||
p Load the lexemes' binary data from the given location.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code loc]
|
||||
+cell unicode
|
||||
+cell Path to load the lexemes.bin file from.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "dump_vectors") Vocab.dump_vectors
|
||||
+tag method
|
||||
|
||||
p Save the word vectors to a binary file.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code loc]
|
||||
+cell #[code Path]
|
||||
+cell The path to save to.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell #[code None]
|
||||
+cell -
|
||||
|
||||
+h(2, "load_vectors") Vocab.load_vectors
|
||||
+tag method
|
||||
|
||||
p Load vectors from a text-based file.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code file_]
|
||||
+cell buffer
|
||||
+cell
|
||||
| The file to read from. Entries should be separated by newlines,
|
||||
| and each entry should be whitespace delimited. The first value
|
||||
| of the entry should be the word string, and subsequent entries
|
||||
| should be the values of the vector.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The length of the vectors loaded.
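p The text format expected by #[code load_vectors] can be illustrated with a small reader. This is a sketch only: #[code read_text_vectors] is a hypothetical function, and it returns the parsed table alongside the vector length, whereas the method documented above returns just the length.

```python
import io


def read_text_vectors(file_):
    """Parse one entry per line: a word followed by its vector values."""
    vectors = {}
    length = 0
    for line in file_:
        pieces = line.split()
        if not pieces:
            continue  # skip blank lines
        word = pieces[0]
        values = [float(value) for value in pieces[1:]]
        if vectors and len(values) != length:
            raise ValueError("Inconsistent vector length for %r" % word)
        length = len(values)
        vectors[word] = values
    return vectors, length


sample = io.StringIO(u"apple 0.1 0.2 0.3\npear 0.4 0.5 0.6\n")
vectors, length = read_text_vectors(sample)
```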
|
||||
|
||||
+h(2, "load_vectors_from_bin_loc") Vocab.load_vectors_from_bin_loc
|
||||
+tag method
|
||||
|
||||
p Load vectors from the location of a binary file.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code loc]
|
||||
+cell unicode
|
||||
+cell The path of the binary file to load from.
|
||||
|
||||
+footrow
|
||||
+cell return
|
||||
+cell int
|
||||
+cell The length of the vectors loaded.
|
|
@ -1,26 +1,27 @@
|
|||
//- ----------------------------------
|
||||
//- 💫 DOCS
|
||||
//- ----------------------------------
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
- var link_bool = 'http://docs.python.org/library/functions.html#bool'
|
||||
- var link_int = 'http://docs.python.org/library/functions.html#int'
|
||||
- var link_unicode = 'http://docs.python.org/library/functions.html#unicode'
|
||||
p=lorem_short
|
||||
|
||||
include _quickstart-install
|
||||
include _quickstart-examples
|
||||
+aside("Help us improve the docs")
|
||||
| Did you spot a mistake or come across explanations that
|
||||
| are unclear? You can find a "Suggest edits" button at the
|
||||
| bottom of each page that points you to the source.
|
||||
| We always appreciate
|
||||
| #[+a(gh("spaCy") + "/pulls") pull requests].#[br]#[br]
|
||||
| Have you built something cool with spaCy, or did you
|
||||
| write a tutorial to help others use spaCy?
|
||||
| #[a(href="mailto:#{EMAIL}") Let us know!]
|
||||
|
||||
+h(2, "api") API
|
||||
+grid
|
||||
each details, title in sections
|
||||
+card(false, false)
|
||||
a(href=details.url)
|
||||
+svg("graphics", details.svg, 300, 150).u-color-theme
|
||||
|
||||
include _api-language
|
||||
include _api-doc
|
||||
include _api-token
|
||||
include _api-span
|
||||
include _api-lexeme
|
||||
include _api-vocab
|
||||
include _api-stringstore
|
||||
include _api-matcher
|
||||
a(href=details.url)
|
||||
+h(3)=title
|
||||
|
||||
include _annotation-specs
|
||||
include _tutorials
|
||||
p=details.description
|
||||
+button(details.url, true, "primary")(target="_self") View
|
||||
|
|
|
@ -1,49 +0,0 @@
|
|||
{
|
||||
"training": {
|
||||
"title": "Training the tagger, entity recogniser and parser",
|
||||
"date": "2016-10-17",
|
||||
"description": "This tutorial describes how to train new statistical models for spaCy's part-of-speech tagger, named entity recognizer and dependency parser."
|
||||
},
|
||||
|
||||
"custom-pipelines": {
|
||||
"title": "Custom Pipelines",
|
||||
"date": "2016-10-17",
|
||||
"description": "spaCy 1.0 introduces dynamic pipelines, so that you can easily create custom workflows. This tutorial describes the feature, and introduces experimental support for dynamic Token attributes. The tutorial also discusses how we can make it easier to use bidirectional LSTMs with spaCy."
|
||||
},
|
||||
|
||||
"rule-based-matcher": {
|
||||
"title": "Rule-based Matcher",
|
||||
"date": "2016-10-17",
|
||||
"description": "spaCy features a rule-matching engine that operates over tokens. The rules can refer to token annotations and flags, and matches support callbacks to accept, modify and/or act on the match. The rule matcher also allows you to associate patterns with entity IDs, to allow some basic entity linking or disambiguation."
|
||||
},
|
||||
|
||||
"load-new-word-vectors": {
|
||||
"title": "Load new word vectors",
|
||||
"date": "2015-09-24",
|
||||
"description": "Word vectors allow simple similarity queries, and drive many NLP applications. This tutorial explains how to load custom word vectors into spaCy, to make use of task or data-specific representations."
|
||||
},
|
||||
|
||||
"byo-annotations": {
|
||||
"title": "Using Pre-existing Tokenization, Tags, and Other Annotations",
|
||||
"date": "2016-04-15",
|
||||
"description": "spaCy assumes by default that your data is raw text. However, sometimes your data is partially annotated, e.g. with pre-existing tokenization, part-of-speech tags, etc. This tutorial explains how to use these annotations in spaCy."
|
||||
},
|
||||
|
||||
"mark-adverbs": {
|
||||
"title": "Mark all adverbs, particularly for verbs of speech",
|
||||
"date": "2015-08-18",
|
||||
"description": "Let's say you're developing a proofreading tool, or possibly an IDE for writers. You're convinced by Stephen King's advice that adverbs are not your friend so you want to highlight all adverbs."
|
||||
},
|
||||
|
||||
"syntax-search": {
|
||||
"title": "Search Reddit for comments about Google doing something",
|
||||
"date": "2015-08-18",
|
||||
"description": "Example use of the spaCy NLP tools for data exploration. Here we will look for Reddit comments that describe Google doing something, i.e. discuss the company's actions. This is difficult, because other senses of \"Google\" now dominate usage of the word in conversation, particularly references to using Google products."
|
||||
},
|
||||
|
||||
"twitter-filter": {
|
||||
"title": "Finding Relevant Tweets",
|
||||
"date": "2015-08-18",
|
||||
"description": "In this tutorial, we will use word vectors to search for tweets about Jeb Bush. We'll do this by building up two word lists: one that represents the type of meanings in the Jeb Bush tweets, and another to help screen out irrelevant tweets that mention the common, ambiguous word \"bush\"."
|
||||
}
|
||||
}
|
|
@ -1,117 +0,0 @@
|
|||
include ../../_includes/_mixins
|
||||
|
||||
p.u-text-large spaCy assumes by default that your data is raw text. However, sometimes your data is partially annotated, e.g. with pre-existing tokenization, part-of-speech tags, etc. This tutorial explains how to use these annotations in spaCy.
|
||||
|
||||
+h(2, "quick-reference") Quick Reference
|
||||
|
||||
+table(['Description', 'Usage'], 'code')
|
||||
+row
|
||||
+cell Use pre-existing tokenization
|
||||
+cell #[code.lang-python doc = Doc(nlp.vocab, [('A', True), ('token', False), ('!', False)])]
|
||||
+row
|
||||
+cell Use pre-existing tokenization (deprecated)
|
||||
+cell #[code.lang-python doc = nlp.tokenizer.tokens_from_list([u'A', u'token', u'!'])]
|
||||
|
||||
+row
|
||||
+cell Assign pre-existing tags
|
||||
+cell #[code.lang-python nlp.tagger.tag_from_strings(doc, ['DT', 'NN'])]
|
||||
+row
|
||||
+cell Assign named entity annotations from an array
|
||||
+cell #[code.lang-python doc.from_array([ENT_TYPE, ENT_IOB], values)]
|
||||
+row
|
||||
+cell Assign dependency parse annotations from an array
|
||||
+cell #[code.lang-python doc.from_array([HEAD, DEP], values)]
|
||||
|
||||
+h(2, "examples") Examples
|
||||
|
||||
+code('python', 'Tokenization').
|
||||
import spacy
|
||||
|
||||
nlp = spacy.load('en')
|
||||
|
||||
tokens = [u'A', u'list', u'of', u'strings', u'.']
|
||||
|
||||
doc = nlp.tokenizer.tokens_from_list(tokens)
|
||||
|
||||
assert len(doc) == len(tokens)
|
||||
# With this method, we don't get to specify how the corresponding string
|
||||
# would be spaced, so we have to assume a space before every token.
|
||||
assert doc.text == u'A list of strings .'
|
||||
|
||||
+code('python', 'Tokenization').
|
||||
import spacy
|
||||
from spacy.tokens import Doc
|
||||
|
||||
nlp = spacy.load('en')
|
||||
|
||||
tokens = [u'A', u'list', u'of', u'strings', u'.']
|
||||
has_space = [True, True, True, False, False]
|
||||
|
||||
doc = Doc(nlp.vocab, orth_and_spaces=zip(tokens, has_space))
|
||||
|
||||
assert len(doc) == len(tokens)
|
||||
# Spacing is correct, given by boolean values above.
|
||||
assert doc.text == u'A list of strings.'
|
||||
# Here's how it would look with different boolean values.
|
||||
tokens = [u'A', u'list', u'of', u'strings', u'.']
|
||||
has_space = [False, True, True, True, False]
|
||||
doc = Doc(nlp.vocab, orth_and_spaces=zip(tokens, has_space))
|
||||
assert doc.text == u'Alist of strings .'
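p To see how the boolean values determine the text, here is a plain-Python sketch that needs no model. #[code join_tokens] is a hypothetical helper, not a spaCy function: each flag says whether the token is followed by a single space.

```python
def join_tokens(tokens, has_space):
    """Rebuild the text from tokens and their trailing-space flags."""
    pieces = []
    for word, space in zip(tokens, has_space):
        pieces.append(word)
        if space:
            pieces.append(u' ')
    return u''.join(pieces)


tokens = [u'A', u'list', u'of', u'strings', u'.']
natural = join_tokens(tokens, [True, True, True, False, False])
shifted = join_tokens(tokens, [False, True, True, True, False])
```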
|
||||
|
||||
+code('python', 'POS Tags').
|
||||
import spacy
|
||||
|
||||
nlp = spacy.load('en')
|
||||
|
||||
# Tokenize a string into a Doc, but don't apply the whole pipeline ---
|
||||
# that is, don't predict the part-of-speech tags, syntactic parse, named
|
||||
# entities, etc.
|
||||
doc = nlp.tokenizer(u'A unicode string, untokenized.')
|
||||
nlp.tagger.tag_from_strings(doc, [u'DT', u'JJ', u'NN', u',', u'VBN', u'.'])
|
||||
# Now predict dependency parse and named entities. Note that if you assign
|
||||
# tags in a way that's very unlike the behaviour of the POS tagger model,
|
||||
# the subsequent models may perform worse. These models use the POS tags
|
||||
# as features, so if you give them unexpected tags, you may be giving them
|
||||
# run-time conditions that don't resemble the training data.
|
||||
nlp.parser(doc)
|
||||
nlp.entity(doc)
|
||||
|
||||
+code('python', 'Dependency Parse').
|
||||
import spacy
|
||||
from spacy.attrs import HEAD, DEP
|
||||
from spacy.symbols import det, nmod, root, punct
|
||||
from numpy import ndarray
|
||||
|
||||
nlp = spacy.load('en')
|
||||
|
||||
# Get the Doc object, and apply the pipeline except the dependency parser
|
||||
doc = nlp(u'A unicode string.', parse=False)
|
||||
|
||||
columns = [HEAD, DEP]
|
||||
values = ndarray(shape=(len(columns), len(doc)), dtype='int32')
|
||||
# Syntactic parse specified as head offsets
|
||||
heads = [2, 1, 0, -1]
|
||||
# Integer IDs for the dependency labels. See the parse in the displaCy
|
||||
# demo at spacy.io/demos/displacy
|
||||
labels = [det, nmod, root, punct]
|
||||
values[0] = heads
|
||||
values[1] = labels
|
||||
doc.from_array(columns, values)
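p The head offsets used above are relative to each token's own position, and an offset of 0 marks the root. A small sketch, using the hypothetical helper #[code resolve_heads], converts them to absolute token indices:

```python
def resolve_heads(offsets):
    """Turn relative head offsets into absolute token indices."""
    heads = []
    for i, offset in enumerate(offsets):
        head = i + offset
        if not 0 <= head < len(offsets):
            raise ValueError("Head of token %d is out of range" % i)
        heads.append(head)
    return heads


words = [u'A', u'unicode', u'string', u'.']
heads = resolve_heads([2, 1, 0, -1])
# A token whose head is itself is the root of the parse.
roots = [words[i] for i, head in enumerate(heads) if head == i]
```

p Every token in this sentence attaches to #[code string], which is its own head.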
|
||||
|
||||
+code('python', 'Named Entities').
|
||||
import spacy
|
||||
from spacy.attrs import ENT_TYPE, ENT_IOB
|
||||
from spacy.symbols import PERSON, ORG
|
||||
from numpy import ndarray
|
||||
|
||||
nlp = spacy.load('en')
|
||||
|
||||
# Get the Doc object, and apply the pipeline except the entity recognizer
|
||||
doc = nlp(u'My name is Matt.', entity=False)
|
||||
|
||||
columns = [ENT_IOB, ENT_TYPE]
|
||||
values = ndarray(shape=(len(columns), len(doc)), dtype='int32')
|
||||
# IOB values are 0=missing, 1=I, 2=O, 3=B
|
||||
values[0] = [2, 2, 2, 3, 2]
|
||||
values[1] = [0, 0, 0, PERSON, 0]
|
||||
doc.from_array(columns, values)
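p If your annotations are stored as spans rather than per-token codes, the IOB values listed in the comment above (0=missing, 1=I, 2=O, 3=B) can be derived with a few lines of plain Python. #[code spans_to_iob] is a hypothetical helper, and the labels are kept as strings here rather than spaCy symbol IDs.

```python
IOB_MISSING, IOB_I, IOB_O, IOB_B = 0, 1, 2, 3


def spans_to_iob(n_tokens, spans):
    """Convert (start, end, label) spans, end exclusive, to per-token codes."""
    iob = [IOB_O] * n_tokens
    labels = [None] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            # The first token of an entity is B, the rest are I.
            iob[i] = IOB_B if i == start else IOB_I
            labels[i] = label
    return iob, labels


# "My name is Matt ." with a single-token PERSON entity at index 3.
iob, labels = spans_to_iob(5, [(3, 4, 'PERSON')])
```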
|
|
@ -1,89 +0,0 @@
|
|||
include ../../_includes/_mixins
|
||||
|
||||
p.u-text-large spaCy 1.0 introduces dynamic pipelines, so that you can easily create custom workflows. This tutorial describes the feature, and introduces experimental support for dynamic Token attributes. The tutorial also discusses how we can make it easier to use bidirectional LSTMs with spaCy.
|
||||
|
||||
p Best practices in NLP are now already pretty different from when I first designed spaCy, even though it's only been two years. The spaCy 1.0 release has a new custom pipeline API to help you use the new hotness.
|
||||
|
||||
p Before 1.0, spaCy's pipeline was hard-coded. When you called #[code nlp(text)], spaCy would apply the tokenizer, tagger, parser and named entity recognizer, in sequence. This design assumed that users should subclass the #[code Language] class to customize the pipeline. However, the #[code Language] class has gotten more complicated, and subclassing it now feels like a relatively "serious" thing to do. It feels hard.
|
||||
|
||||
p In spaCy 1.0, the order of operations is no longer hard-coded. Instead, the new #[code Language.__call__] does something like this:
|
||||
|
||||
+code.
|
||||
def __call__(self, text):
|
||||
doc = self.make_doc(text)
|
||||
for process in self.pipeline:
|
||||
process(doc)
|
||||
return doc
|
||||
|
||||
p The pipeline can consist of any sequence of callables. They should accept a Doc object, and modify it in-place. You can install the pipeline by passing a callable to the #[code spacy.load()] function, or the constructor of the #[code Language] class:
|
||||
|
||||
+code("python", "Basic Example").
|
||||
import spacy
|
||||
|
||||
def arbitrary_fixup_rules(doc):
|
||||
for token in doc:
|
||||
if token.text == u'bill' and token.tag_ == u'NNP':
|
||||
token.tag_ = u'NN'
|
||||
|
||||
def custom_pipeline(nlp):
|
||||
return (nlp.tagger, arbitrary_fixup_rules, nlp.parser, nlp.entity)
|
||||
|
||||
nlp = spacy.load('en', pipeline=custom_pipeline)
|
||||
|
||||
|
||||
p The value passed to the #[code pipeline] keyword should be a callable that takes the #[code Language] instance (i.e. #[code nlp]) as an argument. The callable should return a sequence of callables. Each member of the sequence should take a Doc object as its sole positional argument.
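p The contract is easier to see without a model in the way. The sketch below is not spaCy code: #[code MiniLanguage], its toy #[code tagger] and the dictionary it uses as a document are hypothetical stand-ins that mimic only the control flow.

```python
class MiniLanguage(object):
    """Hypothetical stand-in for Language, showing only the pipeline flow."""

    def __init__(self, pipeline=None):
        # The factory receives this instance and returns a sequence of callables.
        self.pipeline = list(pipeline(self)) if pipeline is not None else []

    def make_doc(self, text):
        return {'words': text.split(), 'tags': []}

    def tagger(self, doc):
        # Toy rule: capitalised words are proper nouns, everything else a noun.
        doc['tags'] = ['NNP' if word[:1].isupper() else 'NN'
                       for word in doc['words']]

    def __call__(self, text):
        doc = self.make_doc(text)
        for process in self.pipeline:
            process(doc)  # each step modifies the doc in place
        return doc


def fix_bill_tag(doc):
    for i, word in enumerate(doc['words']):
        if word.lower() == 'bill' and doc['tags'][i] == 'NNP':
            doc['tags'][i] = 'NN'


def custom_pipeline(nlp):
    return (nlp.tagger, fix_bill_tag)


nlp = MiniLanguage(pipeline=custom_pipeline)
doc = nlp('Pay the Bill')
```

p Because the factory is called with the instance, the returned sequence can mix the instance's own components with arbitrary functions, in any order.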
|
||||
|
||||
+h(2, "experimental-lstm") Experimental: Bidirectional LSTM with custom pipeline
|
||||
|
||||
p Probably the most important new technology in Natural Language Processing is the rise of bidirectional LSTM models. These models associate each word with a #[em context-specific] vector. You can also neatly include character level features, so that all relevant aspects of the word are captured. This is pretty much the best way to do feature extraction in NLP at the moment, for almost any task.
|
||||
|
||||
p spaCy doesn't feature any pre-trained LSTM models yet, and the details of this API are still being refined. But, because BiLSTMs are proving so important, I wanted to get the proposal up.
|
||||
|
||||
p Version 1.0 adds an attribute #[code tensor] to the #[code Doc] object. The #[code tensor] attribute expects a numpy ndarray object, and is publicly writeable. This gives you a place to store the output of the LSTM (or some other real-valued output you want to keep).
|
||||
|
||||
+code("python", "Basic Example").
|
||||
import spacy
|
||||
from spacy.symbols import LEMMA, TAG
|
||||
|
||||
class LSTMModel(object):
|
||||
def __init__(self, **kwargs):
|
||||
# Load your weights etc
|
||||
pass
|
||||
|
||||
def __call__(self, doc):
|
||||
features = doc.to_array([LEMMA, TAG])
|
||||
doc.tensor = lstm(features)
|
||||
|
||||
def custom_pipeline(nlp):
|
||||
return (nlp.tagger, LSTMModel(), nlp.parser, nlp.entity)
|
||||
|
||||
nlp = spacy.load('en', pipeline=custom_pipeline)
|
||||
|
||||
p Now, so far we only have the LSTM output as an attribute of the #[code Doc] object. We'd like to be able to do stuff like #[code doc[0].vector], and have that get us the LSTM vector for the token. We can do #[code doc.tensor[doc[0].i]], but I'd like a little more sugar. The details of this part are still experimental — in particular, don't take the names too seriously at this point.
|
||||
|
||||
p A relevant implementation detail of spaCy is that the #[code Token] objects are thin proxies, that can be created and destroyed as convenient. The #[code Doc] object owns all the data. This means that we can't simply assign a vector to the #[code Token] objects. Instead, we'll add a hook that gets called by #[code token.vector]. We'll also add space for hooks in other places we might need them.
|
||||
|
||||
+aside("Why don't Token and Span own their data?") Well, we want the sequence of tokens to be stored together in memory. That means we really want to have a sequence owned by the #[code Doc] object. But if we have that, then we would have to copy data to the #[code Token] objects. This gets super messy, especially if the tokens should be able to modify their state. The Token therefore proxies to the Doc, to maintain a single source of truth.
|
||||
|
||||
p Here's what that looks like:
|
||||
|
||||
+code.
|
||||
def install_vector_hook(doc):
|
||||
doc.getters_for_tokens['vector'] = lambda token: doc.tensor[token.i]
|
||||
|
||||
def custom_pipeline(nlp):
|
||||
return (nlp.tagger, LSTMModel(), install_vector_hook, nlp.parser, nlp.entity)
|
||||
|
||||
nlp = spacy.load('en', pipeline=custom_pipeline)
|
||||
|
||||
p The #[code install_vector_hook] function will run after the LSTM. It modifies the #[code Doc], setting a value in a dictionary that the #[code Token] knows to look for. When you access the #[code token.vector] property, the token checks whether there's a special-case listener for that attribute:
|
||||
|
||||
+code.
|
||||
@property
|
||||
def vector(self):
|
||||
if 'vector' in self.doc.getters_for_tokens:
|
||||
return self.doc.getters_for_tokens['vector'](self)
|
||||
else:
|
||||
return self.c.lex.vector
|
||||
|
||||
p As I said, don't take the names too seriously at this point. But do test out the feature: it should all be working. You should be able to customize the behaviour of a lot of attributes this way already. Possibly we should just make it everything on the Token and the Span, but I think it might not be nice to have so much uncertainty about how some values are being calculated. There's such a thing as being too dynamic.
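p To make the proxy-and-hook idea concrete, here is a self-contained sketch in plain Python. #[code MiniDoc] and #[code MiniToken] are hypothetical stand-ins, the zero-vector fallback is an assumption made for the example, and the hook dictionary reuses the experimental name from the snippet above.

```python
class MiniDoc(object):
    """Hypothetical Doc stand-in: owns all the data."""

    def __init__(self, words):
        self.words = list(words)
        self.tensor = None
        self.getters_for_tokens = {}

    def __getitem__(self, i):
        # Tokens are thin proxies, created on demand.
        return MiniToken(self, i)


class MiniToken(object):
    """Hypothetical Token stand-in: holds only a reference and an index."""

    def __init__(self, doc, i):
        self.doc = doc
        self.i = i

    @property
    def text(self):
        return self.doc.words[self.i]

    @property
    def vector(self):
        getter = self.doc.getters_for_tokens.get('vector')
        if getter is not None:
            return getter(self)
        return [0.0, 0.0]  # assumed default when no hook is installed


def install_vector_hook(doc):
    doc.getters_for_tokens['vector'] = lambda token: doc.tensor[token.i]


doc = MiniDoc([u'A', u'token'])
doc.tensor = [[1.0, 2.0], [3.0, 4.0]]
before = doc[1].vector
install_vector_hook(doc)
after = doc[1].vector
```

p The token never stores a vector of its own, so installing the hook changes what every existing and future proxy returns.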
|