mirror of https://github.com/explosion/spaCy.git (synced 2024-12-24 17:06:29 +03:00)
Fix formatting and wording
parent f803da609f
commit 14148cd147
@@ -27,8 +27,6 @@ The docs can always use another example or more detail, and they should always b

While all page content lives in the `.jade` files, article meta (page titles, sidebars etc.) is stored as JSON. Each folder contains a `_data.json` with all required meta for its files.

For simplicity, all sites linked in the [tutorials](https://spacy.io/docs/usage/tutorials) and [showcase](https://spacy.io/docs/usage/showcase) are also stored as JSON. So in order to edit those pages, there's no need to dig into the Jade files – simply edit the [`_data.json`](docs/usage/_data.json).

### Markup language and conventions

Jade/Pug is a whitespace-sensitive markup language that compiles to HTML. Indentation is used to nest elements, and for template logic, like `if`/`else` or `for`, mainly used to iterate over objects and arrays in the meta data. It also allows inline JavaScript expressions.
@@ -4,7 +4,7 @@ p
 | The individual components #[strong expose variables] that can be imported
 | within a language module, and added to the language's #[code Defaults].
 | Some components, like the punctuation rules, usually don't need much
-| customisation and can simply be imported from the global rules. Others,
+| customisation and can be imported from the global rules. Others,
 | like the tokenizer and norm exceptions, are very specific and will make
 | a big difference to spaCy's performance on the particular language and
 | training a language model.
@@ -39,7 +39,7 @@ p
 | this. The above error mostly occurs when doing a system-wide installation,
 | which will create the symlinks in a system directory. Run the
 | #[code download] or #[code link] command as administrator (on Windows,
-| simply right-click on your terminal or shell ans select "Run as
+| you can either right-click on your terminal or shell ans select "Run as
 | Administrator"), or use a #[code virtualenv] to install spaCy in a user
 | directory, instead of doing a system-wide installation.

@@ -220,8 +220,8 @@ p

 p
 | The best way to understand spaCy's dependency parser is interactively.
-| To make this easier, spaCy v2.0+ comes with a visualization module. Simply
-| pass a #[code Doc] or a list of #[code Doc] objects to
+| To make this easier, spaCy v2.0+ comes with a visualization module. You
+| can pass a #[code Doc] or a list of #[code Doc] objects to
 | displaCy and run #[+api("top-level#displacy.serve") #[code displacy.serve]] to
 | run the web server, or #[+api("top-level#displacy.render") #[code displacy.render]]
 | to generate the raw markup. If you want to know how to write rules that
@@ -195,7 +195,7 @@ p
 | lets you explore an entity recognition model's behaviour interactively.
 | If you're training a model, it's very useful to run the visualization
 | yourself. To help you do that, spaCy v2.0+ comes with a visualization
-| module. Simply pass a #[code Doc] or a list of #[code Doc] objects to
+| module. You can pass a #[code Doc] or a list of #[code Doc] objects to
 | displaCy and run #[+api("top-level#displacy.serve") #[code displacy.serve]] to
 | run the web server, or #[+api("top-level#displacy.render") #[code displacy.render]]
 | to generate the raw markup.
@@ -274,7 +274,7 @@ p
 | In spaCy v1.x, you had to add a custom tokenizer by passing it to the
 | #[code make_doc] keyword argument, or by passing a tokenizer "factory"
 | to #[code create_make_doc]. This was unnecessarily complicated. Since
-| spaCy v2.0, you can simply write to #[code nlp.tokenizer]. If your
+| spaCy v2.0, you can write to #[code nlp.tokenizer] instead. If your
 | tokenizer needs the vocab, you can write a function and use
 | #[code nlp.vocab].

@@ -20,14 +20,14 @@ include _install-basics

 p
 | To download a model directly using #[+a("https://pypi.python.org/pypi/pip") pip],
-| simply point #[code pip install] to the URL or local path of the archive
+| point #[code pip install] to the URL or local path of the archive
 | file. To find the direct link to a model, head over to the
 | #[+a(gh("spacy-models") + "/releases") model releases], right click on the archive
 | link and copy it to your clipboard.

 +code(false, "bash").
 # with external URL
-pip install #{gh("spacy-models")}/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz
+pip install #{gh("spacy-models")}/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz

 # with local file
 pip install /Users/you/en_core_web_md-1.2.0.tar.gz
@@ -69,7 +69,7 @@ p

 p
 | You can place the #[strong model package directory] anywhere on your
-| local file system. To use it with spaCy, simply assign it a name by
+| local file system. To use it with spaCy, assign it a name by
 | creating a #[+a("#usage") shortcut link] for the data directory.

 +h(3, "usage") Using models with spaCy
@@ -26,7 +26,7 @@ p
 p
 | Because all models are valid Python packages, you can add them to your
 | application's #[code requirements.txt]. If you're running your own
-| internal PyPi installation, you can simply upload the models there. pip's
+| internal PyPi installation, you can upload the models there. pip's
 | #[+a("https://pip.pypa.io/en/latest/reference/pip_install/#requirements-file-format") requirements file format]
 | supports both package names to download via a PyPi server, as well as direct
 | URLs.
@@ -5,7 +5,7 @@ p
 | segments it into words, punctuation and so on. This is done by applying
 | rules specific to each language. For example, punctuation at the end of a
 | sentence should be split off – whereas "U.K." should remain one token.
-| Each #[code Doc] consists of individual tokens, and we can simply iterate
+| Each #[code Doc] consists of individual tokens, and we can iterate
 | over them:

 +code-exec.
@@ -72,10 +72,11 @@ p
 | you want to visualize output from other libraries, like
 | #[+a("http://www.nltk.org") NLTK] or
 | #[+a("https://github.com/tensorflow/models/tree/master/research/syntaxnet") SyntaxNet].
-| Simply convert the dependency parse or recognised entities to displaCy's
-| format and set #[code manual=True] on either #[code render()] or
-| #[code serve()]. When setting #[code ents] manually, make sure to supply
-| them in the right order, i.e. starting with the lowest start position.
+| If you set #[code manual=True] on either #[code render()] or
+| #[code serve()], you can pass in data in displaCy's format (instead of
+| #[code Doc] objects). When setting #[code ents] manually, make sure to
+| supply them in the right order, i.e. starting with the lowest start
+| position.

 +aside-code("Example").
 ex = [{'text': 'But Google is starting from behind.',
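Reviewer note on the hunk above: the new wording asks callers to supply `ents` ordered by start position when using `manual=True`. A minimal sketch of how that ordering could be enforced before rendering is shown below. The helper name `sort_manual_ents` and the sample sentence are hypothetical and not part of this commit; only the `text`/`ents`/`title` keys and the `start`/`end`/`label` fields are assumed from displaCy's manual entity format.

```python
from copy import deepcopy


def sort_manual_ents(example):
    """Return a copy of a displaCy-style dict whose 'ents' are ordered by start offset.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    ordered = deepcopy(example)
    ordered["ents"] = sorted(
        ordered.get("ents", []),
        key=lambda ent: (ent["start"], ent["end"]),
    )
    return ordered


# Hypothetical input: the entities are deliberately listed out of order.
unsorted_example = {
    "text": "Apple and Google are both based in California.",
    "ents": [
        {"start": 35, "end": 45, "label": "GPE"},
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 10, "end": 16, "label": "ORG"},
    ],
    "title": None,
}

sorted_example = sort_manual_ents(unsorted_example)

# Show each entity span in the order it would be handed to the renderer.
for ent in sorted_example["ents"]:
    span_text = sorted_example["text"][ent["start"]:ent["end"]]
    print(ent["start"], ent["end"], ent["label"], span_text)

# With spaCy installed, the sorted dict could then be passed on, e.g.
# displacy.render(sorted_example, style="ent", manual=True)
```

The helper copies its input rather than sorting in place, so the caller's original data stays untouched if it is reused elsewhere.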
@@ -109,7 +110,7 @@ p
 | If you want to use the visualizers as part of a web application, for
 | example to create something like our
 | #[+a(DEMOS_URL + "/displacy") online demo], it's not recommended to
-| simply wrap and serve the displaCy renderer. Instead, you should only
+| only wrap and serve the displaCy renderer. Instead, you should only
 | rely on the server to perform spaCy's processing capabilities, and use
 | #[+a(gh("displacy")) displaCy.js] to render the JSON-formatted output.
