fix typos

This commit is contained in:
svlandeg 2020-08-17 14:05:48 +02:00
parent 61dfdd9fbd
commit 319692aa53
5 changed files with 105 additions and 106 deletions

View File

@ -43,33 +43,33 @@ can also submit a [regression test](#fixing-bugs) straight away. When you're
opening an issue to report the bug, simply refer to your pull request in the
issue body. A few more tips:
- **Describing your issue:** Try to provide as many details as possible. What
exactly goes wrong? _How_ is it failing? Is there an error?
"XY doesn't work" usually isn't that helpful for tracking down problems. Always
remember to include the code you ran and if possible, extract only the relevant
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
- **Describing your issue:** Try to provide as many details as possible. What
exactly goes wrong? _How_ is it failing? Is there an error?
"XY doesn't work" usually isn't that helpful for tracking down problems. Always
remember to include the code you ran and if possible, extract only the relevant
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
- **Getting info about your spaCy installation and environment:** If you're
using spaCy v1.7+, you can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
- **Getting info about your spaCy installation and environment:** If you're
using spaCy v1.7+, you can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
- **Checking the model compatibility:** If you're having problems with a
[statistical model](https://spacy.io/models), it may be because the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `python -m spacy validate`.
- **Checking the model compatibility:** If you're having problems with a
[statistical model](https://spacy.io/models), it may be because the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `python -m spacy validate`.
- **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
drop the image into GitHub's editor and it will be uploaded and included.
- **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
drop the image into GitHub's editor and it will be uploaded and included.
- **Sharing long blocks of code or logs:** If you need to include long code,
logs or tracebacks, you can wrap them in `<details>` and `</details>`. This
[collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details)
so it only becomes visible on click, making the issue easier to read and follow.
- **Sharing long blocks of code or logs:** If you need to include long code,
logs or tracebacks, you can wrap them in `<details>` and `</details>`. This
[collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details)
so it only becomes visible on click, making the issue easier to read and follow.
### Issue labels
@ -94,39 +94,39 @@ shipped in the core library, and what could be provided in other packages. Our
philosophy is to prefer a smaller core library. We generally ask the following
questions:
- **What would this feature look like if implemented in a separate package?**
Some features would be very difficult to implement externally for example,
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy v2.0+ makes it easy to implement
[custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
custom component package is usually the best strategy. You won't have to worry
about spaCy's internals and you can test your module in an isolated
environment. And if it works well, we can always integrate it into the core
library later.
- **What would this feature look like if implemented in a separate package?**
Some features would be very difficult to implement externally for example,
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy v2.0+ makes it easy to implement
[custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
custom component package is usually the best strategy. You won't have to worry
about spaCy's internals and you can test your module in an isolated
environment. And if it works well, we can always integrate it into the core
library later.
- **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim or
TensorFlow/Keras do lots of useful things — but we don't want to have them as
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
- **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim or
TensorFlow/Keras do lots of useful things — but we don't want to have them as
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
- **Is the feature orthogonal to the current spaCy functionality, or overlapping?**
spaCy strongly prefers to avoid having 6 different ways of doing the same thing.
As better techniques are developed, we prefer to drop support for "the old way".
However, it's rare that one approach _entirely_ dominates another. It's very
common that there's still a use-case for the "obsolete" approach. For instance,
[WordNet](https://wordnet.princeton.edu/) is still very useful — but word
vectors are better for most use-cases, and the two approaches to lexical
semantics do a lot of the same things. spaCy therefore only supports word
vectors, and support for WordNet is currently left for other packages.
- **Is the feature orthogonal to the current spaCy functionality, or overlapping?**
spaCy strongly prefers to avoid having 6 different ways of doing the same thing.
As better techniques are developed, we prefer to drop support for "the old way".
However, it's rare that one approach _entirely_ dominates another. It's very
common that there's still a use-case for the "obsolete" approach. For instance,
[WordNet](https://wordnet.princeton.edu/) is still very useful — but word
vectors are better for most use-cases, and the two approaches to lexical
semantics do a lot of the same things. spaCy therefore only supports word
vectors, and support for WordNet is currently left for other packages.
- **Do you need the feature to get basic things done?** We do want spaCy to be
at least somewhat self-contained. If we keep needing some feature in our
recipes, that does provide some argument for bringing it "in house".
- **Do you need the feature to get basic things done?** We do want spaCy to be
at least somewhat self-contained. If we keep needing some feature in our
recipes, that does provide some argument for bringing it "in house".
### Getting started
@ -203,10 +203,10 @@ your files on save:
```json
{
"python.formatting.provider": "black",
"[python]": {
"editor.formatOnSave": true
}
"python.formatting.provider": "black",
"[python]": {
"editor.formatOnSave": true
}
}
```
@ -216,7 +216,7 @@ list of available editor integrations.
#### Disabling formatting
There are a few cases where auto-formatting doesn't improve readability for
example, in some of the the language data files like the `tag_map.py`, or in
example, in some of the language data files like the `tag_map.py`, or in
the tests that construct `Doc` objects from lists of words and other labels.
Wrapping a block in `# fmt: off` and `# fmt: on` lets you disable formatting
for that particular code. Here's an example:
@ -397,10 +397,10 @@ Python. If it's not fast enough the first time, just switch to Cython.
### Resources to get you started
- [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (python.org)
- [Official Cython documentation](http://docs.cython.org/en/latest/) (cython.org)
- [Writing C in Cython](https://explosion.ai/blog/writing-c-in-cython) (explosion.ai)
- [Multi-threading spaCys parser and named entity recogniser](https://explosion.ai/blog/multithreading-with-cython) (explosion.ai)
- [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (python.org)
- [Official Cython documentation](http://docs.cython.org/en/latest/) (cython.org)
- [Writing C in Cython](https://explosion.ai/blog/writing-c-in-cython) (explosion.ai)
- [Multi-threading spaCys parser and named entity recogniser](https://explosion.ai/blog/multithreading-with-cython) (explosion.ai)
## Adding tests
@ -440,25 +440,25 @@ simply click on the "Suggest edits" button at the bottom of a page.
We're very excited about all the new possibilities for **community extensions**
and plugins in spaCy v2.0, and we can't wait to see what you build with it!
- An extension or plugin should add substantial functionality, be
**well-documented** and **open-source**. It should be available for users to download
and install as a Python package for example via [PyPi](http://pypi.python.org).
- An extension or plugin should add substantial functionality, be
**well-documented** and **open-source**. It should be available for users to download
and install as a Python package for example via [PyPi](http://pypi.python.org).
- Extensions that write to `Doc`, `Token` or `Span` attributes should be wrapped
as [pipeline components](https://spacy.io/usage/processing-pipelines#custom-components)
that users can **add to their processing pipeline** using `nlp.add_pipe()`.
- Extensions that write to `Doc`, `Token` or `Span` attributes should be wrapped
as [pipeline components](https://spacy.io/usage/processing-pipelines#custom-components)
that users can **add to their processing pipeline** using `nlp.add_pipe()`.
- When publishing your extension on GitHub, **tag it** with the topics
[`spacy`](https://github.com/topics/spacy?o=desc&s=stars) and
[`spacy-extensions`](https://github.com/topics/spacy-extension?o=desc&s=stars)
to make it easier to find. Those are also the topics we're linking to from the
spaCy website. If you're sharing your project on Twitter, feel free to tag
[@spacy_io](https://twitter.com/spacy_io) so we can check it out.
- When publishing your extension on GitHub, **tag it** with the topics
[`spacy`](https://github.com/topics/spacy?o=desc&s=stars) and
[`spacy-extensions`](https://github.com/topics/spacy-extension?o=desc&s=stars)
to make it easier to find. Those are also the topics we're linking to from the
spaCy website. If you're sharing your project on Twitter, feel free to tag
[@spacy_io](https://twitter.com/spacy_io) so we can check it out.
- Once your extension is published, you can open an issue on the
[issue tracker](https://github.com/explosion/spacy/issues) to suggest it for the
[resources directory](https://spacy.io/usage/resources#extensions) on the
website.
- Once your extension is published, you can open an issue on the
[issue tracker](https://github.com/explosion/spacy/issues) to suggest it for the
[resources directory](https://spacy.io/usage/resources#extensions) on the
website.
📖 **For more tips and best practices, see the [checklist for developing spaCy extensions](https://spacy.io/usage/processing-pipelines#extensions).**

View File

@ -489,18 +489,17 @@ network has an internal CNN Tok2Vec layer and uses attention.
> nO = null
> ```
| Name | Type | Description |
| --------------------------- | ----- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `pretrained_vectors` | bool | Whether or not pretrained vectors will be used in addition to the feature vectors. |
| `width` | int | Output dimension of the feature encoding step. |
| `embed_size` | int | Input dimension of the feature encoding step. |
| `conv_depth` | int | Depth of the Tok2Vec layer. |
| `window_size` | int | The number of contextual vectors to [concatenate](https://thinc.ai/docs/api-layers#expand_window) from the left and from the right. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `dropout` | float | The dropout rate. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when |
| `begin_training` is called. |
| Name | Type | Description |
| -------------------- | ----- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `pretrained_vectors` | bool | Whether or not pretrained vectors will be used in addition to the feature vectors. |
| `width` | int | Output dimension of the feature encoding step. |
| `embed_size` | int | Input dimension of the feature encoding step. |
| `conv_depth` | int | Depth of the Tok2Vec layer. |
| `window_size` | int | The number of contextual vectors to [concatenate](https://thinc.ai/docs/api-layers#expand_window) from the left and from the right. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `dropout` | float | The dropout rate. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. |
### spacy.TextCatCNN.v1 {#TextCatCNN}
@ -527,11 +526,11 @@ A neural network model where token vectors are calculated using a CNN. The
vectors are mean pooled and used as features in a feed-forward network. This
architecture is usually less accurate than the ensemble, but runs faster.
| Name | Type | Description |
| ------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | The [`tok2vec`](#tok2vec) layer of the model. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. |
| Name | Type | Description |
| ------------------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | The [`tok2vec`](#tok2vec) layer of the model. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. |
### spacy.TextCatBOW.v1 {#TextCatBOW}
@ -549,12 +548,12 @@ architecture is usually less accurate than the ensemble, but runs faster.
An ngram "bag-of-words" model. This architecture should run much faster than the
others, but may not be as accurate, especially if texts are short.
| Name | Type | Description |
| ------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `no_output_layer` | float | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes=True`, else `Logistic`. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. |
| Name | Type | Description |
| ------------------- | ----- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `exclusive_classes` | bool | Whether or not categories are mutually exclusive. |
| `ngram_size` | int | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3`would give unigram, trigram and bigram features. |
| `no_output_layer` | float | Whether or not to add an output layer to the model (`Softmax` activation if `exclusive_classes=True`, else `Logistic`. |
| `nO` | int | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `begin_training` is called. |
## Entity linking architectures {#entitylinker source="spacy/ml/models/entity_linker.py"}

View File

@ -169,7 +169,7 @@ python setup.py build_ext --inplace # compile spaCy
Compared to regular install via pip, the
[`requirements.txt`](https://github.com/explosion/spaCy/tree/master/requirements.txt)
additionally installs developer dependencies such as Cython. See the the
additionally installs developer dependencies such as Cython. See the
[quickstart widget](#quickstart) to get the right commands for your platform and
Python version.

View File

@ -551,9 +551,9 @@ setup(
)
```
After installing the package, the the custom colors will be used when
visualizing text with `displacy`. Whenever the label `SNEK` is assigned, it will
be displayed in `#3dff74`.
After installing the package, the custom colors will be used when visualizing
text with `displacy`. Whenever the label `SNEK` is assigned, it will be
displayed in `#3dff74`.
import DisplaCyEntSnekHtml from 'images/displacy-ent-snek.html'

View File

@ -2,7 +2,7 @@
# With additional functionality: in/not in, replace, pprint, round, + for lists,
# rendering empty dicts
# This script is mostly used to generate the JavaScript function for the
# training quicktart widget.
# training quickstart widget.
import contextlib
import json
import re