Merge branch 'spacy.io' [ci skip]

Ines Montani 2021-03-06 17:38:54 +11:00
parent 23eef78a4a
commit dfb23a419e
3 changed files with 18 additions and 19 deletions


@@ -180,7 +180,7 @@ entirely **in Markdown**, without having to compromise on easy-to-use custom UI
 components. We're hoping that the Markdown source will make it even easier to
 contribute to the documentation. For more details, check out the
 [styleguide](/styleguide) and
-[source](https://github.com/explosion/spaCy/tree/master/website). While
+[source](https://github.com/explosion/spacy/tree/v2.x/website). While
 converting the pages to Markdown, we've also fixed a bunch of typos, improved
 the existing pages and added some new content:


@@ -161,8 +161,8 @@ debugging your tokenizer configuration.
 spaCy's custom warnings have been replaced with native Python
 [`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
-setting `SPACY_WARNING_IGNORE`, use the [`warnings`
-filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
+setting `SPACY_WARNING_IGNORE`, use the
+[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
 to manage warnings.
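As a rough illustration of the filter-based approach this hunk describes, the sketch below silences one warning by message prefix using only the standard library. The text `[W999] example warning` is a made-up placeholder, not a real spaCy warning code, and `emit_example_warning` is a hypothetical stand-in for library code; substitute the message or category you actually want to suppress.

```python
import warnings

# Hypothetical message used only for this illustration; real spaCy
# warnings have their own codes and texts.
EXAMPLE_MESSAGE = "[W999] example warning"


def emit_example_warning():
    """Stand-in for library code that issues a warning."""
    warnings.warn(EXAMPLE_MESSAGE, UserWarning)


def count_warnings(ignore):
    """Return how many warnings get recorded, optionally filtering ours out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known filter state
        if ignore:
            # `message` is a regex matched against the start of the warning text
            warnings.filterwarnings("ignore", message=r"\[W999\]")
        emit_example_warning()
    return len(caught)


unfiltered = count_warnings(ignore=False)
filtered = count_warnings(ignore=True)
print(unfiltered, filtered)
```

Because the filter is installed inside `warnings.catch_warnings()`, it only applies within that block; calling `warnings.filterwarnings` at module level would apply it for the rest of the process.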
 ```diff
@@ -176,7 +176,7 @@ import spacy
 #### Normalization tables
 The normalization tables have moved from the language data in
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang) to the
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang) to the
 package [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
 If you're adding data for a new language, the normalization table should be
 added to `spacy-lookups-data`. See
@@ -190,8 +190,8 @@ lexemes will be added to the vocab automatically, just as in small models
 without vectors.
 To see the number of unique vectors and number of words with vectors, see
-`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000`
-unique vectors and `684830` words with vectors:
+`nlp.meta['vectors']`, for example for `en_core_web_md` there are `20000` unique
+vectors and `684830` words with vectors:
 ```python
 {
@@ -210,8 +210,8 @@ for orth in nlp.vocab.vectors:
     _ = nlp.vocab[orth]
 ```
-If your workflow previously iterated over `nlp.vocab`, a similar alternative
-is to iterate over words with vectors instead:
+If your workflow previously iterated over `nlp.vocab`, a similar alternative is
+to iterate over words with vectors instead:
 ```diff
 - lexemes = [w for w in nlp.vocab]
@@ -220,9 +220,9 @@ is to iterate over words with vectors instead:
 Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
 the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
-provided lexemes but only 685K words with vectors. The vectors have been
-updated for most languages in v2.2, but the English models contain the same
-vectors for both v2.2 and v2.3.
+provided lexemes but only 685K words with vectors. The vectors have been updated
+for most languages in v2.2, but the English models contain the same vectors for
+both v2.2 and v2.3.
 #### Lexeme.is_oov and Token.is_oov
@@ -234,8 +234,7 @@ fixed in the next patch release v2.3.1.
 </Infobox>
 In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
-have a word vector. This is equivalent to `token.orth not in
-nlp.vocab.vectors`.
+have a word vector. This is equivalent to `token.orth not in nlp.vocab.vectors`.
 Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
 probability and cluster features. The probability and cluster features are no
@@ -270,8 +269,8 @@ as part of the model vocab.
 To load the probability table into a provided model, first make sure you have
 `spacy-lookups-data` installed. To load the table, remove the empty provided
-`lexeme_prob` table and then access `Lexeme.prob` for any word to load the
-table from `spacy-lookups-data`:
+`lexeme_prob` table and then access `Lexeme.prob` for any word to load the table
+from `spacy-lookups-data`:
 ```diff
 + # prerequisite: pip install spacy-lookups-data
@@ -321,9 +320,9 @@ the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to
 provide in the tag map as a JSON dict.
 If you want to export a tag map from a provided model for use with the train
-CLI, you can save it as a JSON dict. To only use string keys as required by
-JSON and to make it easier to read and edit, any internal integer IDs need to
-be converted back to strings:
+CLI, you can save it as a JSON dict. To only use string keys as required by JSON
+and to make it easier to read and edit, any internal integer IDs need to be
+converted back to strings:
 ```python
 import spacy


@@ -303,7 +303,7 @@ lookup-based lemmatization and **many new languages**!
 <Infobox>
 **API:** [`Language`](/api/language) **Code:**
-[`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang)
+[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang)
 **Usage:** [Adding languages](/usage/adding-languages)
 </Infobox>