Merge branch 'spacy.io' [ci skip]
This commit is contained in:
parent 23eef78a4a · commit dfb23a419e
@@ -180,7 +180,7 @@ entirely **in Markdown**, without having to compromise on easy-to-use custom UI

components. We're hoping that the Markdown source will make it even easier to
contribute to the documentation. For more details, check out the
[styleguide](/styleguide) and
[source](https://github.com/explosion/spacy/tree/v2.x/website). While
converting the pages to Markdown, we've also fixed a bunch of typos, improved
the existing pages and added some new content:
@@ -161,8 +161,8 @@ debugging your tokenizer configuration.

spaCy's custom warnings have been replaced with native Python
[`warnings`](https://docs.python.org/3/library/warnings.html). Instead of
setting `SPACY_WARNING_IGNORE`, use the
[`warnings` filters](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
to manage warnings.

```diff
import spacy
+ import warnings

- spacy.errors.SPACY_WARNING_IGNORE.append('W007')
+ warnings.filterwarnings("ignore", message=r"\[W007\]", category=UserWarning)
```
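If you only want to silence a warning temporarily, the standard `warnings` machinery also supports scoped filters. A minimal sketch, assuming `en_core_web_sm` is installed and that the warning you want to hide is `W007` (spaCy prefixes its warning messages with codes like `[W007]`):

```python
import warnings

import spacy

nlp = spacy.load("en_core_web_sm")

with warnings.catch_warnings():
    # ignore only W007 (similarity without word vectors) inside this block
    warnings.filterwarnings("ignore", message=r"^\[W007\]")
    doc1 = nlp("apple")
    doc2 = nlp("orange")
    print(doc1.similarity(doc2))
```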
@@ -176,7 +176,7 @@ import spacy

#### Normalization tables

The normalization tables have moved from the language data in
[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang) to the
package [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data).
If you're adding data for a new language, the normalization table should be
added to `spacy-lookups-data`. See
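For context on where these tables surface at runtime, here is a minimal sketch (assuming `en_core_web_sm` and `spacy-lookups-data` are installed; the sample text is ours):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I'm gonna realise it")
# Token.norm_ applies the normalization table; it falls back to the
# lowercase form when a word has no entry in the table
print([(token.text, token.norm_) for token in doc])
```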
@@ -190,8 +190,8 @@ lexemes will be added to the vocab automatically, just as in small models

without vectors.

To see the number of unique vectors and the number of words with vectors, check
`nlp.meta['vectors']`; for example, `en_core_web_md` has `20000` unique
vectors and `684830` words with vectors:

```python
{
    "width": 300,
    "vectors": 20000,
    "keys": 684830,
    "name": "en_core_web_md.vectors"
}
```
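A quick hedged sketch of reading those counts off a loaded pipeline (assumes `en_core_web_md` is installed):

```python
import spacy

nlp = spacy.load("en_core_web_md")
vectors_meta = nlp.meta["vectors"]
print(vectors_meta["vectors"], "unique vectors")   # 20000
print(vectors_meta["keys"], "words with vectors")  # 684830
# the vectors table itself reports the same key count
print(nlp.vocab.vectors.n_keys)
```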
@@ -210,8 +210,8 @@ for orth in nlp.vocab.vectors:

```python
for orth in nlp.vocab.vectors:
    _ = nlp.vocab[orth]
```

If your workflow previously iterated over `nlp.vocab`, a similar alternative is
to iterate over words with vectors instead:

```diff
- lexemes = [w for w in nlp.vocab]
+ lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
```
@@ -220,9 +220,9 @@ is to iterate over words with vectors instead:

Be aware that the set of preloaded lexemes in a v2.2 model is not equivalent to
the set of words with vectors. For English, v2.2 `md/lg` models have 1.3M
provided lexemes but only 685K words with vectors. The vectors have been updated
for most languages in v2.2, but the English models contain the same vectors for
both v2.2 and v2.3.

#### Lexeme.is_oov and Token.is_oov
@@ -234,8 +234,7 @@ fixed in the next patch release v2.3.1.

</Infobox>

In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
have a word vector. This is equivalent to `token.orth not in nlp.vocab.vectors`.

Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
probability and cluster features. The probability and cluster features are no
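A small sketch of checking that equivalence (assumes `en_core_web_md`; note the Infobox above, which flags an `is_oov` bug in v2.3.0 fixed in v2.3.1, so the two values may disagree on v2.3.0):

```python
import spacy

nlp = spacy.load("en_core_web_md")
for token in nlp("hello supercalifragilistic"):
    # v2.3 semantics: is_oov means "has no word vector"
    print(token.text, token.is_oov, token.orth not in nlp.vocab.vectors)
```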
@@ -270,8 +269,8 @@ as part of the model vocab.

To load the probability table into a provided model, first make sure you have
`spacy-lookups-data` installed. To load the table, remove the empty provided
`lexeme_prob` table and then access `Lexeme.prob` for any word to load the table
from `spacy-lookups-data`:

```diff
+ # prerequisite: pip install spacy-lookups-data
```
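Since the excerpt ends before the full snippet, here is a hedged reconstruction of the steps described above (`Lookups.has_table`/`remove_table` are the spaCy v2 lookups API; that the placeholder table lives on `nlp.vocab.lookups` is our assumption):

```python
import spacy

# prerequisite: pip install spacy-lookups-data
nlp = spacy.load("en_core_web_md")

# remove the empty placeholder table shipped with the model
if nlp.vocab.lookups.has_table("lexeme_prob"):
    nlp.vocab.lookups.remove_table("lexeme_prob")

# accessing Lexeme.prob for any word then loads the full table
# from spacy-lookups-data
print(nlp.vocab["the"].prob)
```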
@@ -321,9 +320,9 @@ the [train CLI](/api/cli#train), you can use the new `--tag-map-path` option to

provide the tag map as a JSON dict.

If you want to export a tag map from a provided model for use with the train
CLI, you can save it as a JSON dict. To only use string keys as required by JSON
and to make it easier to read and edit, any internal integer IDs need to be
converted back to strings:

```python
import spacy
```
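The code block above is also cut off at the hunk boundary. A hedged sketch of the conversion it describes (`nlp.vocab.morphology.tag_map` is the spaCy v2 tag map; the string-conversion details are our reconstruction, not the docs' original snippet):

```python
import json

import spacy

nlp = spacy.load("en_core_web_sm")
tag_map = {}
for tag, feats in nlp.vocab.morphology.tag_map.items():
    # convert internal integer IDs back to strings so the JSON
    # only contains readable string keys and values
    tag_map[tag] = {
        (nlp.vocab.strings[k] if isinstance(k, int) else k):
            (nlp.vocab.strings[v] if isinstance(v, int) else v)
        for k, v in feats.items()
    }

with open("tag_map.json", "w", encoding="utf8") as f:
    json.dump(tag_map, f, indent=2)
```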
@@ -303,7 +303,7 @@ lookup-based lemmatization – and **many new languages**!

<Infobox>

**API:** [`Language`](/api/language) **Code:**
[`spacy/lang`](https://github.com/explosion/spacy/tree/v2.x/spacy/lang)
**Usage:** [Adding languages](/usage/adding-languages)

</Infobox>