From e626df959fdcbf7a5fbc9d24a86af8e093238c82 Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Fri, 6 May 2022 15:40:59 +0200
Subject: [PATCH] Document different ways to create a pipeline (#10762)

* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.

* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.

* Document different ways to create a pipeline: added explanation of blank pipeline.

* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
---
 website/docs/usage/models.md | 51 ++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/website/docs/usage/models.md b/website/docs/usage/models.md
index f82da44d9..56992e7e3 100644
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@@ -27,6 +27,35 @@ import QuickstartModels from 'widgets/quickstart-models.js'
+### Usage note
+
+> If lemmatization rules are available for your language, make sure to install
+> spaCy with the `lookups` option, or install
+> [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
+> separately in the same environment:
+>
+> ```bash
+> $ pip install -U %%SPACY_PKG_NAME[lookups]%%SPACY_PKG_FLAGS
+> ```
+
+If a trained pipeline is available for a language, you can download it using the
+[`spacy download`](/api/cli#download) command as shown above. In order to use
+languages that don't yet come with a trained pipeline, you have to import them
+directly, or use [`spacy.blank`](/api/top-level#spacy.blank):
+
+```python
+from spacy.lang.yo import Yoruba
+nlp = Yoruba() # use directly
+nlp = spacy.blank("yo") # blank instance
+```
+
+A blank pipeline is typically just a tokenizer. You might want to create a blank
+pipeline when you only need a tokenizer, when you want to add more components
+from scratch, or for testing purposes. Initializing the language object directly
+yields the same result as generating it using `spacy.blank()`. In both cases the
+default configuration for the chosen language is loaded, and no pretrained
+components will be available.
+
 ## Language support {#languages}

 spaCy currently provides support for the following languages. You can help by
@@ -37,28 +66,6 @@ contribute to development. Also see the
 [training documentation](/usage/training) for how to train your own pipelines on
 your data.

-> #### Usage note
->
-> If a trained pipeline is available for a language, you can download it using
-> the [`spacy download`](/api/cli#download) command. In order to use languages
-> that don't yet come with a trained pipeline, you have to import them directly,
-> or use [`spacy.blank`](/api/top-level#spacy.blank):
->
-> ```python
-> from spacy.lang.fi import Finnish
-> nlp = Finnish() # use directly
-> nlp = spacy.blank("fi") # blank instance
-> ```
->
-> If lemmatization rules are available for your language, make sure to install
-> spaCy with the `lookups` option, or install
-> [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
-> separately in the same environment:
->
-> ```bash
-> $ pip install -U %%SPACY_PKG_NAME[lookups]%%SPACY_PKG_FLAGS
-> ```
-
 import Languages from 'widgets/languages.js'
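
Note on the new paragraph: the claim that direct initialization and `spacy.blank()` yield the same result can be checked with a short script. The following is a minimal sketch, not part of the patch. It assumes spaCy v3 is installed, and the variable names and the sample text are placeholders chosen for illustration. Unlike the snippet in the docs, it includes `import spacy` so it runs on its own.

```python
import spacy
from spacy.lang.yo import Yoruba

# Two ways to get a pipeline for a language without a trained package.
nlp_direct = Yoruba()          # instantiate the language class directly
nlp_blank = spacy.blank("yo")  # look the class up by its language code

# Both objects are instances of the same class and hold no components.
same_class = type(nlp_direct) is type(nlp_blank)
print(same_class, nlp_direct.pipe_names, nlp_blank.pipe_names)

# Only the tokenizer runs; the sample text is an arbitrary placeholder.
text = "spaCy blank pipeline test"
tokens_direct = [token.text for token in nlp_direct(text)]
tokens_blank = [token.text for token in nlp_blank(text)]
print(tokens_direct == tokens_blank)
```

From either object, components can then be added from scratch with `nlp.add_pipe`, for example the rule-based `sentencizer`.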