From 63cd539d04f703a269dd4108cd7a7d7cacd496c9 Mon Sep 17 00:00:00 2001
From: ines
Date: Sun, 4 Jun 2017 20:52:10 +0200
Subject: [PATCH] Add more details on model packages and requirements.txt (see #1099)

---
 website/docs/usage/models.jade         | 10 ++++
 website/docs/usage/production-use.jade | 69 ++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/website/docs/usage/models.jade b/website/docs/usage/models.jade
index c091c9489..51eea37d5 100644
--- a/website/docs/usage/models.jade
+++ b/website/docs/usage/models.jade
@@ -104,6 +104,16 @@ p
     | recommend using pip with a direct link, instead of relying on spaCy's
     | #[+api("cli#download") #[code download]] command.
 
+p
+    | You can also add the direct download link to your application's
+    | #[code requirements.txt]. For more information on this, see the
+    | #[+a("https://pip.pypa.io/en/latest/reference/pip_install/#requirements-file-format") pip documentation].
+    | This will only install the package and not trigger any of spaCy's internal
+    | commands like #[code download] or #[code link]. So you'll have to make
+    | sure to create a link for your model manually, or
+    | #[+a("#usage-import") import it as a module] instead.
+
+
 +h(3, "download-manual") Manual download and installation
 
 p
diff --git a/website/docs/usage/production-use.jade b/website/docs/usage/production-use.jade
index e9fd4a30f..70227e648 100644
--- a/website/docs/usage/production-use.jade
+++ b/website/docs/usage/production-use.jade
@@ -76,3 +76,72 @@ p
     | attributes to set the part-of-speech tags, syntactic dependencies, named
     | entities and other attributes. For details, see the respective usage
     | pages.
+
++h(2, "models") Working with models
+
+p
+    | If your application depends on one or more #[+a("/docs/usage/models") models],
+    | you'll usually want to integrate them into your continuous integration
+    | workflow and build process. While spaCy provides a range of useful helpers
+    | for downloading, linking and loading models, the underlying functionality
+    | is entirely based on native Python packages. This allows your application
+    | to handle a model like any other package dependency.
+
++h(3, "models-download") Downloading and requiring model dependencies
+
+p
+    | spaCy's built-in #[+api("cli#download") #[code download]] command
+    | is mostly intended as a convenient, interactive wrapper. It performs
+    | compatibility checks and prints detailed error messages and warnings.
+    | However, if you're downloading models as part of an automated build
+    | process, this only adds an unnecessary layer of complexity. If you know
+    | which models your application needs, you should specify them directly.
+
+p
+    | Because all models are valid Python packages, you can add them to your
+    | application's #[code requirements.txt]. If you're running your own
+    | internal PyPI installation, you can simply upload the models there. pip's
+    | #[+a("https://pip.pypa.io/en/latest/reference/pip_install/#requirements-file-format") requirements file format]
+    | supports both package names to download via a PyPI server, as well as direct
+    | URLs.
+
++code("requirements.txt", "text").
+    spacy>=2.0.0,<3.0.0
+    #{gh("spacy-models")}/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
+
+p
+    | All models are versioned and specify their spaCy dependency. This ensures
+    | cross-compatibility and lets you specify exact version requirements for
+    | each model. If you've trained your own model, you can use the
+    | #[+api("cli#package") #[code package]] command to generate the required
+    | metadata and turn it into a loadable package.
+
++h(3, "models-loading") Loading and testing models
+
+p
+    | Downloading models directly via pip won't call spaCy's
+    | #[+api("cli#link") #[code link]] command, which creates
+    | symlinks for model shortcuts. This means that you'll have to run this
+    | command separately, or use the native #[code import] syntax to load the
+    | models:
+
++code.
+    import en_core_web_sm
+    nlp = en_core_web_sm.load()
+
+p
+    | In general, this approach is recommended for larger code bases, as it's
+    | more "native", and doesn't depend on symlinks or rely on spaCy's loader
+    | to resolve string names to model packages. If a model can't be
+    | imported, Python will raise an #[code ImportError] immediately. And if a
+    | model is imported but not used, any linter will catch that.
+
+p
+    | Similarly, it'll give you more flexibility when writing tests that
+    | require loading models. For example, instead of writing your own
+    | #[code try] and #[code except] logic around spaCy's loader, you can use
+    | #[+a("http://pytest.readthedocs.io/en/latest/") pytest]'s
+    | #[code importorskip()] function to only run a test if a specific model or
+    | model version is installed. Each model package exposes a #[code __version__]
+    | attribute which you can also use to perform your own version compatibility
+    | checks before loading a model.
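
Review note: the last added paragraph mentions using a model package's `__version__` attribute for a compatibility check before loading. A minimal sketch of what that could look like is below. The helper name `load_model` and its `required_prefix` parameter are hypothetical and not part of spaCy; the sketch only assumes what the patch states, namely that a model package can be imported, exposes `__version__`, and has a `load()` function.

```python
import importlib


def load_model(package_name, required_prefix=None):
    """Import a model package by name, check its version, and load it.

    Hypothetical helper for illustration only; not a spaCy API.
    """
    # import_module raises ImportError right away if the package is missing.
    module = importlib.import_module(package_name)

    if required_prefix is not None:
        # Model packages expose __version__, per the docs added in this patch.
        version = getattr(module, "__version__", None)
        if version is None or not version.startswith(required_prefix):
            raise RuntimeError(
                "%s has version %r, expected a version starting with %r"
                % (package_name, version, required_prefix)
            )

    # Equivalent to `import en_core_web_sm; en_core_web_sm.load()`.
    return module.load()
```

A call such as `load_model("en_core_web_sm", required_prefix="2.0.")` would then either return the loaded pipeline or fail early with a clear error. In a pytest suite, `pytest.importorskip("en_core_web_sm", minversion="2.0.0")` covers a similar need by skipping the test instead of failing it.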