Rewrite usage workflow on saving and loading

2025-11-18 00:35:50 +03:00 · 2017-05-24 20:54:02 +02:00 · 2017-05-24 20:54:02 +02:00 · f4658ff053
commit f4658ff053
parent 764bfa3239
1 changed file with 93 additions and 31 deletions
--- a/website/docs/usage/saving-loading.jade
+++ b/website/docs/usage/saving-loading.jade
@@ -10,6 +10,13 @@ include _spacy-101/_serialization
    |  overview of the changes, see #[+a("/docs/usage/v2#incompat") this table]
    |  and the notes on #[+a("/docs/usage/v2#migrating-saving-loading") migrating].
 +h(3, "example-doc") Example: Saving and loading a document
 p
    |  For simplicity, let's assume you've
    |  #[+a("/docs/usage/entity-recognition#setting") added custom entities] to
    |  a #[code Doc], either manually, or by using a
    |  #[+a("/docs/usage/rule-based-matching#on_match") match pattern]. You can
    |  save it locally by calling #[+api("doc#to_disk") #[code Doc.to_disk()]],
    |  and load it again via #[+api("doc#from_disk") #[code Doc.from_disk()]].
    |  This will overwrite the existing object and return it.
@@ -99,53 +106,108 @@ p
    |  If you're creating the package manually, keep in mind that the directories
    |  need to be named according to the naming conventions of
    |  #[code lang_name] and #[code lang_name-version].
-    |  #[code lang] setting in the meta.json is also used to create the
+
-    |  respective #[code Language] class in spaCy, which will later be returned
+h(3, "models-custom") Customising the model setup
    |  by the model's #[code load()] method.
 p
-    |  To #[strong build the package], run the following command from within the
+    |  The meta.json includes a #[code setup] key that lets you customise how
-    |  directory. This will create a #[code .tar.gz] archive in a directory
+    |  the model should be initialised and loaded. You can define the language
-    |  #[code /dist]. For more information on building Python packages, see the
+    |  data to be loaded and the
-    |  #[+a("https://setuptools.readthedocs.io/en/latest/") Python Setuptools documentation].
+    |  #[+a("/docs/usage/language-processing-pipeline") processing pipeline] to
    |  execute.
 +table(["Setting", "Type", "Description"])
    +row
        +cell #[code lang]
        +cell unicode
        +cell ID of the language class to initialise.
    +row
        +cell #[code pipeline]
        +cell list
        +cell
            |  A list of strings mapping to the IDs of pipeline factories to
            |  apply in that order. If not set, spaCy's
            |  #[+a("/docs/usage/language-processing/pipelines") default pipeline]
            |  will be used.
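
As a rough illustration of the settings in the table above, the following sketch writes a meta.json with a `setup` key and derives the two directory names from it. Only `lang` and `pipeline` under `setup` come from the docs; the top-level `name` and `version` keys, the example values and the `package_dir_names` helper are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example values. Only "lang" and "pipeline" under "setup"
# are described in the table; "name" and "version" are assumed here.
meta = {
    "name": "example_model",
    "version": "1.0.0",
    "setup": {
        "lang": "en",                      # ID of the language class
        "pipeline": ["custom_component"],  # factory IDs, applied in order
    },
}


def package_dir_names(meta):
    """Return the (lang_name, lang_name-version) directory names."""
    base = "{}_{}".format(meta["setup"]["lang"], meta["name"])
    return base, "{}-{}".format(base, meta["version"])


# Write the meta.json to a temporary directory and read it back.
model_dir = Path(tempfile.mkdtemp())
meta_path = model_dir / "meta.json"
meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")

loaded = json.loads(meta_path.read_text(encoding="utf8"))
outer, inner = package_dir_names(loaded)
```

With these values, the outer directory matches the package name used in the install command further down.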
 p
    |  The #[code load()] method that comes with our model package
    |  templates will take care of putting all this together and returning a
    |  #[code Language] object with the loaded pipeline and data. If your model
    |  requires custom pipeline components, you should
    |  #[strong ship them with your model] and register their
    |  #[+a("/docs/usage/language-processing-pipeline#creating-factory") factories]
    |  via  #[+api("spacy#set_factory") #[code set_factory()]].
 +aside-code("Factory example").
    def my_factory(vocab):
        # load some state
        def my_component(doc):
            # process the doc
            return doc
        return my_component
 +code.
    spacy.set_factory('custom_component', custom_component_factory)
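
To show how a factory and its component fit together, here is a minimal sketch that runs without spaCy installed. The `FACTORIES` registry, the local `set_factory` and `build_pipeline` are hypothetical stand-ins rather than spaCy's implementation, and plain dicts take the place of the `Vocab` and `Doc` objects.

```python
# Hypothetical stand-in for a factory registry, so the sketch runs
# without spaCy. This is not how spaCy stores factories internally.
FACTORIES = {}


def set_factory(factory_id, factory):
    """Register a factory under a string ID."""
    FACTORIES[factory_id] = factory


def my_factory(vocab):
    # Load some state once, when the pipeline is created.
    stop_words = set(vocab.get("stop_words", []))

    def my_component(doc):
        # Process the doc and return it for the next component.
        doc["tokens"] = [t for t in doc["tokens"] if t not in stop_words]
        return doc

    return my_component


def build_pipeline(pipeline_ids, vocab):
    """Call each registered factory, in order, to create the components."""
    return [FACTORIES[factory_id](vocab) for factory_id in pipeline_ids]


set_factory("custom_component", my_factory)

vocab = {"stop_words": ["the"]}          # dict standing in for a Vocab
doc = {"tokens": ["the", "cat", "sat"]}  # dict standing in for a Doc
for component in build_pipeline(["custom_component"], vocab):
    doc = component(doc)
```

The factory runs once and closes over its state, while the component it returns runs once per document.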
 +infobox("Custom models with pipeline components")
    |  For more details and an example of how to package a sentiment model
    |  with a custom pipeline component, see the usage workflow on
    |  #[+a("/docs/usage/language-processing-pipeline#example2") language processing pipelines].
 +h(3, "models-building") Building the model package
 p
    |  To build the package, run the following command from within the
    |  directory. For more information on building Python packages, see the
    |  docs on Python's
    |  #[+a("https://setuptools.readthedocs.io/en/latest/") Setuptools].
 +code(false, "bash").
    python setup.py sdist
 p
    |  This will create a #[code .tar.gz] archive in a directory #[code /dist].
    |  The model can be installed by pointing pip to the path of the archive:
 +code(false, "bash").
    pip install /path/to/en_example_model-1.0.0.tar.gz
 p
    |  You can then load the model via its name, #[code en_example_model], or
    |  import it directly as a module and then call its #[code load()] method.
 +h(2, "loading") Loading a custom model package
 p
    |  To load a model from a data directory, you can use
-    |  #[+api("spacy#load") #[code spacy.load()]] with the local path:
+    |  #[+api("spacy#load") #[code spacy.load()]] with the local path. This will
    |  look for a meta.json in the directory and use the #[code setup] details
    |  to initialise a #[code Language] class with a processing pipeline and
    |  load in the model data.
 +code.
    nlp = spacy.load('/path/to/model')
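
The lookup described above can be pictured roughly as follows. This is a simplified sketch and not spaCy's actual loading code: the `read_setup` helper, the error handling and the `DEFAULT_PIPELINE` placeholder are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Placeholder only. spaCy's real default pipeline is not reproduced here.
DEFAULT_PIPELINE = ["default"]


def read_setup(model_path):
    """Read meta.json from a model directory and return (lang, pipeline)."""
    meta_path = Path(model_path) / "meta.json"
    if not meta_path.is_file():
        raise IOError("No meta.json found in {}".format(model_path))
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    setup = meta.get("setup", {})
    if "lang" not in setup:
        raise ValueError("The setup in meta.json is missing 'lang'")
    # Fall back to the default pipeline if none is set.
    return setup["lang"], setup.get("pipeline", DEFAULT_PIPELINE)


# Create a throwaway model directory containing only a meta.json.
model_dir = Path(tempfile.mkdtemp())
(model_dir / "meta.json").write_text(
    json.dumps({"setup": {"lang": "en"}}), encoding="utf8"
)
lang, pipeline = read_setup(model_dir)
```

A real loader would go on to initialise the language class for `lang`, create the pipeline components and load the model data from the directory.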
 p
-    |  If you have generated a model package, you can also install it by
+    |  If you want to #[strong load only the binary data], you'll have to create
-    |  pointing pip to the model's #[code .tar.gz] archive – this is pretty
+    |  a #[code Language] class and call
-    |  much exactly what spaCy's #[+api("cli#download") #[code download]]
+    |  #[+api("language#from_disk") #[code from_disk]] instead.
    |  command does under the hood.
 +code(false, "bash").
    pip install /path/to/en_example_model-1.0.0.tar.gz
 +aside-code("Custom model names", "bash").
    # optional: assign custom name to model
    python -m spacy link en_example_model my_cool_model
 p
    |  You'll then be able to load the model via spaCy's loader, or by importing
    |  it as a module. For larger code bases, we usually recommend native
    |  imports, as this will make it easier to integrate models with your
    |  existing build process, continuous integration workflow and testing
    |  framework.
 +code.
-    # option 1: import model as module
+    from spacy.lang.en import English
-    import en_example_model
+    nlp = English().from_disk('/path/to/data')
    nlp = en_example_model.load()
-    # option 2: use spacy.load()
+infobox("Important note: Loading data in v2.x")
-    nlp = spacy.load('en_example_model')
+    .o-block
        |  In spaCy 1.x, the distinction between #[code spacy.load()] and the
        |  #[code Language] class constructor was quite unclear. You could call
        |  #[code spacy.load()] when no model was present, and it would silently
        |  return an empty object. Likewise, you could pass a path to
        |  #[code English], even if the model required a different language.
        |  spaCy v2.0 solves this with a clear distinction between setting up
        |  the instance and loading the data.
    +code-new nlp = English().from_disk('/path/to/data')
    +code-old nlp = spacy.load('en', path='/path/to/data')