mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-13 18:56:36 +03:00
* Add loading.rst reference
parent
79abe2860a
commit
1a95b490a8
62
docs/source/reference/loading.rst
Normal file
@ -0,0 +1,62 @@
=================
Loading Resources
=================

In almost all cases, you will load spaCy's resources using a language pipeline
class, e.g. :code:`spacy.en.English`. The pipeline class reads the data from a
specified directory on disk. By default, spaCy installs data into each
language's package directory, and loads it from there.

Usually, this is all you will need:

    >>> from spacy.en import English
    >>> nlp = English()

If you need to replace some of the components, you may want to write your own
pipeline class: the English class itself does almost no work and simply
applies the modules in order. You can also pass a function or class that
produces a tokenizer, tagger, parser or entity recognizer to
:code:`English.__init__` to customize the pipeline:

    >>> from spacy.en import English
    >>> from my_module import MyTagger
    >>> nlp = English(Tagger=MyTagger)

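As a rough illustration of what :code:`my_module.MyTagger` could look like, the
sketch below assumes the factory convention described in this reference: the
class is constructed with the vocabulary and the data directory, and the
resulting object is then called on the tokens and annotates them in place. The
:code:`Token` stand-in and the capitalisation rule are hypothetical and are not
part of spaCy.

```python
class Token(object):
    """Stand-in for a spaCy token (assumption): text plus a tag slot."""

    def __init__(self, text):
        self.text = text
        self.tag = None


class MyTagger(object):
    """Hypothetical tagger following (vocab, data_dir)(tokens) --> None."""

    def __init__(self, vocab, data_dir):
        # A real tagger would load its model from data_dir here.
        self.vocab = vocab
        self.data_dir = data_dir

    def __call__(self, tokens):
        # Toy rule, for illustration only: capitalised words become 'NNP'.
        for token in tokens:
            token.tag = 'NNP' if token.text[:1].isupper() else 'NN'
        # Annotation happens in place, so nothing is returned.


tagger = MyTagger(vocab=None, data_dir=None)
doc = [Token(u'London'), Token(u'calling')]
result = tagger(doc)
print([(t.text, t.tag) for t in doc])
```

Because the component only needs to be callable with :code:`(vocab, data_dir)`,
a plain function returning a closure would satisfy the same convention.
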
In more detail:

.. code:: python

  class English(object):
      def __init__(self,
          data_dir=path.join(path.dirname(__file__), 'data'),
          Tokenizer=Tokenizer.from_dir,
          Tagger=EnPosTagger,
          Parser=CreateParser(ArcEager),
          Entity=CreateParser(BiluoNER),
          load_vectors=True
      ):

:code:`data_dir`
  :code:`unicode path`

  The data directory. May be None, to disable any data loading (including
  the vocabulary).

:code:`Tokenizer`
  :code:`(Vocab vocab, unicode data_dir)(unicode) --> Tokens`

  A class or function that creates the tokenizer. It is called with the
  vocabulary and the data directory, and returns a callable that maps a
  unicode string to :code:`Tokens`.

:code:`Tagger` / :code:`Parser` / :code:`Entity`
  :code:`(Vocab vocab, unicode data_dir)(Tokens) --> None`

  A class or function that creates the part-of-speech tagger, the
  syntactic dependency parser, or the named entity recognizer, respectively.
  May be None or False, to disable the corresponding component.

:code:`load_vectors`
  :code:`bool`

  A boolean value to control whether the word vectors are loaded.

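To make the "applies the modules in order" behaviour concrete, here is a minimal
sketch of a pipeline class built around the factory signatures listed above.
:code:`MiniPipeline`, :code:`whitespace_tokenizer` and :code:`length_tagger`
are hypothetical names, the vocabulary is a plain dict, and tokens are plain
dicts; none of this is spaCy's actual implementation.

```python
class MiniPipeline(object):
    """Hypothetical pipeline: build components once, then apply them in order."""

    def __init__(self, data_dir=None, Tokenizer=None, Tagger=None,
                 Parser=None, Entity=None):
        vocab = {}  # stand-in for the shared vocabulary (assumption)
        self.tokenizer = Tokenizer(vocab, data_dir)
        # A factory set to None or False disables that component.
        factories = (Tagger, Parser, Entity)
        self.annotators = [make(vocab, data_dir) for make in factories if make]

    def __call__(self, text):
        tokens = self.tokenizer(text)
        for annotate in self.annotators:
            annotate(tokens)  # each component annotates in place
        return tokens


def whitespace_tokenizer(vocab, data_dir):
    """Factory returning a callable: unicode --> list of token dicts."""
    def tokenize(text):
        return [{'text': word} for word in text.split()]
    return tokenize


def length_tagger(vocab, data_dir):
    """Factory returning a callable: tokens --> None (toy tagging rule)."""
    def tag(tokens):
        for token in tokens:
            token['tag'] = 'LONG' if len(token['text']) > 4 else 'SHORT'
    return tag


nlp = MiniPipeline(Tokenizer=whitespace_tokenizer, Tagger=length_tagger,
                   Parser=None, Entity=False)
tokens = nlp(u'Hello big world')
print(tokens)
```

Keeping the factories as constructor arguments means a component can be swapped
or disabled without subclassing the pipeline.
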