mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-04 09:57:26 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			63 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			63 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
=================
 | 
						||
Loading Resources
 | 
						||
=================
 | 
						||
 | 
						||
99\% of the time, you will load spaCy's resources using a language pipeline class,
 | 
						||
e.g. `spacy.en.English`. The pipeline class reads the data from disk, from a
 | 
						||
specified directory.  By default, spaCy installs data into each language's
 | 
						||
package directory, and loads it from there.
 | 
						||
 | 
						||
Usually, this is all you will need:
 | 
						||
 | 
						||
    >>> from spacy.en import English
 | 
						||
    >>> nlp = English()
 | 
						||
 | 
						||
If you need to replace some of the components, you may want to just make your
 | 
						||
own pipeline class --- the English class itself does almost no work; it just
 | 
						||
applies the modules in order. You can also provide a function or class that
 | 
						||
produces a tokenizer, tagger, parser or entity recognizer to :code:`English.__init__`,
 | 
						||
to customize the pipeline:
 | 
						||
 | 
						||
    >>> from spacy.en import English
 | 
						||
    >>> from my_module import MyTagger
 | 
						||
    >>> nlp = English(Tagger=MyTagger)
 | 
						||
 | 
						||
In more detail:
 | 
						||
 | 
						||
.. code::
 | 
						||
 | 
						||
  class English(object):
 | 
						||
      def __init__(self,
 | 
						||
        data_dir=path.join(path.dirname(__file__), 'data'),
 | 
						||
        Tokenizer=Tokenizer.from_dir,
 | 
						||
        Tagger=EnPosTagger,
 | 
						||
        Parser=Createarser(ArcEager),
 | 
						||
        Entity=CreateParser(BiluoNER),
 | 
						||
        load_vectors=True
 | 
						||
      ):
 | 
						||
 | 
						||
:code:`data_dir`
 | 
						||
  :code:`unicode path`
 | 
						||
 | 
						||
  The data directory.  May be None, to disable any data loading (including
 | 
						||
  the vocabulary).
 | 
						||
 | 
						||
:code:`Tokenizer`
 | 
						||
  :code:`(Vocab vocab, unicode data_dir)(unicode) --> Doc`
 | 
						||
  
 | 
						||
  A class/function that creates the tokenizer.
 | 
						||
 | 
						||
:code:`Tagger` / :code:`Parser` / :code:`Entity`
 | 
						||
  :code:`(Vocab vocab, unicode data_dir)(Doc) --> None`
 | 
						||
  
 | 
						||
  A class/function that creates the part-of-speech tagger /
 | 
						||
  syntactic dependency parser / named entity recogniser.
 | 
						||
  May be None or False, to disable tagging.
 | 
						||
 | 
						||
:code:`load_vectors`
 | 
						||
  :code:`bool`
 | 
						||
  A boolean value to control whether the word vectors are loaded.
 | 
						||
 | 
						||
 | 
						||
 |