* Add processing.rst reference docs

2026-03-07 21:31:30 +03:00 · 2015-07-08 15:11:09 +02:00 · 2015-07-08 15:11:09 +02:00 · 79abe2860a
commit 79abe2860a
parent b24e8be2b9
1 changed files with 69 additions and 0 deletions
--- a/docs/source/reference/processing.rst
+++ b/docs/source/reference/processing.rst
@ -0,0 +1,69 @@
+===============
+Processing Text
+===============
+
+The text processing API is very small and simple. Everything is a callable object,
+and you will almost always apply the pipeline all at once.
+
+Applying a pipeline
+-------------------
+
+
+.. py:method:: English.__call__(text, tag=True, parse=True, entity=True) --> Tokens
+
+
+text (unicode)
+  The text to be processed.  No pre-processing needs to be applied, and any
+  length of text can be submitted.  Usually you will submit a whole document.
+  Text may be zero-length. An exception is raised if byte strings are supplied.
+
+tag (bool)
+  Whether to apply the part-of-speech tagger. Required for parsing and entity recognition.
+
+parse (bool)
+  Whether to apply the syntactic dependency parser.
+
+entity (bool)
+  Whether to apply the named entity recognizer.
+
+
+**Examples**
+
+    >>> from spacy.en import English
+    >>> nlp = English()
+    >>> doc = nlp(u'Some text.) # Applies tagger, parser, entity
+    >>> doc = nlp(u'Some text.', parse=False) # Applies tagger and entity, not parser
+    >>> doc = nlp(u'Some text.', entity=False) # Applies tagger and parser, not entity
+    >>> doc = nlp(u'Some text.', tag=False) # Does not apply tagger, entity or parser
+    >>> doc = nlp(u'') # Zero-length tokens, not an error
+    >>> doc = nlp(b'Some text') # Error: need unicode
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+      File "spacy/en/__init__.py", line 128, in __call__
+      tokens = self.tokenizer(text)
+    TypeError: Argument 'string' has incorrect type (expected unicode, got str)
+    >>> doc = nlp(b'Some text'.decode('utf8')) # Encode to unicode first.
+    >>>
+
+
+
+
+Tokenizer
+---------
+
+
+.. autoclass:: spacy.tokenizer.Tokenizer
+  :members:
+
+
+Tagger
+------
+
+.. autoclass:: spacy.en.pos.EnPosTagger
+  :members:
+
+Parser and Entity Recognizer
+----------------------------
+
+.. autoclass:: spacy.syntax.parser.Parser
+  :members: